UTF-8 Encoder & Decoder — Convert Text to UTF-8 Bytes Online

Convert text to UTF-8 byte representation in hex, decimal, binary, or percent-encoded format. Decode UTF-8 byte sequences back to readable text. See character count, byte count, and encoding details.
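The conversions described above can be sketched in a few lines of Python. This is only an illustration, not the code behind this tool: the function names `format_utf8` and `parse_utf8` are hypothetical, and the sketch assumes that percent-encoding escapes every byte, including ASCII ones.

```python
from urllib.parse import unquote_to_bytes


def format_utf8(text: str, fmt: str = "hex") -> str:
    """Encode text as UTF-8 and render the bytes in the chosen format."""
    data = text.encode("utf-8")
    if fmt == "hex":
        return " ".join(f"{b:02X}" for b in data)
    if fmt == "decimal":
        return " ".join(str(b) for b in data)
    if fmt == "binary":
        return " ".join(f"{b:08b}" for b in data)
    if fmt == "percent":
        # Assumption: every byte is escaped, even ASCII letters.
        return "".join(f"%{b:02X}" for b in data)
    raise ValueError(f"unknown format: {fmt}")


def parse_utf8(byte_text: str, fmt: str = "hex") -> str:
    """Parse a rendered byte sequence and decode it as UTF-8."""
    if fmt == "hex":
        data = bytes.fromhex(byte_text)  # spaces between bytes are allowed
    elif fmt == "decimal":
        data = bytes(int(token) for token in byte_text.split())
    elif fmt == "percent":
        data = unquote_to_bytes(byte_text)
    else:
        raise ValueError(f"unknown format: {fmt}")
    return data.decode("utf-8")  # raises UnicodeDecodeError on invalid input


print(format_utf8("é", "hex"))       # C3 A9
print(parse_utf8("%C3%A9", "percent"))  # é
```

Decoding is strict here, so an invalid byte sequence raises an error instead of producing replacement characters.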


How UTF-8 encoding works

UTF-8 is the dominant character encoding for the web, used by over 98% of websites. It encodes each Unicode code point as one to four bytes. Because the first 128 code points are stored as the same single bytes as in ASCII, UTF-8 is backward-compatible with ASCII while still covering every character in the Unicode standard, including emoji, CJK characters, and mathematical symbols.

ASCII characters (U+0000 to U+007F) use a single byte, identical to their ASCII values. Characters outside this range use 2 to 4 bytes: the leading bits of the first byte indicate how many bytes the sequence contains, and every following byte starts with the bits 10. This variable-length encoding keeps English text compact while supporting all world scripts.
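As a rough illustration of how the leading bits signal the byte count, the sketch below reads the first byte of a sequence and returns the expected length. The function name `sequence_length` is hypothetical, and the check looks only at the bit pattern; it does not reject lead bytes that are never valid in UTF-8 (such as C0, C1, or F5 and above).

```python
def sequence_length(lead: int) -> int:
    """Return the UTF-8 sequence length implied by a lead byte's top bits."""
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx never appears in UTF-8.
    raise ValueError(f"not a lead byte: {lead:#04x}")


for ch in ["A", "é", "€", "😀"]:
    first = ch.encode("utf-8")[0]
    print(f"U+{ord(ch):04X}: lead byte {first:08b}, {sequence_length(first)} byte(s)")
```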

UTF-8 byte ranges

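1 byte (0xxxxxxx): ASCII characters U+0000–U+007F (A-Z, a-z, 0-9, basic punctuation)
2 bytes (110xxxxx 10xxxxxx): Accented Latin, Greek, Cyrillic, Arabic, Hebrew U+0080–U+07FF
3 bytes (1110xxxx 10xxxxxx 10xxxxxx): Most CJK characters, many symbols, the rest of the Basic Multilingual Plane U+0800–U+FFFF (excluding the surrogate range U+D800–U+DFFF)
4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx): Most emoji, rare CJK characters, historic scripts U+10000–U+10FFFF (a flag emoji is a pair of such code points, so it takes 8 bytes)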
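The ranges above translate fairly directly into code. The following is a minimal sketch of a hand-written encoder for a single code point, meant for study only; in practice `str.encode("utf-8")` does this. The name `encode_code_point` is hypothetical.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using the bit patterns above."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:  # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# The result should match Python's built-in encoder for each sample.
for ch in ["A", "é", "€", "😀"]:
    assert encode_code_point(ord(ch)) == ch.encode("utf-8")
```

Each continuation byte carries 6 bits of the code point, which is why the shifts step in multiples of 6 and the mask is 0x3F.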

Common use cases

Debug encoding issues when text appears garbled (mojibake)
Inspect byte-level representation for network protocols
Verify correct encoding in databases and file systems

FAQs

What is the difference between UTF-8 and Unicode?

Unicode is a character set that assigns a unique number (code point) to every character. UTF-8 is an encoding that defines how those code points are stored as bytes. Unicode defines what characters exist; UTF-8 defines how to represent them in binary.
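A short Python sketch may make the distinction concrete: the code point is a single number, while the bytes depend on which encoding is chosen. UTF-16 is shown only as a contrast.

```python
ch = "é"

# Unicode: the character's code point, independent of any encoding.
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+00E9

# UTF-8: one way to store that code point as bytes.
print(ch.encode("utf-8"))      # b'\xc3\xa9'

# UTF-16 (big-endian): a different byte sequence for the same code point.
print(ch.encode("utf-16-be"))  # b'\x00\xe9'
```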

Why do some characters use more bytes than others in UTF-8?

UTF-8 uses variable-length encoding for efficiency. ASCII characters (the most common in English) use just 1 byte, keeping text compact. Less common characters use 2-4 bytes. This design makes UTF-8 backward-compatible with ASCII while supporting all Unicode characters.

How can I tell if text is UTF-8 encoded?

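Look at the byte patterns. In UTF-8, every multi-byte sequence starts with a lead byte whose top bits are 110, 1110, or 11110, followed by one, two, or three continuation bytes that each start with 10. If every byte with its high bit set fits this pattern, the text is very likely UTF-8. Text that contains only ASCII bytes is valid UTF-8, but it is also valid in many legacy encodings such as Latin-1, so it proves little either way. Invalid sequences suggest a different encoding or corrupted data.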
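One practical way to apply this check is to attempt a strict decode and see whether it fails. The helper name `is_valid_utf8` below is hypothetical; the sketch relies on Python's built-in decoder, which also rejects overlong forms and encoded surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xc3\xa9"))  # True: a valid 2-byte sequence (é)
print(is_valid_utf8(b"\xe9"))      # False: a lone Latin-1 é byte
```

A passing check means the bytes are well-formed UTF-8, not that UTF-8 was the intended encoding.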

What causes garbled text (mojibake) and how do I fix it?

Mojibake occurs when text encoded in one format (e.g., UTF-8) is decoded using a different format (e.g., Latin-1). For example, é is stored as the bytes C3 A9 in UTF-8, and reading those two bytes as Latin-1 produces Ã©. To fix it, identify the original encoding by examining the byte sequence, then decode with the correct encoding. This tool helps you inspect bytes to diagnose encoding issues.
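The sketch below reproduces one common case of mojibake in Python and then reverses it. It assumes the wrong decoder was Latin-1; other mismatches (for example Windows-1252) need the matching codec, and a repair of this kind only works if no bytes were lost along the way.

```python
original = "café"

# UTF-8 bytes mistakenly decoded as Latin-1 produce mojibake.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)   # cafÃ©

# Reverse the mistake: recover the raw bytes, then decode them as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```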
