Archive for July, 2007

UTF-8 Files and the Preamble

The UTF-8 preamble (U+FEFF encoded as UTF-8), also known as the UTF-8 BOM or signature, is a 3-byte sequence at the start of a file indicating that it is UTF-8. Like the UTF-16 BOM, it is not particular to XML: it can appear in any text file. But unlike the UTF-16 BOM, "Byte Order Mark" is not an accurate term here, because UTF-8 code units are single bytes, so there is no byte order to mark. In hex, the UTF-8 preamble is EF BB BF.
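To make the byte sequence concrete, here is a minimal Python sketch showing that the preamble is just U+FEFF encoded as UTF-8, plus two small helpers for detecting and removing it from raw bytes. The helper names `has_utf8_preamble` and `strip_utf8_preamble` are hypothetical, chosen for illustration; `codecs.BOM_UTF8` is the standard library's constant for these three bytes.

```python
import codecs

# The UTF-8 preamble is simply the character U+FEFF encoded as UTF-8.
PREAMBLE = "\ufeff".encode("utf-8")

def has_utf8_preamble(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes start with EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_preamble(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading preamble, else return data unchanged."""
    if has_utf8_preamble(data):
        return data[len(codecs.BOM_UTF8):]
    return data

print(PREAMBLE == codecs.BOM_UTF8)  # True
print(PREAMBLE.hex(" "))            # ef bb bf
```

Checking the first three bytes is enough because no valid UTF-8 text can start with EF BB BF other than by encoding U+FEFF.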

While the UTF-16 BOM is standard, the UTF-8 preamble is not widely accepted, and its use is discouraged on UNIX operating systems. Microsoft Notepad writes the UTF-8 preamble when it saves UTF-8 documents, but does not need it to recognize UTF-8 encoding when it loads files. The 3-byte UTF-8 preamble is not recommended in XML files, because a file that begins with an ASCII less-than sign (<) is already assumed to be UTF-8 unless its XML declaration specifies another encoding.
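A reader can accept UTF-8 files whether or not they carry the preamble, much as Notepad does when loading. A minimal sketch of this in Python uses the standard `utf-8-sig` codec, which skips a leading preamble on decode if one is present. The XML snippets are illustrative assumptions, not taken from any particular file.

```python
# Two byte strings holding the same XML text, with and without the preamble.
with_preamble = b'\xef\xbb\xbf<?xml version="1.0"?><root/>'
without_preamble = b'<?xml version="1.0"?><root/>'

# "utf-8-sig" strips a leading preamble if present, so both decode identically.
text_a = with_preamble.decode("utf-8-sig")
text_b = without_preamble.decode("utf-8-sig")
print(text_a == text_b)  # True

# Plain "utf-8" keeps the preamble as a U+FEFF character at the start of the text.
print(repr(with_preamble.decode("utf-8")[0]))  # '\ufeff'

# On encode, "utf-8" writes no preamble, while "utf-8-sig" prepends one.
print("<root/>".encode("utf-8"))      # b'<root/>'
print("<root/>".encode("utf-8-sig"))  # b'\xef\xbb\xbf<root/>'
```

Writing with plain `utf-8` is the choice that matches the advice above for XML: the file then begins directly with `<` and is recognized as UTF-8 without a preamble.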

Byte-Order Mark found in UTF-8 File.

The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause problems for some text editors and older browsers. You may want to consider avoiding its use until it is better supported.
