Conversion routines to the UTF16 format.
Static Public Member Functions | |
static uint_t | is_valid (uint16_t uInput) noexcept |
Validate a UTF16 value. | |
static uint_t | is_valid (const uint16_t *pInput) noexcept |
Check a UTF16 "C" string for validity. | |
static uint_t | is_valid (const uint16_t *pInput, uintptr_t uElementCount) noexcept |
Check a UTF16 uint16_t array for validity. | |
static uint16_t | translate_from_UTF8 (const char *pInput) noexcept |
Return a UTF16 code from a UTF8 stream. | |
static uintptr_t | translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept |
Convert a UTF8 "C" string into a UTF16 stream. | |
static uintptr_t | translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept |
Convert a UTF8 stream into a UTF16 uint16_t array. | |
Static Public Attributes | |
static const uint16_t | kInvalid = UINT16_MAX |
Value returned if a routine failed. | |
static const uint16_t | kEndianMark = 0xFEFFU |
Byte stream token for native endian. | |
static const uint16_t | kBigEndianMark = 0xFEFFU |
16 bit Byte Order Mark (BOM) for Big Endian UTF16 data. | |
static const uint16_t | kLittleEndianMark = 0xFFFEU |
16 bit Byte Order Mark (BOM) for Little Endian UTF16 data. | |
Conversion routines to the UTF16 format.
UTF16 is a data format that allows Unicode data to be stored in a 16 bit wide "C" string. It is wide enough to contain all of the most popular characters for the world's languages. These functions convert from UTF8, which Burgerlib is based on, to UTF16, which some foreign APIs require for internationalization. Note that these functions operate on strings that are native endian.
|
static noexcept |
|
static noexcept |
Check a UTF16 uint16_t array for validity.
Check a uint16_t array and see if it's a valid UTF16 stream. Return FALSE if there was an error, or TRUE if the elements represent a valid UTF16 pattern.
pInput | Pointer to UTF16 data. Can be nullptr if uElementCount is zero, otherwise a page fault will occur. |
uElementCount | Size of the data in elements. If zero, the function returns TRUE. |
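The array check can be pictured with the following sketch. It is not the library's implementation: `is_valid_utf16_array` is a hypothetical helper, and it assumes that "valid" means every high surrogate (0xD800-0xDBFF) is immediately followed by a low surrogate (0xDC00-0xDFFF) and that no low surrogate appears on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the library routine. Assumes a high surrogate
// must be followed by a low surrogate for the stream to be valid.
inline bool is_valid_utf16_array(
    const std::uint16_t* pInput, std::size_t uElementCount) noexcept
{
    std::size_t i = 0;
    while (i < uElementCount) {
        const std::uint16_t uUnit = pInput[i];
        if (uUnit >= 0xD800U && uUnit <= 0xDBFFU) {
            // High surrogate: the next element must exist and be a low
            // surrogate.
            if (i + 1 >= uElementCount) {
                return false;
            }
            const std::uint16_t uNext = pInput[i + 1];
            if (uNext < 0xDC00U || uNext > 0xDFFFU) {
                return false;
            }
            i += 2;
        } else if (uUnit >= 0xDC00U && uUnit <= 0xDFFFU) {
            // A low surrogate without a preceding high surrogate.
            return false;
        } else {
            ++i;
        }
    }
    // Also reached for an empty array, so nullptr with a zero count is fine.
    return true;
}

// Small usage demonstration.
inline void demo_array_check() noexcept
{
    const std::uint16_t pair[] = {0x0048U, 0xD83DU, 0xDE00U};
    assert(is_valid_utf16_array(pair, 3));
    assert(is_valid_utf16_array(nullptr, 0));
}
```

Because the loop never dereferences the pointer when the count is zero, the sketch mirrors the documented rule that a null pointer is acceptable only for an empty array.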
|
static noexcept |
Validate a UTF16 value.
Return TRUE if a UTF16 character is within the valid bounds, 0x0000-0xD7FF or 0xE000-0xFFFF.
uInput | UTF16 encoded character value. |
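The range test described above amounts to rejecting the surrogate block. A minimal sketch follows; `is_valid_utf16_unit` is a hypothetical stand-in written for illustration, not the library function itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the single-value check: a lone 16 bit value is
// accepted when it lies outside the surrogate range 0xD800-0xDFFF.
constexpr bool is_valid_utf16_unit(std::uint16_t uInput) noexcept
{
    return (uInput < 0xD800U) || (uInput >= 0xE000U);
}

// Small usage demonstration covering both edges of the surrogate block.
inline void demo_single_unit() noexcept
{
    assert(is_valid_utf16_unit(0x0041U));  // 'A'
    assert(is_valid_utf16_unit(0xD7FFU));  // last value below the surrogates
    assert(!is_valid_utf16_unit(0xD800U)); // first high surrogate
    assert(!is_valid_utf16_unit(0xDFFFU)); // last low surrogate
    assert(is_valid_utf16_unit(0xE000U));  // first value above the surrogates
}
```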
|
static noexcept |
Return a UTF16 code from a UTF8 stream.
Convert from a UTF8 stream into a 16 bit Unicode value (0x0000 to 0xFFFF). This function validates the incoming stream and flags any data that's invalid. It will not parse Unicode values in the range 0xD800-0xDFFF or values greater than 0xFFFF. These do not fit in a single 16 bit quantity, so the function returns an error for them.
pInput | Pointer to a valid UTF8 "C" string. |
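To make the accepted and rejected cases concrete, here is a sketch of a single-character decoder with the same restrictions. `utf8_to_utf16_unit` and `kInvalidSketch` are hypothetical names. The sketch assumes the error value is 0xFFFF, matching kInvalid = UINT16_MAX in the attribute list, and the exact set of malformed inputs the library rejects may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed error value, mirroring kInvalid = UINT16_MAX.
constexpr std::uint16_t kInvalidSketch = 0xFFFFU;

// Hypothetical decoder: reads one UTF8 sequence from a "C" string and
// returns it as a single 16 bit value, or kInvalidSketch on failure.
inline std::uint16_t utf8_to_utf16_unit(const char* pInput) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    const unsigned u0 = p[0];
    if (u0 < 0x80U) {
        return static_cast<std::uint16_t>(u0); // One byte (ASCII).
    }
    if (u0 >= 0xC2U && u0 <= 0xDFU) {
        // Two byte sequence. Leads 0xC0 and 0xC1 are overlong and fall
        // through to the error return at the bottom.
        const unsigned u1 = p[1];
        if ((u1 & 0xC0U) != 0x80U) {
            return kInvalidSketch;
        }
        return static_cast<std::uint16_t>(((u0 & 0x1FU) << 6) | (u1 & 0x3FU));
    }
    if ((u0 & 0xF0U) == 0xE0U) {
        // Three byte sequence. p[2] is only read after p[1] was found to be
        // a continuation byte, so the terminator is never skipped.
        const unsigned u1 = p[1];
        if ((u1 & 0xC0U) != 0x80U) {
            return kInvalidSketch;
        }
        const unsigned u2 = p[2];
        if ((u2 & 0xC0U) != 0x80U) {
            return kInvalidSketch;
        }
        const unsigned uCode =
            ((u0 & 0x0FU) << 12) | ((u1 & 0x3FU) << 6) | (u2 & 0x3FU);
        if (uCode < 0x800U) {
            return kInvalidSketch; // Overlong encoding.
        }
        if (uCode >= 0xD800U && uCode <= 0xDFFFU) {
            return kInvalidSketch; // Surrogate range is not a character.
        }
        return static_cast<std::uint16_t>(uCode);
    }
    // Stray continuation bytes, overlong leads, and four byte sequences
    // (values above 0xFFFF) cannot be returned as one 16 bit value.
    return kInvalidSketch;
}

// Small usage demonstration.
inline void demo_decode_unit() noexcept
{
    assert(utf8_to_utf16_unit("A") == 0x0041U);
    assert(utf8_to_utf16_unit("\xF0\x9F\x98\x80") == kInvalidSketch);
}
```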
|
static noexcept |
Convert a UTF8 "C" string into a UTF16 stream.
Take a "C" string that uses UTF8 encoding and convert it to a UTF16 encoded "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. The output pointer can be nullptr and the output size zero, in which case the routine only calculates the size of the output so the application can allocate a buffer large enough to hold it.
pOutput | Pointer to a UTF16 buffer to receive the converted string. nullptr is okay if uOutputSize is zero, otherwise a page fault will occur. |
uOutputSize | Size of the output buffer in elements. |
pInput | UTF8 encoded "C" string. Passing nullptr will cause a page fault. |
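The measure-then-allocate pattern described above can be sketched as follows. `utf8_to_utf16_sketch` is a hypothetical stand-in with the same calling shape, written so the example is self-contained. It performs no validation, and it assumes the returned count is in elements and excludes the terminating zero, which is why the caller adds one; check the real routine before relying on either point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in, not the library routine. Assumes well-formed UTF8
// input. Returns the number of elements the full conversion needs, excluding
// the terminator, even when the buffer is too small.
inline std::size_t utf8_to_utf16_sketch(
    std::uint16_t* pOutput, std::size_t uOutputSize, const char* pInput) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    std::size_t uCount = 0;
    // Keep one element back for the terminating zero when a buffer exists.
    const std::size_t uLimit = uOutputSize ? uOutputSize - 1 : 0;
    auto emit = [&](std::uint16_t uUnit) {
        if (uCount < uLimit) {
            pOutput[uCount] = uUnit;
        }
        ++uCount; // Counted even when the element did not fit.
    };
    while (*p) {
        std::uint32_t uCode = *p++;
        unsigned uExtra = 0;
        if (uCode >= 0xF0U) { uCode &= 0x07U; uExtra = 3; }
        else if (uCode >= 0xE0U) { uCode &= 0x0FU; uExtra = 2; }
        else if (uCode >= 0xC0U) { uCode &= 0x1FU; uExtra = 1; }
        // Fold in the continuation bytes.
        while (uExtra && ((*p & 0xC0U) == 0x80U)) {
            uCode = (uCode << 6) | (*p++ & 0x3FU);
            --uExtra;
        }
        if (uCode >= 0x10000U) {
            // Values above 0xFFFF become a surrogate pair. In this simplified
            // sketch a too-small buffer may end on half a pair.
            uCode -= 0x10000U;
            emit(static_cast<std::uint16_t>(0xD800U + (uCode >> 10)));
            emit(static_cast<std::uint16_t>(0xDC00U + (uCode & 0x3FFU)));
        } else {
            emit(static_cast<std::uint16_t>(uCode));
        }
    }
    if (uOutputSize) {
        pOutput[uCount < uLimit ? uCount : uLimit] = 0;
    }
    return uCount;
}

// Two pass usage: measure with no buffer, then allocate and convert.
inline std::vector<std::uint16_t> to_utf16(const char* pInput)
{
    const std::size_t uNeeded = utf8_to_utf16_sketch(nullptr, 0, pInput);
    std::vector<std::uint16_t> buffer(uNeeded + 1); // Room for the terminator.
    utf8_to_utf16_sketch(buffer.data(), buffer.size(), pInput);
    return buffer;
}
```

The first call touches no memory because both the pointer and the size are zero, so it is safe to use purely as a measuring step.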
|
static noexcept |
Convert a UTF8 stream into a UTF16 uint16_t array.
Take a byte array that uses UTF8 encoding and convert it to a UTF16 encoded uint16_t "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. The output pointer can be nullptr and the output size zero, in which case the routine only calculates the size of the output so the application can allocate a buffer large enough to hold it.
pOutput | Pointer to a uint16_t buffer to receive the UTF16 string, nullptr is okay if uOutputSize is zero, otherwise a page fault will occur. |
uOutputSize | Size of the output buffer in elements. |
pInput | UTF8 encoded byte array, nullptr is okay if uInputSize is zero. |
uInputSize | Size of the input byte array. |
|
static |
16 bit Byte Order Mark (BOM) for Big Endian UTF16 data.
If a token was read in that matched this constant, then you must assume that all of the following data is Big Endian. It adheres to the Unicode standard for UTF-16.
|
static |
Byte stream token for native endian.
When writing a text file using UTF16, you may need to write this value as the first character to mark the endianness in which the data was saved. This value is the correct value for the native endian of the machine. Use Burger::UTF16::kBigEndianMark or Burger::UTF16::kLittleEndianMark to test incoming data whose endianness is unknown.
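A reader can use the first 16 bit value of a file to decide whether the rest needs byte swapping. The sketch below is framed relative to the host rather than as big or little endian: when the mark is loaded in native order, 0xFEFF means the data already matches the machine and 0xFFFE means every element is byte reversed. `ByteOrder`, `classify_bom`, `swap16`, and `fix_byte_order` are hypothetical names, not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ByteOrder { Native, Swapped, Unknown };

// Reverse the two bytes of a 16 bit value.
constexpr std::uint16_t swap16(std::uint16_t uInput) noexcept
{
    return static_cast<std::uint16_t>((uInput << 8) | (uInput >> 8));
}

// Classify the first element of a UTF16 stream loaded in native byte order.
inline ByteOrder classify_bom(std::uint16_t uFirst) noexcept
{
    if (uFirst == 0xFEFFU) {
        return ByteOrder::Native; // Mark reads correctly, no swap needed.
    }
    if (uFirst == 0xFFFEU) {
        return ByteOrder::Swapped; // Mark is byte reversed.
    }
    return ByteOrder::Unknown; // No mark present.
}

// Swap the whole buffer in place when the mark says it is byte reversed.
inline void fix_byte_order(std::uint16_t* pData, std::size_t uCount) noexcept
{
    if (uCount && classify_bom(pData[0]) == ByteOrder::Swapped) {
        for (std::size_t i = 0; i < uCount; ++i) {
            pData[i] = swap16(pData[i]);
        }
    }
}

// Small usage demonstration: a reversed stream holding the mark and 'A'.
inline void demo_byte_order() noexcept
{
    std::uint16_t data[] = {0xFFFEU, 0x4100U};
    fix_byte_order(data, 2);
    assert(data[0] == 0xFEFFU);
    assert(data[1] == 0x0041U);
}
```

Data without a mark is left untouched here, so the caller still has to decide on a default byte order for that case.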
|
static |
|
static |
16 bit Byte Order Mark (BOM) for Little Endian UTF16 data.
If a token was read in that matched this constant, then you must assume that all of the following data is Little Endian. It adheres to the Unicode standard for UTF-16.