Conversion routines to the UTF32 format.
Static Public Member Functions | |
static uint_t | is_valid (uint32_t uInput) noexcept |
Validate a UTF32 value. | |
static uint_t | is_valid (const uint32_t *pInput) noexcept |
Check a UTF32 "C" string for validity. | |
static uint_t | is_valid (const uint32_t *pInput, uintptr_t uElementCount) noexcept |
Check a UTF32 uint32_t array for validity. | |
static uint32_t | translate_from_UTF8 (const char *pInput) noexcept |
Return a UTF32 code from a UTF8 stream. | |
static uint32_t | translate_from_UTF8 (const char **ppInput) noexcept |
Return a UTF32 code from a UTF8 stream and update the pointer. | |
static uintptr_t | translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept |
Convert a UTF8 "C" string into a UTF32 stream. | |
static uintptr_t | translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept |
Convert a UTF8 stream into a UTF32 uint32_t array. | |
Static Public Attributes | |
static const uint32_t | kEndianMark = 0x0000FEFFU |
Byte stream token for native endian. | |
static const uint32_t | kBigEndianMark = 0x0000FEFFU |
32 bit Byte Order Mark (BOM) for Big Endian UTF32 data. | |
static const uint32_t | kLittleEndianMark = 0xFFFE0000U |
32 bit Byte Order Mark (BOM) for Little Endian UTF32 data. | |
Static Public Attributes inherited from Burger::CodePage | |
static const uint32_t | kInvalid = UINT32_MAX |
Value returned if a routine failed. | |
Conversion routines to the UTF32 format.
UTF32 is the simplest format for storing Unicode data. It is a "C" string of 32 bit wide characters, so every character of the world's languages fits in a single element. These functions convert from UTF8, which Burgerlib is based on, to UTF32, which some foreign APIs require for internationalization and which some functions use to simplify their internal code.
static uint_t is_valid (const uint32_t *pInput) noexcept
Check a UTF32 "C" string for validity.
Check whether a "C" string is a valid UTF32 stream. Returns FALSE if there was an error, or TRUE if the data represents a valid UTF32 pattern. Parsing ends at the first zero character.
pInput | Pointer to a zero terminated string. nullptr will page fault. |
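The check described above can be pictured with a minimal sketch. It is not Burgerlib's implementation: the names `sketch_is_valid_value` and `sketch_is_valid_string` are hypothetical, and the rule used (a value is valid if it is at most 0x10FFFF and is not a UTF-16 surrogate) is the usual Unicode rule, assumed here rather than taken from the library source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Burgerlib. Assumes the usual Unicode
// rule: valid values are 0x0000-0xD7FF and 0xE000-0x10FFFF.
bool sketch_is_valid_value(uint32_t uInput) noexcept
{
    return (uInput < 0xD800U) ||
        ((uInput >= 0xE000U) && (uInput <= 0x10FFFFU));
}

// Hypothetical helper. Walks a zero terminated UTF32 string and stops at
// the first invalid value or at the terminator. pInput must not be nullptr.
bool sketch_is_valid_string(const uint32_t* pInput) noexcept
{
    for (uint32_t uTemp = *pInput; uTemp != 0U; uTemp = *++pInput) {
        if (!sketch_is_valid_value(uTemp)) {
            return false;
        }
    }
    return true;
}
```

As in the documented function, the sketch dereferences the pointer immediately, so a nullptr input faults instead of returning an error.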
static uint_t is_valid (const uint32_t *pInput, uintptr_t uElementCount) noexcept
Check a UTF32 uint32_t array for validity.
Check a uint32_t array and see if it's a valid UTF32 stream. Return FALSE if there was an error, or TRUE if the bytes represent a valid UTF32 pattern. Zeros are considered valid in the stream.
pInput | Pointer to UTF32 data. Can be nullptr if uElementCount is zero, otherwise page fault. |
uElementCount | Size of the data in elements. If zero, the function returns TRUE and performs no work. |
static uint_t is_valid (uint32_t uInput) noexcept
Validate a UTF32 value.
static uint32_t translate_from_UTF8 (const char **ppInput) noexcept
Return a UTF32 code from a UTF8 stream and update the pointer.
Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that's invalid.
ppInput | Pointer to a valid UTF8 "C" string pointer, nullptr will page fault. |
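To make the decode-and-advance behavior concrete, here is a minimal sketch of a validating UTF8 decoder. It is an illustration only: `sketch_from_UTF8` and `kSketchInvalid` are hypothetical names, the failure value is assumed to be UINT32_MAX to mirror kInvalid, and skipping a single byte on malformed input is a choice made for this sketch, not documented library behavior.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the library's failure value, assumed to be UINT32_MAX.
const uint32_t kSketchInvalid = UINT32_MAX;

// Hypothetical decoder. Reads one UTF8 sequence from *ppInput, advances the
// pointer past the consumed bytes and returns the code point. On malformed
// input it skips only the lead byte (an assumption of this sketch). A zero
// byte decodes as 0 and is also consumed, so callers should test for the
// terminator before calling.
uint32_t sketch_from_UTF8(const char** ppInput) noexcept
{
    const uint8_t* pWork = reinterpret_cast<const uint8_t*>(*ppInput);
    const uint32_t uFirst = pWork[0];
    uint32_t uCount;   // Number of continuation bytes expected
    uint32_t uMin;     // Smallest value that needs this many bytes
    uint32_t uCode;    // Payload bits gathered so far

    if (uFirst < 0x80U) {
        uCount = 0U; uMin = 0U; uCode = uFirst;
    } else if ((uFirst & 0xE0U) == 0xC0U) {
        uCount = 1U; uMin = 0x80U; uCode = uFirst & 0x1FU;
    } else if ((uFirst & 0xF0U) == 0xE0U) {
        uCount = 2U; uMin = 0x800U; uCode = uFirst & 0x0FU;
    } else if ((uFirst & 0xF8U) == 0xF0U) {
        uCount = 3U; uMin = 0x10000U; uCode = uFirst & 0x07U;
    } else {
        // Stray continuation byte or illegal lead byte
        *ppInput += 1;
        return kSketchInvalid;
    }

    for (uint32_t i = 1U; i <= uCount; ++i) {
        const uint32_t uNext = pWork[i];
        // A zero terminator fails this test, so no read passes the end
        if ((uNext & 0xC0U) != 0x80U) {
            *ppInput += 1;
            return kSketchInvalid;
        }
        uCode = (uCode << 6U) | (uNext & 0x3FU);
    }

    // Reject overlong forms, surrogates and values above 0x10FFFF
    if ((uCode < uMin) || (uCode > 0x10FFFFU) ||
        ((uCode >= 0xD800U) && (uCode <= 0xDFFFU))) {
        *ppInput += 1;
        return kSketchInvalid;
    }

    *ppInput += uCount + 1U;
    return uCode;
}
```

A caller would typically loop with `while (*pWork) { uint32_t uCode = sketch_from_UTF8(&pWork); ... }` and compare each result against the failure value.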
static uint32_t translate_from_UTF8 (const char *pInput) noexcept
Return a UTF32 code from a UTF8 stream.
Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that is invalid.
pInput | Pointer to a valid UTF8 "C" string, nullptr will page fault. |
static uintptr_t translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept
Convert a UTF8 "C" string into a UTF32 stream.
Take a "C" string that uses UTF8 encoding and convert it to a UTF32 encoded "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. The output pointer and size can be nullptr and zero to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.
pOutput | Pointer to UTF32 buffer to receive the converted string, nullptr is okay if uOutputSize is zero, otherwise it will page fault. |
uOutputSize | Size of the output buffer in bytes. |
pInput | UTF8 encoded "C" string, nullptr will page fault. |
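The measure-then-allocate pattern described above can be sketched as follows. The converter `sketch_translate` is a hypothetical stand-in with the same parameter shape as the documented function, not the library routine. It assumes that both the buffer size and the returned size are in bytes, that the returned size excludes the terminating zero, and that the input is well formed UTF8 (it performs no validation); check the library before relying on any of these.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the documented converter. Assumptions: sizes
// are in bytes, the result excludes the terminating zero, and the input
// is well formed UTF8.
uintptr_t sketch_translate(uint32_t* pOutput, uintptr_t uOutputSize,
    const char* pInput) noexcept
{
    const uintptr_t uCapacity = uOutputSize / sizeof(uint32_t);
    uintptr_t uCount = 0;
    const uint8_t* pWork = reinterpret_cast<const uint8_t*>(pInput);
    while (*pWork) {
        uint32_t uCode = *pWork++;
        // Continuation bytes implied by the lead byte
        unsigned uExtra = (uCode >= 0xF0U) ? 3U :
            (uCode >= 0xE0U) ? 2U : (uCode >= 0xC0U) ? 1U : 0U;
        if (uExtra) {
            // Keep only the payload bits of the lead byte
            uCode &= (0x3FU >> uExtra);
            for (; uExtra && *pWork; --uExtra) {
                uCode = (uCode << 6U) | (*pWork++ & 0x3FU);
            }
        }
        // Store only while room remains for this value and the terminator
        if ((uCount + 1U) < uCapacity) {
            pOutput[uCount] = uCode;
        }
        ++uCount;
    }
    // Always terminate when the buffer holds at least one element
    if (uCapacity) {
        pOutput[(uCount < uCapacity) ? uCount : (uCapacity - 1U)] = 0U;
    }
    return uCount * sizeof(uint32_t);
}

// Two pass usage: measure with no buffer, then convert into a buffer that
// has room for the string plus the terminating zero.
std::vector<uint32_t> sketch_to_UTF32(const char* pInput)
{
    const uintptr_t uBytes = sketch_translate(nullptr, 0, pInput);
    std::vector<uint32_t> Result((uBytes / sizeof(uint32_t)) + 1U);
    sketch_translate(Result.data(), Result.size() * sizeof(uint32_t), pInput);
    return Result;
}
```

Because the returned size does not depend on the buffer, a caller can also convert into a fixed buffer first and compare the result with the buffer size to detect truncation.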
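static uintptr_t translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept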
Convert a UTF8 stream into a UTF32 uint32_t array.
Take a byte array that uses UTF8 encoding and convert it to a UTF32 encoded uint32_t "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. The output pointer and size can be nullptr and zero to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.
pOutput | Pointer to a buffer to receive the UTF32 string. nullptr is okay if uOutputSize is zero, otherwise a page fault will occur. |
uOutputSize | Size of the output buffer in bytes. |
pInput | UTF8 encoded byte array, nullptr is okay if uInputSize is zero. |
uInputSize | Size of the input byte array. |
static const uint32_t kBigEndianMark = 0x0000FEFFU
32 bit Byte Order Mark (BOM) for Big Endian UTF32 data.
static const uint32_t kEndianMark = 0x0000FEFFU
Byte stream token for native endian.
When writing a text file using UTF32, you may need to write this value as the first character to mark the endianness the data was saved in. This value is correct for the native endianness of the machine. Use kBigEndianMark or kLittleEndianMark to test incoming data whose endianness is unknown.
static const uint32_t kLittleEndianMark = 0xFFFE0000U
32 bit Byte Order Mark (BOM) for Little Endian UTF32 data.
If a token that was read in matches this constant, all of the data that follows is assumed to be little endian. It adheres to the Unicode standard for UTF-32 encoding.
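As an illustration of how a byte order mark identifies the endianness of incoming data, the following sketch classifies the first four bytes of a stream. It deliberately compares raw bytes rather than the constants above, so the result does not depend on the endianness of the host. `SketchEndian` and `sketch_detect_BOM` are hypothetical names, not part of Burgerlib.

```cpp
#include <cassert>
#include <cstdint>

enum class SketchEndian { kUnknown, kBig, kLittle };

// Hypothetical helper. Classifies the first four bytes of a UTF32 byte
// stream by looking for U+FEFF in either byte order.
SketchEndian sketch_detect_BOM(const uint8_t* pInput,
    uintptr_t uInputSize) noexcept
{
    if (uInputSize >= 4U) {
        // 00 00 FE FF is U+FEFF stored most significant byte first
        if ((pInput[0] == 0x00U) && (pInput[1] == 0x00U) &&
            (pInput[2] == 0xFEU) && (pInput[3] == 0xFFU)) {
            return SketchEndian::kBig;
        }
        // FF FE 00 00 is U+FEFF stored least significant byte first
        if ((pInput[0] == 0xFFU) && (pInput[1] == 0xFEU) &&
            (pInput[2] == 0x00U) && (pInput[3] == 0x00U)) {
            return SketchEndian::kLittle;
        }
    }
    // No mark found, the caller has to fall back to a default
    return SketchEndian::kUnknown;
}
```

Once the byte order is known, the remaining data can be read as is when it matches the host, or byte swapped one element at a time when it does not.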