Burger::UTF32 Class Reference

Conversion routines to the UTF32 format. More...


Static Public Member Functions

static uint_t is_valid (uint32_t uInput) noexcept
 Validate a UTF32 value.
 
static uint_t is_valid (const uint32_t *pInput) noexcept
 Check a UTF32 "C" string for validity.
 
static uint_t is_valid (const uint32_t *pInput, uintptr_t uElementCount) noexcept
 Check a UTF32 uint32_t array for validity.
 
static uint32_t translate_from_UTF8 (const char *pInput) noexcept
 Return a UTF32 code from a UTF8 stream.
 
static uint32_t translate_from_UTF8 (const char **ppInput) noexcept
 Return a UTF32 code from a UTF8 stream and update the pointer.
 
static uintptr_t translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept
 Convert a UTF8 "C" string into a UTF32 stream.
 
static uintptr_t translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept
 Convert a UTF8 stream into a UTF32 uint32_t array.
 

Static Public Attributes

static const uint32_t kEndianMark = 0x0000FEFFU
 Byte stream token for native endian.
 
static const uint32_t kBigEndianMark = 0x0000FEFFU
 32 bit Byte Order Mark (BOM) for Big Endian UTF32 data.
 
static const uint32_t kLittleEndianMark = 0xFFFE0000U
 32 bit Byte Order Mark (BOM) for Little Endian UTF32 data.
 
- Static Public Attributes inherited from Burger::CodePage
static const uint32_t kInvalid = UINT32_MAX
 Value returned if a routine failed.
 

Detailed Description

Conversion routines to the UTF32 format.


UTF32 is the simplest data format in which Unicode data can be stored. It is a 32 bit wide "C" string that can contain every character of the world's languages. These functions convert from UTF8, which Burgerlib is based on, to UTF32, which some foreign APIs require for internationalization and which some functions use to simplify their internal code.

Note
These functions operate on strings that are native endian.

Member Function Documentation

◆ is_valid() [1/3]

uint_t BURGER_API Burger::UTF32::is_valid ( const uint32_t * pInput)
static noexcept

Check a UTF32 "C" string for validity.


Check whether a "C" string is a valid UTF32 stream. Return FALSE if there was an error, or TRUE if the values represent a valid UTF32 pattern. Parsing ends once a zero character is found.

Parameters
pInput: Pointer to a zero terminated string; nullptr will page fault.
Returns
TRUE if the entire string is a valid UTF32 stream, FALSE if not.
See also
is_valid(uint32_t) or is_valid(const uint32_t *, uintptr_t)

◆ is_valid() [2/3]

uint_t BURGER_API Burger::UTF32::is_valid ( const uint32_t * pInput,
uintptr_t uElementCount )
static noexcept

Check a UTF32 uint32_t array for validity.


Check a uint32_t array and see if it's a valid UTF32 stream. Return FALSE if there was an error, or TRUE if the bytes represent a valid UTF32 pattern. Zeros are considered valid in the stream.

Parameters
pInput: Pointer to UTF32 data. Can be nullptr if uElementCount is zero, otherwise it will page fault.
uElementCount: Size of the data in elements. If zero, the function will return TRUE and perform no work.
Returns
TRUE if the entire array is a valid UTF32 stream, FALSE if not.
See also
is_valid(uint32_t) or is_valid(const uint32_t *)
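
The behavior described above can be sketched as a loop over a counted array. This is a minimal illustration, not Burgerlib's source: the name `is_valid_utf32_array_sketch` is hypothetical, and the range test is taken from the bounds documented for is_valid(uint32_t).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper that mirrors the documented contract; not part of Burgerlib.
inline bool is_valid_utf32_array_sketch(
    const std::uint32_t* pInput, std::size_t uElementCount) noexcept
{
    // A count of zero performs no work, so pInput is never dereferenced.
    for (std::size_t i = 0; i < uElementCount; ++i) {
        const std::uint32_t uCode = pInput[i];
        // Zero falls inside the first range, so embedded zeros are accepted.
        const bool bInRange =
            (uCode < 0xD800U) || (uCode >= 0xE000U && uCode <= 0x10FFFFU);
        if (!bInRange) {
            return false;
        }
    }
    return true;
}
```

Because the length is passed explicitly, the loop never treats a zero as a terminator, which is the difference from the "C" string overload.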

◆ is_valid() [3/3]

uint_t BURGER_API Burger::UTF32::is_valid ( uint32_t uInput)
static noexcept

Validate a UTF32 value.


Return TRUE if a UTF32 character is within the valid bounds, which are 0 to 0xD7FF or 0xE000 to 0x10FFFF.

Parameters
uInput: UTF32 encoded character value.
Returns
TRUE if in bounds, FALSE if not.
See also
is_valid(const uint32_t *) or is_valid(const uint32_t *, uintptr_t)
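
The bounds above exclude the UTF-16 surrogate range (0xD800 to 0xDFFF) and everything past the last Unicode code point. A minimal sketch of such a test follows; `is_valid_utf32_sketch` is a hypothetical name and this is not Burgerlib's implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for the documented range check; not Burgerlib's source.
constexpr bool is_valid_utf32_sketch(std::uint32_t uInput) noexcept
{
    // Valid: 0-0xD7FF or 0xE000-0x10FFFF.
    return (uInput < 0xD800U) ||
        (uInput >= 0xE000U && uInput <= 0x10FFFFU);
}
```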

◆ translate_from_UTF8() [1/4]

uint32_t BURGER_API Burger::UTF32::translate_from_UTF8 ( const char ** ppInput)
static noexcept

Return a UTF32 code from a UTF8 stream and update the pointer.


Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that's invalid.

Parameters
ppInput: Pointer to a valid UTF8 "C" string pointer; nullptr will page fault.
Returns
The UTF32 code or kInvalid if invalid. 0x00 is not invalid.
See also
translate_from_UTF8(const char *), UTF8::GetTokenSize(const char *) or UTF8::NextToken(const char *).
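
To illustrate what decoding one code point with validation involves, here is a self-contained sketch. It is not Burgerlib's implementation: `decode_utf8_sketch` and `kInvalidSketch` are hypothetical names, and leaving the caller's pointer unchanged on failure is an assumption of this sketch, since the text above does not say where the pointer ends up for invalid data.

```cpp
#include <cstdint>

// Hypothetical failure value, mirroring the documented kInvalid (UINT32_MAX).
constexpr std::uint32_t kInvalidSketch = UINT32_MAX;

// Decode one UTF8 sequence and advance *ppInput past it on success.
// Assumption of this sketch: *ppInput is left untouched on failure.
inline std::uint32_t decode_utf8_sketch(const char** ppInput) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(*ppInput);
    const std::uint32_t uFirst = p[0];
    std::uint32_t uCode = 0;
    unsigned uExtra = 0;     // Number of continuation bytes expected
    std::uint32_t uMin = 0;  // Smallest value allowed for this length

    if (uFirst < 0x80U) {
        uCode = uFirst;
    } else if ((uFirst & 0xE0U) == 0xC0U) {
        uCode = uFirst & 0x1FU; uExtra = 1; uMin = 0x80U;
    } else if ((uFirst & 0xF0U) == 0xE0U) {
        uCode = uFirst & 0x0FU; uExtra = 2; uMin = 0x800U;
    } else if ((uFirst & 0xF8U) == 0xF0U) {
        uCode = uFirst & 0x07U; uExtra = 3; uMin = 0x10000U;
    } else {
        return kInvalidSketch;  // Stray continuation byte or illegal lead byte
    }

    for (unsigned i = 1; i <= uExtra; ++i) {
        const std::uint32_t uNext = p[i];
        // A terminating zero fails this test, so the read never runs past it.
        if ((uNext & 0xC0U) != 0x80U) {
            return kInvalidSketch;
        }
        uCode = (uCode << 6U) | (uNext & 0x3FU);
    }

    // Reject overlong encodings, surrogates and values beyond 0x10FFFF.
    if (uCode < uMin || uCode > 0x10FFFFU ||
        (uCode >= 0xD800U && uCode <= 0xDFFFU)) {
        return kInvalidSketch;
    }
    *ppInput += uExtra + 1U;
    return uCode;
}
```

A caller would typically loop until the result is zero or the invalid marker, since 0x00 is a legitimate return value that marks the end of a "C" string.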

◆ translate_from_UTF8() [2/4]

uint32_t BURGER_API Burger::UTF32::translate_from_UTF8 ( const char * pInput)
static noexcept

Return a UTF32 code from a UTF8 stream.


Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that is invalid.

Note
This function will not move the pointer forward; use UTF8::NextToken(const char *) to advance it.
Parameters
pInput: Pointer to a valid UTF8 "C" string; nullptr will page fault.
Returns
The UTF32 code or kInvalid if invalid. 0x00 is not invalid.
See also
translate_from_UTF8(const char **), UTF8::GetTokenSize(const char *) or UTF8::NextToken(const char *).

◆ translate_from_UTF8() [3/4]

uintptr_t BURGER_API Burger::UTF32::translate_from_UTF8 ( uint32_t * pOutput,
uintptr_t uOutputSize,
const char * pInput )
static noexcept

Convert a UTF8 "C" string into a UTF32 stream.


Take a "C" string that is using UTF8 encoding and convert it to a UTF32 encoded "C" string. The function will return the size of the string after encoding. This size is valid, even if it exceeded the output buffer size. The output pointer can be nullptr and the output size zero to have this routine calculate the size of the possible output so the application can allocate a buffer large enough to hold it.

Note
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
If invalid UTF8 data is found, it will be skipped.
Parameters
pOutput: Pointer to a UTF32 buffer to receive the converted string; nullptr is okay if uOutputSize is zero, otherwise it will page fault.
uOutputSize: Size of the output buffer in bytes.
pInput: UTF8 encoded "C" string; nullptr will page fault.
Returns
The number of bytes of the potential output without the trailing uint32_t zero. It is valid, even if the output buffer wasn't large enough to contain everything.
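
The size-query contract described above lends itself to a two-pass pattern: call once with no buffer to learn the size, allocate, then call again to convert. The sketch below shows that pattern against a stand-in converter written for this example. Both `utf8_to_utf32_sketch` and `to_utf32_sketch` are hypothetical names and this is not Burgerlib's code; it only follows the documented rules (sizes in bytes, the return value excludes the terminator, output is always zero terminated, invalid UTF8 is skipped).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that follows the documented contract.
inline std::size_t utf8_to_utf32_sketch(
    std::uint32_t* pOutput, std::size_t uOutputSize, const char* pInput) noexcept
{
    const std::size_t uCapacity = uOutputSize / sizeof(std::uint32_t);
    std::size_t uStored = 0;  // Elements actually written
    std::size_t uCount = 0;   // Elements the full output would need
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    while (*p) {
        std::uint32_t uCode = *p++;
        unsigned uExtra = 0;
        std::uint32_t uMin = 0;
        if (uCode >= 0x80U) {
            if ((uCode & 0xE0U) == 0xC0U) {
                uCode &= 0x1FU; uExtra = 1; uMin = 0x80U;
            } else if ((uCode & 0xF0U) == 0xE0U) {
                uCode &= 0x0FU; uExtra = 2; uMin = 0x800U;
            } else if ((uCode & 0xF8U) == 0xF0U) {
                uCode &= 0x07U; uExtra = 3; uMin = 0x10000U;
            } else {
                continue;  // Illegal lead byte, skip it
            }
        }
        bool bOk = true;
        for (; uExtra; --uExtra) {
            if ((*p & 0xC0U) != 0x80U) { bOk = false; break; }
            uCode = (uCode << 6U) | (*p++ & 0x3FU);
        }
        if (!bOk || uCode < uMin || uCode > 0x10FFFFU ||
            (uCode >= 0xD800U && uCode <= 0xDFFFU)) {
            continue;  // Invalid sequence, skip it
        }
        // Always keep one element free for the terminating zero.
        if (uStored + 1U < uCapacity) {
            pOutput[uStored++] = uCode;
        }
        ++uCount;
    }
    if (uCapacity) {
        pOutput[uStored] = 0;
    }
    return uCount * sizeof(std::uint32_t);
}

// Two-pass usage: measure, allocate, convert.
inline std::vector<std::uint32_t> to_utf32_sketch(const char* pInput)
{
    const std::size_t uBytes = utf8_to_utf32_sketch(nullptr, 0, pInput);
    // One extra element for the terminator, which the byte count excludes.
    std::vector<std::uint32_t> buffer(uBytes / sizeof(std::uint32_t) + 1U);
    utf8_to_utf32_sketch(
        buffer.data(), buffer.size() * sizeof(std::uint32_t), pInput);
    return buffer;
}
```

Note that the returned byte count must be divided by `sizeof(uint32_t)` to obtain an element count, and that the caller adds room for the terminator.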

◆ translate_from_UTF8() [4/4]

uintptr_t BURGER_API Burger::UTF32::translate_from_UTF8 ( uint32_t * pOutput,
uintptr_t uOutputSize,
const char * pInput,
uintptr_t uInputSize )
static noexcept

Convert a UTF8 stream into a UTF32 uint32_t array.


Take a byte array that is using UTF8 encoding and convert it to a UTF32 encoded uint32_t array. The function will return the size of the output after encoding. This size is valid, even if it exceeded the output buffer size. The output pointer can be nullptr and the output size zero to have this routine calculate the size of the possible output so the application can allocate a buffer large enough to hold it.

Note
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
Zeros can be encoded into the stream. This function will not early out if a zero was parsed. Zeros will be placed in the UTF32 stream as is.
Parameters
pOutput: Pointer to a buffer to receive the UTF32 string; nullptr is okay if uOutputSize is zero, otherwise a page fault will occur.
uOutputSize: Size of the output buffer in bytes.
pInput: UTF8 encoded byte array; nullptr is okay if uInputSize is zero.
uInputSize: Size of the input byte array.
Returns
Byte count of the potential output. It is valid, even if the output buffer wasn't large enough to contain everything.

Member Data Documentation

◆ kBigEndianMark

const uint32_t Burger::UTF32::kBigEndianMark = 0x0000FEFFU
static

32 bit Byte Order Mark (BOM) for Big Endian UTF32 data.


If a token was read in that matched this constant, then it is assumed that all of the following data is Big Endian. It adheres to the Unicode standard for UTF-32 encoding.

◆ kEndianMark

const uint32_t Burger::UTF32::kEndianMark = 0x0000FEFFU
static

Byte stream token for native endian.


When writing a text file using UTF32, you may need to write this value as the first character to mark the endianness in which the data was saved. This value is the correct value for the native endian of the machine. Use kBigEndianMark or kLittleEndianMark to test incoming data whose endianness is unknown.

◆ kLittleEndianMark

const uint32_t Burger::UTF32::kLittleEndianMark = 0xFFFE0000U
static

32 bit Byte Order Mark (BOM) for Little Endian UTF32 data.


If a token was read in that matched this constant, then it is assumed that all of the following data is Little Endian. It adheres to the Unicode standard for UTF-32 encoding.
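
The constants in this section are compared against a 32 bit value loaded in the machine's native byte order. As a host-independent illustration of the same idea, the sketch below inspects the first four raw bytes instead: the Unicode byte order mark U+FEFF appears as 00 00 FE FF in big endian UTF-32 and as FF FE 00 00 in little endian UTF-32. All names here are hypothetical and none of this is part of Burgerlib.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
enum class Utf32OrderSketch { kUnknown, kBigEndian, kLittleEndian };

// Inspect the first four bytes of a buffer for a UTF-32 byte order mark.
inline Utf32OrderSketch detect_utf32_bom_sketch(
    const unsigned char* pData, std::size_t uSize) noexcept
{
    if (uSize >= 4U) {
        if (pData[0] == 0x00U && pData[1] == 0x00U &&
            pData[2] == 0xFEU && pData[3] == 0xFFU) {
            return Utf32OrderSketch::kBigEndian;
        }
        if (pData[0] == 0xFFU && pData[1] == 0xFEU &&
            pData[2] == 0x00U && pData[3] == 0x00U) {
            return Utf32OrderSketch::kLittleEndian;
        }
    }
    return Utf32OrderSketch::kUnknown;
}

// Assemble one UTF32 value from four bytes in the detected order.
// Any order other than kBigEndian is read as little endian here.
inline std::uint32_t load_utf32_sketch(
    const unsigned char* pData, Utf32OrderSketch eOrder) noexcept
{
    const std::uint32_t b0 = pData[0];
    const std::uint32_t b1 = pData[1];
    const std::uint32_t b2 = pData[2];
    const std::uint32_t b3 = pData[3];
    return (eOrder == Utf32OrderSketch::kBigEndian)
        ? ((b0 << 24U) | (b1 << 16U) | (b2 << 8U) | b3)
        : ((b3 << 24U) | (b2 << 16U) | (b1 << 8U) | b0);
}
```

Working from bytes avoids any dependence on the endianness of the machine running the check, at the cost of assembling each value by hand.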