Conversion routines to the UTF32 format.


Static Public Member Functions
static uint_t	IsValid (uint32_t uInput) noexcept
	Validate a UTF32 value.

static uint_t	IsValid (const uint32_t *pInput) noexcept
	Check a UTF32 "C" string for validity.

static uint_t	IsValid (const uint32_t *pInput, uintptr_t uElementCount) noexcept
	Check a UTF32 uint32_t array for validity.

static uint32_t	translate_from_UTF8 (const char *pInput) noexcept
	Return a UTF32 code from a UTF8 stream.

static uint32_t	translate_from_UTF8 (const char **ppInput) noexcept
	Return a UTF32 code from a UTF8 stream and update the pointer.

static uintptr_t	translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept
	Convert a UTF8 "C" string into a UTF32 stream.

static uintptr_t	translate_from_UTF8 (uint32_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept
	Convert a UTF8 stream into a UTF32 uint32_t array.

Static Public Attributes
static const uint32_t	kEndianMark = 0x0000FEFFU
	Byte stream token for native endian.

static const uint32_t	kBigEndianMark = 0x0000FEFFU
	32 bit Byte Order Mark (BOM) for Big Endian UTF32 data.

static const uint32_t	kLittleEndianMark = 0xFFFE0000U
	32 bit Byte Order Mark (BOM) for Little Endian UTF32 data.

Static Public Attributes inherited from Burger::CodePage
static const uint32_t	kInvalid = UINT32_MAX
	Value returned if a routine failed.

Detailed Description

Conversion routines to the UTF32 format.

UTF32 is the simplest format for storing Unicode data: every character is stored as a single 32 bit value, so a UTF32 "C" string is an array of uint32_t values ending with a zero. It can easily represent every character used in the world's languages. These functions convert from UTF8, the encoding Burgerlib is based on, to UTF32, which some foreign APIs require for internationalization and which simplifies internal code that needs one value per code point.

Note: These functions operate on strings that are native endian.

Member Function Documentation

◆ IsValid() [1/3]

uint_t BURGER_API Burger::UTF32::IsValid ( const uint32_t * pInput )

static noexcept

Check a UTF32 "C" string for validity.

Check whether a "C" string is a valid UTF32 stream. Returns FALSE if an invalid value is found, or TRUE if every value is a valid UTF32 code point. Parsing ends when a zero value is found.

Parameters

pInput Pointer to a zero terminated string, nullptr will page fault.

Returns: TRUE if the entire string is a valid UTF32 stream, FALSE if not.

See also: IsValid(uint32_t) or IsValid(const uint32_t *, uintptr_t)
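A minimal sketch of the "C" string check, assuming it simply applies the per-value range test of IsValid(uint32_t) until it reaches the terminating zero. The function names are hypothetical and this is not Burgerlib's source.

```cpp
#include <cassert>
#include <cstdint>

// Valid UTF32 values: 0x0000-0xD7FF and 0xE000-0x10FFFF.
bool is_valid_code(uint32_t uInput) noexcept
{
    return (uInput < 0xD800U) || ((uInput >= 0xE000U) && (uInput <= 0x10FFFFU));
}

// Walk a zero terminated UTF32 string and fail on the first bad value.
// The terminating zero ends the scan, so data after it is never examined.
bool is_valid_utf32_string(const uint32_t* pInput) noexcept
{
    uint32_t uValue;
    while ((uValue = *pInput++) != 0U) {
        if (!is_valid_code(uValue)) {
            return false;
        }
    }
    return true;
}
```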

◆ IsValid() [2/3]

uint_t BURGER_API Burger::UTF32::IsValid ( const uint32_t * pInput, uintptr_t uElementCount )

static noexcept

Check a UTF32 uint32_t array for validity.

Check whether a uint32_t array is a valid UTF32 stream. Returns FALSE if an invalid value is found, or TRUE if every value is a valid UTF32 code point. Zeros are considered valid and do not end the scan.

Parameters

pInput	Pointer to UTF32 data. Can be `nullptr` if uElementCount is zero, otherwise it will page fault.
uElementCount	Number of uint32_t elements to check. If zero, the function returns TRUE and performs no work.

Returns: TRUE if the entire array is a valid UTF32 stream, FALSE if not.

See also: IsValid(uint32_t) or IsValid(const uint32_t *)
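The counted version differs from the "C" string check only in how the loop ends: it stops after uElementCount values instead of at a zero. A sketch under that assumption, with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Valid UTF32 values: 0x0000-0xD7FF and 0xE000-0x10FFFF.
bool is_valid_code(uint32_t uInput) noexcept
{
    return (uInput < 0xD800U) || ((uInput >= 0xE000U) && (uInput <= 0x10FFFFU));
}

// Every element is tested, so a zero is just another valid value.
// With a count of zero the pointer is never read, so nullptr is safe.
bool is_valid_utf32_array(const uint32_t* pInput, uintptr_t uElementCount) noexcept
{
    for (uintptr_t i = 0U; i < uElementCount; ++i) {
        if (!is_valid_code(pInput[i])) {
            return false;
        }
    }
    return true;
}
```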

◆ IsValid() [3/3]

uint_t BURGER_API Burger::UTF32::IsValid ( uint32_t uInput )

static noexcept

Validate a UTF32 value.

Returns TRUE if a UTF32 value is within the valid ranges 0x0000-0xD7FF or 0xE000-0x10FFFF. Values in the UTF16 surrogate range (0xD800-0xDFFF) and values above 0x10FFFF are invalid.

Parameters

uInput UTF32 encoded character value.

Returns: TRUE if in bounds, FALSE if not.

See also: IsValid(const uint32_t *) or IsValid(const uint32_t *, uintptr_t)
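The range test above reduces to a single expression. This sketch uses a hypothetical constexpr function (not the library's code) so the boundary values can be checked at compile time:

```cpp
#include <cstdint>

// True for 0x0000-0xD7FF and 0xE000-0x10FFFF; surrogates and values
// above 0x10FFFF are rejected.
constexpr bool is_valid_utf32(uint32_t uInput) noexcept
{
    return (uInput <= 0xD7FFU) || ((uInput >= 0xE000U) && (uInput <= 0x10FFFFU));
}

// Boundary checks, evaluated by the compiler.
static_assert(is_valid_utf32(0xD7FFU), "last value before the surrogates");
static_assert(!is_valid_utf32(0xD800U), "first high surrogate");
static_assert(!is_valid_utf32(0xDFFFU), "last low surrogate");
static_assert(is_valid_utf32(0xE000U), "first value after the surrogates");
static_assert(!is_valid_utf32(0x110000U), "beyond the Unicode range");
```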

◆ translate_from_UTF8() [1/4]

uint32_t BURGER_API Burger::UTF32::translate_from_UTF8 ( const char ** ppInput )

static noexcept

Return a UTF32 code from a UTF8 stream and update the pointer.

Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that's invalid.

Parameters

ppInput Pointer to a valid UTF8 "C" string pointer, nullptr will page fault.

Returns: The UTF32 code or kInvalid if invalid. A return value of 0x00 is valid, not an error.

See also: translate_from_UTF8(const char *), UTF8::GetTokenSize(const char *) or UTF8::NextToken(const char *).
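A sketch of decoding one UTF8 sequence and advancing the pointer, following the standard UTF8 rules of RFC 3629 (overlong forms, surrogates and values above 0x10FFFF are rejected). The name decode_utf8 and the constant kInvalidCode are hypothetical. The documentation above does not say where the pointer is left after invalid input, so this sketch assumes it is left unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Burger::CodePage::kInvalid, which is documented as UINT32_MAX.
const uint32_t kInvalidCode = UINT32_MAX;

// Decode one UTF8 sequence at *ppInput and advance the pointer past it.
// Assumption: on invalid input the pointer is left where it was.
uint32_t decode_utf8(const char** ppInput) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(*ppInput);
    const uint32_t uFirst = p[0];
    uint32_t uResult;
    uint32_t uMinimum;
    unsigned uFollow;

    if (uFirst < 0x80U) {                   // 0xxxxxxx, including the zero
        *ppInput += 1;
        return uFirst;
    }
    if ((uFirst & 0xE0U) == 0xC0U) {        // 110xxxxx: 1 continuation byte
        uResult = uFirst & 0x1FU;
        uFollow = 1U;
        uMinimum = 0x80U;
    } else if ((uFirst & 0xF0U) == 0xE0U) { // 1110xxxx: 2 continuation bytes
        uResult = uFirst & 0x0FU;
        uFollow = 2U;
        uMinimum = 0x800U;
    } else if ((uFirst & 0xF8U) == 0xF0U) { // 11110xxx: 3 continuation bytes
        uResult = uFirst & 0x07U;
        uFollow = 3U;
        uMinimum = 0x10000U;
    } else {                                // stray 10xxxxxx or 0xF8-0xFF
        return kInvalidCode;
    }
    for (unsigned i = 1U; i <= uFollow; ++i) {
        const uint32_t uNext = p[i];
        if ((uNext & 0xC0U) != 0x80U) {     // a zero terminator also fails here
            return kInvalidCode;
        }
        uResult = (uResult << 6U) | (uNext & 0x3FU);
    }
    // Reject overlong encodings, values past 0x10FFFF and surrogates.
    if ((uResult < uMinimum) || (uResult > 0x10FFFFU) ||
        ((uResult >= 0xD800U) && (uResult <= 0xDFFFU))) {
        return kInvalidCode;
    }
    *ppInput += uFollow + 1U;
    return uResult;
}
```

Because a continuation byte is checked before it is consumed, a sequence cut short by the terminating zero is reported as invalid without reading past the end of the string.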

◆ translate_from_UTF8() [2/4]

uint32_t BURGER_API Burger::UTF32::translate_from_UTF8 ( const char * pInput )

static noexcept

Return a UTF32 code from a UTF8 stream.

Convert from a UTF8 stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function will perform validation on the incoming stream and will flag any data that is invalid.

Note: This function does not advance the pointer. Use UTF8::NextToken(const char *) to move to the next character.

Parameters

pInput Pointer to a valid UTF8 "C" string, nullptr will page fault.

Returns: The UTF32 code or kInvalid if invalid. A return value of 0x00 is valid, not an error.

See also: translate_from_UTF8(const char **), UTF8::GetTokenSize(const char *) or UTF8::NextToken(const char *).
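Because this overload does not advance, walking a string needs a separate step. The sketch below derives the step from the UTF8 lead byte. utf8_lead_length and utf8_step are hypothetical helpers that play roughly the role described for UTF8::GetTokenSize and UTF8::NextToken; they are not those functions.

```cpp
#include <cassert>
#include <cstdint>

// Length of a UTF8 sequence as announced by its lead byte, or 0 for a byte
// that cannot start a sequence (a continuation byte or 0xF8-0xFF).
unsigned utf8_lead_length(const char* pInput) noexcept
{
    const unsigned uLead = static_cast<unsigned char>(*pInput);
    if (uLead < 0x80U) return 1U;             // 0xxxxxxx
    if ((uLead & 0xE0U) == 0xC0U) return 2U;  // 110xxxxx
    if ((uLead & 0xF0U) == 0xE0U) return 3U;  // 1110xxxx
    if ((uLead & 0xF8U) == 0xF0U) return 4U;  // 11110xxx
    return 0U;
}

// Return the start of the next sequence. A bad lead byte is skipped by one,
// and a truncated sequence stops before the first non-continuation byte,
// so the terminating zero is never stepped over from inside a sequence.
const char* utf8_step(const char* pInput) noexcept
{
    const unsigned uLength = utf8_lead_length(pInput);
    if (uLength == 0U) {
        return pInput + 1;
    }
    for (unsigned i = 1U; i < uLength; ++i) {
        if ((static_cast<unsigned char>(pInput[i]) & 0xC0U) != 0x80U) {
            return pInput + i;
        }
    }
    return pInput + uLength;
}
```

A loop would peek the value with the non-advancing decoder, stop on zero, and otherwise move on with the step helper.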

◆ translate_from_UTF8() [3/4]

uintptr_t BURGER_API Burger::UTF32::translate_from_UTF8 ( uint32_t * pOutput, uintptr_t uOutputSize, const char * pInput )

static noexcept

Convert a UTF8 "C" string into a UTF32 stream.

Take a "C" string that uses UTF8 encoding and convert it to a UTF32 encoded "C" string. The function returns the size of the string after encoding, and this size is valid even if it exceeds the output buffer size. Pass nullptr for the output pointer and zero for the size to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.

Note: This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun. If invalid UTF8 data is found, it will be skipped.

Parameters

pOutput	Pointer to UTF32 buffer to receive the converted string, `nullptr` is okay if uOutputSize is zero, otherwise it will page fault.
uOutputSize	Size of the output buffer in bytes.
pInput	UTF8 encoded "C" string, `nullptr` will page fault.

Returns: The number of bytes of the potential output without the trailing uint32_t zero. It is valid, even if the output buffer wasn't large enough to contain everything.
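A sketch of the contract described above, assuming UTF8 decoding per RFC 3629: the output size is in bytes, the return value counts the full conversion without the terminating zero, the output is zero terminated whenever it can hold at least one uint32_t, and invalid bytes are skipped one at a time. All names are hypothetical and this is not Burgerlib's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode one sequence at p and report its length. An invalid sequence
// returns UINT32_MAX with uLength = 1 so the caller skips a single byte.
static uint32_t decode_one(const unsigned char* p, unsigned& uLength) noexcept
{
    const uint32_t uFirst = p[0];
    uint32_t uResult;
    uint32_t uMinimum;
    uLength = 1U;
    if (uFirst < 0x80U) {
        return uFirst;
    }
    if ((uFirst & 0xE0U) == 0xC0U) {
        uResult = uFirst & 0x1FU; uMinimum = 0x80U; uLength = 2U;
    } else if ((uFirst & 0xF0U) == 0xE0U) {
        uResult = uFirst & 0x0FU; uMinimum = 0x800U; uLength = 3U;
    } else if ((uFirst & 0xF8U) == 0xF0U) {
        uResult = uFirst & 0x07U; uMinimum = 0x10000U; uLength = 4U;
    } else {
        return UINT32_MAX;
    }
    for (unsigned i = 1U; i < uLength; ++i) {
        if ((p[i] & 0xC0U) != 0x80U) {      // also catches the terminating zero
            uLength = 1U;
            return UINT32_MAX;
        }
        uResult = (uResult << 6U) | (p[i] & 0x3FU);
    }
    if ((uResult < uMinimum) || (uResult > 0x10FFFFU) ||
        ((uResult >= 0xD800U) && (uResult <= 0xDFFFU))) {
        uLength = 1U;
        return UINT32_MAX;
    }
    return uResult;
}

uintptr_t utf8_to_utf32(uint32_t* pOutput, uintptr_t uOutputSize, const char* pInput) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    // Elements that can hold characters, keeping one slot for the zero.
    uintptr_t uRoom = uOutputSize / sizeof(uint32_t);
    uRoom = uRoom ? uRoom - 1U : 0U;
    uintptr_t uCount = 0U;                  // elements of the full conversion
    while (*p) {
        unsigned uLength;
        const uint32_t uCode = decode_one(p, uLength);
        p += uLength;
        if (uCode == UINT32_MAX) {
            continue;                       // skip invalid data
        }
        if (uCount < uRoom) {
            pOutput[uCount] = uCode;
        }
        ++uCount;
    }
    if (uOutputSize >= sizeof(uint32_t)) {
        pOutput[(uCount < uRoom) ? uCount : uRoom] = 0U;
    }
    return uCount * sizeof(uint32_t);
}

// Two-pass use: size with nullptr first, then allocate and convert.
std::vector<uint32_t> utf8_to_utf32_vector(const char* pInput)
{
    const uintptr_t uBytes = utf8_to_utf32(nullptr, 0U, pInput);
    std::vector<uint32_t> buffer(uBytes / sizeof(uint32_t) + 1U);
    utf8_to_utf32(buffer.data(), buffer.size() * sizeof(uint32_t), pInput);
    return buffer;
}
```

Because the returned count excludes the terminator, the second pass allocates one extra element.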

◆ translate_from_UTF8() [4/4]

uintptr_t BURGER_API Burger::UTF32::translate_from_UTF8 ( uint32_t * pOutput, uintptr_t uOutputSize, const char * pInput, uintptr_t uInputSize )

static noexcept

Convert a UTF8 stream into a UTF32 uint32_t array.

Take a byte array that uses UTF8 encoding and convert it to a UTF32 encoded "C" string of uint32_t values. The function returns the size of the string after encoding, and this size is valid even if it exceeds the output buffer size. Pass nullptr for the output pointer and zero for the size to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.

Note: This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun. Zeros can be encoded into the stream: this function does not stop early when a zero is parsed, and zeros are placed in the UTF32 stream as is.

Parameters

pOutput	Pointer to a buffer to receive the UTF32 string. `nullptr` is okay if uOutputSize is zero, otherwise a page fault will occur.
uOutputSize	Size of the output buffer in bytes.
pInput	UTF8 encoded byte array, `nullptr` is okay if uInputSize is zero.
uInputSize	Size of the input byte array.

Returns: Byte count of the potential output. It is valid, even if the output buffer wasn't large enough to contain everything.
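The counted variant can follow the same pattern with the loop driven by uInputSize instead of a terminating zero, so embedded zeros are copied through. This is a sketch with hypothetical names. Neither behavior is stated for this overload, so it assumes, as in the "C" string version, that invalid bytes are skipped and that the returned byte count excludes the terminating zero.

```cpp
#include <cassert>
#include <cstdint>

// Decode one sequence without reading past uRemaining bytes. An invalid or
// truncated sequence returns UINT32_MAX with uLength = 1.
static uint32_t decode_bounded(const unsigned char* p, uintptr_t uRemaining,
    unsigned& uLength) noexcept
{
    const uint32_t uFirst = p[0];
    uint32_t uResult;
    uint32_t uMinimum;
    uLength = 1U;
    if (uFirst < 0x80U) {                   // includes embedded zeros
        return uFirst;
    }
    if ((uFirst & 0xE0U) == 0xC0U) {
        uResult = uFirst & 0x1FU; uMinimum = 0x80U; uLength = 2U;
    } else if ((uFirst & 0xF0U) == 0xE0U) {
        uResult = uFirst & 0x0FU; uMinimum = 0x800U; uLength = 3U;
    } else if ((uFirst & 0xF8U) == 0xF0U) {
        uResult = uFirst & 0x07U; uMinimum = 0x10000U; uLength = 4U;
    } else {
        return UINT32_MAX;
    }
    for (unsigned i = 1U; i < uLength; ++i) {
        if ((i >= uRemaining) || ((p[i] & 0xC0U) != 0x80U)) {
            uLength = 1U;
            return UINT32_MAX;
        }
        uResult = (uResult << 6U) | (p[i] & 0x3FU);
    }
    if ((uResult < uMinimum) || (uResult > 0x10FFFFU) ||
        ((uResult >= 0xD800U) && (uResult <= 0xDFFFU))) {
        uLength = 1U;
        return UINT32_MAX;
    }
    return uResult;
}

uintptr_t utf8_to_utf32_counted(uint32_t* pOutput, uintptr_t uOutputSize,
    const char* pInput, uintptr_t uInputSize) noexcept
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    uintptr_t uRoom = uOutputSize / sizeof(uint32_t);
    uRoom = uRoom ? uRoom - 1U : 0U;        // keep one slot for the zero
    uintptr_t uCount = 0U;
    while (uInputSize) {                    // zeros do not end the loop
        unsigned uLength;
        const uint32_t uCode = decode_bounded(p, uInputSize, uLength);
        p += uLength;
        uInputSize -= uLength;
        if (uCode == UINT32_MAX) {
            continue;
        }
        if (uCount < uRoom) {
            pOutput[uCount] = uCode;
        }
        ++uCount;
    }
    if (uOutputSize >= sizeof(uint32_t)) {
        pOutput[(uCount < uRoom) ? uCount : uRoom] = 0U;
    }
    return uCount * sizeof(uint32_t);
}
```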

Member Data Documentation

◆ kBigEndianMark

Burger::UTF32::kBigEndianMark = 0x0000FEFFU

static

32 bit Byte Order Mark (BOM) for Big Endian UTF32 data.

If the first token read in matches this constant, all of the following data is assumed to be big endian. It adheres to the Unicode standard for UTF-32 encoding.

◆ kEndianMark

Burger::UTF32::kEndianMark = 0x0000FEFFU

static

Byte stream token for native endian.

When writing a text file using UTF32, you may need to write this value as the first character to mark the byte order the data was saved in. This value is correct for the native endian of the machine. Use kBigEndianMark or kLittleEndianMark to test incoming data whose byte order is unknown.

◆ kLittleEndianMark

Burger::UTF32::kLittleEndianMark = 0xFFFE0000U

static

32 bit Byte Order Mark (BOM) for Little Endian UTF32 data.

If the first token read in matches this constant, all of the following data is assumed to be little endian. It adheres to the Unicode standard for UTF-32 encoding.
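One way to test incoming data is to read the first 32 bit token in native byte order. A value of 0x0000FEFF (the documented kEndianMark) means the data already matches this machine, and the byte-swapped value 0xFFFE0000 means every value must be swapped before use. A sketch with a hypothetical enum and helper functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class Utf32Order { kNative, kSwapped, kUnknown };

// Reverse the four bytes of a 32 bit value.
constexpr uint32_t swap32(uint32_t uInput) noexcept
{
    return (uInput >> 24U) | ((uInput >> 8U) & 0xFF00U) |
        ((uInput << 8U) & 0xFF0000U) | (uInput << 24U);
}

// Compare the first token, read in native order, with the BOM.
Utf32Order detect_utf32_order(const unsigned char* pData, std::size_t uSize) noexcept
{
    if (uSize < sizeof(uint32_t)) {
        return Utf32Order::kUnknown;
    }
    uint32_t uToken;
    std::memcpy(&uToken, pData, sizeof(uToken)); // avoids unaligned access
    if (uToken == 0x0000FEFFU) {
        return Utf32Order::kNative;
    }
    if (uToken == swap32(0x0000FEFFU)) {
        return Utf32Order::kSwapped;
    }
    return Utf32Order::kUnknown;            // no BOM present
}
```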
