Conversion routines to the UTF16 format.

Static Public Member Functions
static uint_t	IsValid (uint16_t uInput) noexcept
	Validate a UTF16 value.

static uint_t	IsValid (const uint16_t *pInput) noexcept
	Check a UTF16 "C" string for validity.

static uint_t	IsValid (const uint16_t *pInput, uintptr_t uElementCount) noexcept
	Check a UTF16 uint16_t array for validity.

static uint16_t	translate_from_UTF8 (const char *pInput) noexcept
	Return a UTF16 code from a UTF8 stream.

static uintptr_t	translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept
	Convert a UTF8 "C" string into a UTF16 stream.

static uintptr_t	translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept
	Convert a UTF8 stream into a UTF16 uint16_t array.

Static Public Attributes
static const uint16_t	kInvalid = UINT16_MAX
	Value returned if a routine failed.

static const uint16_t	kEndianMark = 0xFEFFU
	Byte stream token for native endian.

static const uint16_t	kBigEndianMark = 0xFEFFU
	16 bit Byte Order Mark (BOM) for Big Endian UTF16 data.

static const uint16_t	kLittleEndianMark = 0xFFFEU
	16 bit Byte Order Mark (BOM) for Little Endian UTF16 data.

Detailed Description

Conversion routines to the UTF16 format.

UTF16 is a data format that stores Unicode data in a 16-bit-wide "C" string. It is wide enough to hold all of the most common characters of the world's languages in a single element. These functions convert from UTF8, which Burgerlib is based on, to UTF16, which some foreign APIs require for internationalization. Note that these functions operate on native-endian strings.

Member Function Documentation

◆ IsValid() [1/3]

uint_t BURGER_API Burger::UTF16::IsValid ( const uint16_t * pInput )

static noexcept

Check a UTF16 "C" string for validity.

Check whether a zero-terminated "C" string is a valid UTF16 stream. Returns FALSE if an error was found, or TRUE if the data represents a valid UTF16 pattern.

Parameters

pInput Pointer to a zero-terminated string; `nullptr` will page fault.

Returns: TRUE if the entire string is a valid UTF16 stream, FALSE if not.
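A minimal standalone sketch of what "valid" can mean for a zero-terminated UTF16 string. `IsValidUTF16String` is a hypothetical name and this is not Burgerlib's implementation; it assumes validity means every high surrogate (0xD800-0xDBFF) is immediately followed by a low surrogate (0xDC00-0xDFFF) and that no low surrogate appears on its own.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Burger::UTF16::IsValid(const uint16_t *).
// Assumption: a stream is valid when every high surrogate (0xD800-0xDBFF)
// is immediately followed by a low surrogate (0xDC00-0xDFFF) and no low
// surrogate appears alone.
bool IsValidUTF16String(const uint16_t* pInput)
{
    assert(pInput != nullptr);  // the documented function page faults on nullptr
    for (;;) {
        uint16_t uChar = *pInput++;
        if (uChar == 0) {
            return true;  // reached the terminating zero with no errors
        }
        if (uChar >= 0xD800U && uChar <= 0xDBFFU) {
            // High surrogate: the next element must be a low surrogate.
            // If it is the terminating zero, the check fails here, so the
            // loop never reads past the end of the string.
            uint16_t uNext = *pInput++;
            if (uNext < 0xDC00U || uNext > 0xDFFFU) {
                return false;
            }
        } else if (uChar >= 0xDC00U && uChar <= 0xDFFFU) {
            return false;  // orphaned low surrogate
        }
    }
}
```

For example, the pair 0xD83D 0xDE00 (U+1F600) passes, while 0xD83D followed directly by the terminating zero fails.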

◆ IsValid() [2/3]

uint_t BURGER_API Burger::UTF16::IsValid	(	const uint16_t *	pInput,
		uintptr_t	uElementCount )

static noexcept

Check a UTF16 uint16_t array for validity.

Check whether a uint16_t array is a valid UTF16 stream. Returns FALSE if an error was found, or TRUE if the data represents a valid UTF16 pattern.

Parameters

pInput	Pointer to UTF16 data. Can be `nullptr` if uElementCount is zero; otherwise it will page fault.
uElementCount	Size of the data in elements. If zero, the function returns TRUE.

Returns: TRUE if the entire array is a valid UTF16 stream, FALSE if not.
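A standalone sketch of the element-count variant, assuming a high surrogate (0xD800-0xDBFF) must be immediately followed by a low surrogate (0xDC00-0xDFFF) inside the array and that a lone low surrogate is an error. `IsValidUTF16Array` is a hypothetical name, not Burgerlib's code. Because the length comes from uElementCount, zero elements are treated as ordinary data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Burger::UTF16::IsValid(const uint16_t *, uintptr_t).
// Assumption: surrogates must appear as complete high/low pairs within the
// first uElementCount elements; zeros are ordinary values, not terminators.
bool IsValidUTF16Array(const uint16_t* pInput, uintptr_t uElementCount)
{
    assert(pInput != nullptr || uElementCount == 0);
    uintptr_t i = 0;
    while (i < uElementCount) {
        uint16_t uChar = pInput[i++];
        if (uChar >= 0xD800U && uChar <= 0xDBFFU) {
            // A high surrogate needs a following low surrogate inside the array.
            if (i == uElementCount) {
                return false;
            }
            uint16_t uNext = pInput[i++];
            if (uNext < 0xDC00U || uNext > 0xDFFFU) {
                return false;
            }
        } else if (uChar >= 0xDC00U && uChar <= 0xDFFFU) {
            return false;  // orphaned low surrogate
        }
    }
    return true;  // also covers uElementCount == 0
}
```

Passing a count that cuts a surrogate pair in half makes the check fail, even though the underlying data is well formed.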

◆ IsValid() [3/3]

uint_t BURGER_API Burger::UTF16::IsValid ( uint16_t uInput )

static noexcept

Validate a UTF16 value.

Note: Use of this function is not recommended because it considers surrogate values (0xD800-0xDFFF), the escape values used in pairs to encode characters above 0xFFFF, as invalid. Use IsValid(const uint16_t*) instead.

Returns TRUE if a UTF16 value is within the valid bounds of 0x0000-0xD7FF or 0xE000-0xFFFF.

Parameters

uInput UTF16 encoded character value.

Returns: TRUE if in bounds, FALSE if not.
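The bounds test described above reduces to a single expression. `IsValidUTF16Value` is a hypothetical name for this standalone sketch, not part of Burgerlib.

```cpp
#include <cstdint>

// Hypothetical stand-in for Burger::UTF16::IsValid(uint16_t).
// Accepts 0x0000-0xD7FF and 0xE000-0xFFFF; rejects the surrogate range
// 0xD800-0xDFFF, so either half of a surrogate pair fails on its own.
constexpr bool IsValidUTF16Value(uint16_t uInput)
{
    return (uInput < 0xD800U) || (uInput >= 0xE000U);
}
```

Because both halves of a surrogate pair fall inside 0xD800-0xDFFF, a string containing characters above 0xFFFF fails this test element by element, which is why the string-based IsValid() is the better choice for whole strings.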

◆ translate_from_UTF8() [1/3]

uint16_t BURGER_API Burger::UTF16::translate_from_UTF8 ( const char * pInput )

static noexcept

Return a UTF16 code from a UTF8 stream.

Convert from a UTF8 stream into a 16-bit Unicode value (0x0000 to 0xFFFF). This function validates the incoming stream and flags any invalid data. Unicode values in the range 0xD800-0xDFFF (reserved for surrogates) and values greater than 0xFFFF (which do not fit in a single 16-bit quantity) are not parsed and return an error.

Note: This function does not advance the pointer; use Burger::UTF8::NextToken(const char *) to move to the next character.

Parameters

pInput Pointer to a valid UTF8 "C" string.

Returns: The UTF16 code, or Burger::UTF16::kInvalid if the data is invalid. A return value of 0x00 is valid, not an error.

See also: Burger::UTF8::GetTokenSize(const char *) or Burger::UTF8::NextToken(const char *).
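A standalone sketch of the single-character decoding rules above, assuming standard UTF8 validation in which overlong forms are also rejected. `DecodeUTF8To16` and `kInvalid16` are hypothetical names; `kInvalid16` mirrors the documented value of Burger::UTF16::kInvalid (UINT16_MAX). Like the documented function, it decodes only the first character and does not advance the pointer.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel mirroring the documented value of Burger::UTF16::kInvalid.
constexpr uint16_t kInvalid16 = UINT16_MAX;

// Hypothetical sketch of translate_from_UTF8(const char *): decode the first
// UTF8 sequence at pInput into one 16-bit value. Rejects malformed sequences,
// overlong forms, surrogates (0xD800-0xDFFF), and 4-byte sequences (> 0xFFFF).
uint16_t DecodeUTF8To16(const char* pInput)
{
    assert(pInput != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    unsigned int uFirst = p[0];
    if (uFirst < 0x80U) {
        return static_cast<uint16_t>(uFirst);  // ASCII, including 0x00
    }
    if (uFirst >= 0xC2U && uFirst <= 0xDFU) {
        // Two-byte form: 110xxxxx 10xxxxxx (0xC0/0xC1 leads would be overlong).
        unsigned int uSecond = p[1];
        if ((uSecond & 0xC0U) != 0x80U) {
            return kInvalid16;
        }
        return static_cast<uint16_t>(((uFirst & 0x1FU) << 6) | (uSecond & 0x3FU));
    }
    if ((uFirst & 0xF0U) == 0xE0U) {
        // Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx. Each byte is checked
        // before the next is read, so a terminating zero stops the scan.
        unsigned int uSecond = p[1];
        if ((uSecond & 0xC0U) != 0x80U) {
            return kInvalid16;
        }
        unsigned int uThird = p[2];
        if ((uThird & 0xC0U) != 0x80U) {
            return kInvalid16;
        }
        unsigned int uValue =
            ((uFirst & 0x0FU) << 12) | ((uSecond & 0x3FU) << 6) | (uThird & 0x3FU);
        if (uValue < 0x800U || (uValue >= 0xD800U && uValue <= 0xDFFFU)) {
            return kInvalid16;  // overlong encoding or surrogate
        }
        return static_cast<uint16_t>(uValue);
    }
    // Stray continuation bytes, 0xC0/0xC1, and 4-byte leads (values > 0xFFFF).
    return kInvalid16;
}
```

In this sketch, a stream that legitimately encodes U+FFFF (bytes EF BF BF) decodes to 0xFFFF, which cannot be told apart from kInvalid16.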

◆ translate_from_UTF8() [2/3]

uintptr_t BURGER_API Burger::UTF16::translate_from_UTF8	(	uint16_t *	pOutput,
		uintptr_t	uOutputSize,
		const char *	pInput )

static noexcept

Convert a UTF8 "C" string into a UTF16 stream.

Take a "C" string that uses UTF8 encoding and convert it to a UTF16 encoded "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. Pass `nullptr` for the output pointer and zero for the size to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.

Note: This function ensures that the string is always zero terminated, even if truncation is necessary to make it fit in the output buffer. Under no circumstances will the output buffer be overrun. If invalid UTF8 data is found, it is skipped.

Parameters

pOutput	Pointer to a uint16_t buffer to receive the converted UTF16 string, `nullptr` is okay if uOutputSize is zero, otherwise it will page fault.
uOutputSize	Size of the output buffer in elements.
pInput	UTF8 encoded "C" string, `nullptr` will page fault.

Returns: The number of uint16_t elements in the potential output, not counting the trailing zero. This value is valid even if the output buffer wasn't large enough to contain everything.
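The size-query and truncation rules above can be sketched in standalone C++. `ConvertUTF8ToUTF16` and its helper `DecodeOne` are hypothetical names, not Burgerlib code. The sketch assumes invalid UTF8 is skipped one byte at a time and that code points above 0xFFFF are written as surrogate pairs; the documentation above does not say how such characters are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode one UTF8 code point at p and store its length in
// bytes in uLength. Returns UINT32_MAX (with uLength = 1) for invalid data so
// the caller can skip a single byte and resynchronize.
static uint32_t DecodeOne(const unsigned char* p, uintptr_t& uLength)
{
    uLength = 1;
    uint32_t uFirst = p[0];
    if (uFirst < 0x80U) {
        return uFirst;
    }
    uintptr_t uCount = 0;
    uint32_t uValue = 0;
    uint32_t uMin = 0;
    if (uFirst >= 0xC2U && uFirst <= 0xDFU) {
        uCount = 2; uValue = uFirst & 0x1FU; uMin = 0x80U;
    } else if ((uFirst & 0xF0U) == 0xE0U) {
        uCount = 3; uValue = uFirst & 0x0FU; uMin = 0x800U;
    } else if (uFirst >= 0xF0U && uFirst <= 0xF4U) {
        uCount = 4; uValue = uFirst & 0x07U; uMin = 0x10000U;
    } else {
        return UINT32_MAX;  // stray continuation byte or illegal lead byte
    }
    for (uintptr_t i = 1; i < uCount; ++i) {
        // A missing continuation byte (including the terminating zero) stops
        // decoding before anything past it is read.
        if ((p[i] & 0xC0U) != 0x80U) {
            return UINT32_MAX;
        }
        uValue = (uValue << 6) | (p[i] & 0x3FU);
    }
    if (uValue < uMin || uValue > 0x10FFFFU || (uValue >= 0xD800U && uValue <= 0xDFFFU)) {
        return UINT32_MAX;  // overlong form, out of range, or surrogate
    }
    uLength = uCount;
    return uValue;
}

// Hypothetical stand-in for translate_from_UTF8(uint16_t *, uintptr_t,
// const char *). Returns the element count of the full conversion (without
// the trailing zero), whether or not it fit in pOutput.
uintptr_t ConvertUTF8ToUTF16(uint16_t* pOutput, uintptr_t uOutputSize, const char* pInput)
{
    assert(pInput != nullptr);
    assert(pOutput != nullptr || uOutputSize == 0);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    uintptr_t uLimit = uOutputSize ? uOutputSize - 1 : 0;  // leave room for the zero
    uintptr_t uNeeded = 0;   // elements the complete conversion requires
    uintptr_t uWritten = 0;  // elements actually stored in pOutput
    while (*p) {
        uintptr_t uLength;
        uint32_t uCode = DecodeOne(p, uLength);
        p += uLength;
        if (uCode == UINT32_MAX) {
            continue;  // skip invalid UTF8 data
        }
        uint16_t uUnits[2] = {static_cast<uint16_t>(uCode), 0};
        uintptr_t uUnitCount = 1;
        if (uCode > 0xFFFFU) {
            uCode -= 0x10000U;  // split into a surrogate pair
            uUnits[0] = static_cast<uint16_t>(0xD800U + (uCode >> 10));
            uUnits[1] = static_cast<uint16_t>(0xDC00U + (uCode & 0x3FFU));
            uUnitCount = 2;
        }
        // Store only whole characters, stop storing after the first one that
        // does not fit, but keep counting.
        if (uWritten == uNeeded && uNeeded + uUnitCount <= uLimit) {
            for (uintptr_t i = 0; i < uUnitCount; ++i) {
                pOutput[uWritten++] = uUnits[i];
            }
        }
        uNeeded += uUnitCount;
    }
    if (uOutputSize != 0) {
        pOutput[uWritten] = 0;  // always terminated, never past uOutputSize - 1
    }
    return uNeeded;
}
```

Typical use is two calls: one with `nullptr` and 0 to get the required count, then a second with a buffer of that count plus one element for the terminating zero.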

◆ translate_from_UTF8() [3/3]

uintptr_t BURGER_API Burger::UTF16::translate_from_UTF8	(	uint16_t *	pOutput,
		uintptr_t	uOutputSize,
		const char *	pInput,
		uintptr_t	uInputSize )

static noexcept

Convert a UTF8 stream into a UTF16 uint16_t array.

Take a byte array that uses UTF8 encoding and convert it to a UTF16 encoded uint16_t "C" string. The function returns the size of the string after encoding. This size is valid even if it exceeds the output buffer size. Pass `nullptr` for the output pointer and zero for the size to have this routine calculate the size of the possible output, so the application can allocate a buffer large enough to hold it.

Note: This function ensures that the string is always zero terminated, even if truncation is necessary to make it fit in the output buffer. Under no circumstances will the output buffer be overrun. Zeros can be encoded into the input stream; this function does not stop early when a zero is parsed, and zeros are placed in the UTF16 stream as-is.

Parameters

pOutput	Pointer to a uint16_t buffer to receive the UTF16 string, `nullptr` is okay if uOutputSize is zero, otherwise a page fault will occur.
uOutputSize	Size of the output buffer in elements.
pInput	UTF8 encoded byte array, `nullptr` is okay if uInputSize is zero.
uInputSize	Size of the input byte array.

Returns: Byte count of the potential output. It is valid, even if the output buffer wasn't large enough to contain everything.
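A standalone sketch of the length-bounded variant, showing how an embedded 0x00 byte is converted instead of ending the scan and how a sequence cut off by uInputSize is handled. `ConvertUTF8BytesToUTF16` is a hypothetical name, not Burgerlib code. It assumes invalid bytes are skipped one at a time and that code points above 0xFFFF become surrogate pairs. It also returns a count of uint16_t elements, whereas the documentation above describes the return value as a byte count, so check the real function before relying on either unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for translate_from_UTF8(uint16_t *, uintptr_t,
// const char *, uintptr_t). The input length comes from uInputSize, so a 0x00
// byte is converted like any other character. Returns uint16_t elements.
uintptr_t ConvertUTF8BytesToUTF16(uint16_t* pOutput, uintptr_t uOutputSize,
    const char* pInput, uintptr_t uInputSize)
{
    assert(pOutput != nullptr || uOutputSize == 0);
    assert(pInput != nullptr || uInputSize == 0);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    uintptr_t uLimit = uOutputSize ? uOutputSize - 1 : 0;  // room for the zero
    uintptr_t uNeeded = 0;   // elements the complete conversion requires
    uintptr_t uWritten = 0;  // elements actually stored in pOutput
    uintptr_t uIndex = 0;
    while (uIndex < uInputSize) {
        uint32_t uFirst = p[uIndex];
        uintptr_t uCount = 0;  // 0 marks an illegal lead byte
        uint32_t uValue = 0;
        uint32_t uMin = 0;
        if (uFirst < 0x80U) {
            uCount = 1; uValue = uFirst;
        } else if (uFirst >= 0xC2U && uFirst <= 0xDFU) {
            uCount = 2; uValue = uFirst & 0x1FU; uMin = 0x80U;
        } else if ((uFirst & 0xF0U) == 0xE0U) {
            uCount = 3; uValue = uFirst & 0x0FU; uMin = 0x800U;
        } else if (uFirst >= 0xF0U && uFirst <= 0xF4U) {
            uCount = 4; uValue = uFirst & 0x07U; uMin = 0x10000U;
        }
        // The whole sequence must lie inside the input, never past uInputSize.
        bool bValid = (uCount != 0) && (uCount <= uInputSize - uIndex);
        for (uintptr_t i = 1; bValid && i < uCount; ++i) {
            uint32_t uByte = p[uIndex + i];
            bValid = (uByte & 0xC0U) == 0x80U;
            uValue = (uValue << 6) | (uByte & 0x3FU);
        }
        if (bValid && uCount > 1) {
            bValid = uValue >= uMin && uValue <= 0x10FFFFU &&
                (uValue < 0xD800U || uValue > 0xDFFFU);
        }
        if (!bValid) {
            ++uIndex;  // skip one invalid byte and resynchronize
            continue;
        }
        uIndex += uCount;
        uint16_t uUnits[2] = {static_cast<uint16_t>(uValue), 0};
        uintptr_t uUnitCount = 1;
        if (uValue > 0xFFFFU) {
            uValue -= 0x10000U;  // split into a surrogate pair
            uUnits[0] = static_cast<uint16_t>(0xD800U + (uValue >> 10));
            uUnits[1] = static_cast<uint16_t>(0xDC00U + (uValue & 0x3FFU));
            uUnitCount = 2;
        }
        // Store whole characters until the first one that does not fit.
        if (uWritten == uNeeded && uNeeded + uUnitCount <= uLimit) {
            for (uintptr_t i = 0; i < uUnitCount; ++i) {
                pOutput[uWritten++] = uUnits[i];
            }
        }
        uNeeded += uUnitCount;
    }
    if (uOutputSize != 0) {
        pOutput[uWritten] = 0;  // always zero terminated, never overrun
    }
    return uNeeded;
}
```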

Member Data Documentation

◆ kBigEndianMark

Burger::UTF16::kBigEndianMark = 0xFEFFU

static

16 bit Byte Order Mark (BOM) for Big Endian UTF16 data.

If a token read from the stream matches this constant, assume that all of the following data is Big Endian. This adheres to the Unicode standard for UTF-16.

◆ kEndianMark

Burger::UTF16::kEndianMark = 0xFEFFU

static

Byte stream token for native endian.

When writing a text file using UTF16, you may need to write this value as the first character to mark the endianness the data was saved in. This value is correct for the native endianness of the machine. Use Burger::UTF16::kBigEndianMark or Burger::UTF16::kLittleEndianMark to test incoming data whose endianness is unknown.
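A minimal sketch of testing incoming bytes for a Byte Order Mark. It inspects raw bytes rather than a uint16_t read in native order, so the result does not depend on the machine's endianness. `UTF16Order`, `DetectUTF16Order`, and `ReadUTF16Unit` are hypothetical names; the byte patterns FE FF (big endian) and FF FE (little endian) come from the Unicode standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for BOM detection.
enum class UTF16Order { kUnknown, kBig, kLittle };

// U+FEFF stored as the bytes FE FF means big endian; FF FE means little endian.
UTF16Order DetectUTF16Order(const unsigned char* pBytes, std::size_t uByteCount)
{
    assert(pBytes != nullptr || uByteCount == 0);
    if (uByteCount < 2) {
        return UTF16Order::kUnknown;
    }
    if (pBytes[0] == 0xFEU && pBytes[1] == 0xFFU) {
        return UTF16Order::kBig;
    }
    if (pBytes[0] == 0xFFU && pBytes[1] == 0xFEU) {
        return UTF16Order::kLittle;
    }
    return UTF16Order::kUnknown;  // no BOM; the caller must choose a default
}

// Assemble one UTF16 element from two bytes in the detected order
// (big endian is used when the order is unknown).
uint16_t ReadUTF16Unit(const unsigned char* pBytes, UTF16Order eOrder)
{
    assert(pBytes != nullptr);
    if (eOrder == UTF16Order::kLittle) {
        return static_cast<uint16_t>(pBytes[0] | (pBytes[1] << 8));
    }
    return static_cast<uint16_t>((pBytes[0] << 8) | pBytes[1]);
}
```

After detection, the caller would skip the two BOM bytes and read the rest of the data with the same order.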

◆ kInvalid

Burger::UTF16::kInvalid = UINT16_MAX

static

Value returned if a routine failed.

If a function doesn't return TRUE or FALSE to indicate failure, it returns this value instead. See the documentation for each function to determine whether it uses TRUE/FALSE or this value.

◆ kLittleEndianMark

Burger::UTF16::kLittleEndianMark = 0xFFFEU

static

16 bit Byte Order Mark (BOM) for Little Endian UTF16 data.

If a token read from the stream matches this constant, assume that all of the following data is Little Endian. This adheres to the Unicode standard for UTF-16.
