Burger::UTF16 Class Reference

Conversion routines to the UTF16 format.

Static Public Member Functions

static uint_t is_valid (uint16_t uInput) noexcept
 Validate a UTF16 value.
 
static uint_t is_valid (const uint16_t *pInput) noexcept
 Check a UTF16 "C" string for validity.
 
static uint_t is_valid (const uint16_t *pInput, uintptr_t uElementCount) noexcept
 Check a UTF16 uint16_t array for validity.
 
static uint16_t translate_from_UTF8 (const char *pInput) noexcept
 Return a UTF16 code from a UTF8 stream.
 
static uintptr_t translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput) noexcept
 Convert a UTF8 "C" string into a UTF16 stream.
 
static uintptr_t translate_from_UTF8 (uint16_t *pOutput, uintptr_t uOutputSize, const char *pInput, uintptr_t uInputSize) noexcept
 Convert a UTF8 stream into a UTF16 uint16_t array.
 

Static Public Attributes

static const uint16_t kInvalid = UINT16_MAX
 Value returned if a routine failed.
 
static const uint16_t kEndianMark = 0xFEFFU
 Byte stream token for native endian.
 
static const uint16_t kBigEndianMark = 0xFEFFU
 16 bit Byte Order Mark (BOM) for Big Endian UTF16 data.
 
static const uint16_t kLittleEndianMark = 0xFFFEU
 16 bit Byte Order Mark (BOM) for Little Endian UTF16 data.
 

Detailed Description

Conversion routines to the UTF16 format.


UTF16 is a data format that allows Unicode data to be stored in a 16 bit wide "C" string. It is wide enough to contain all of the most popular characters for the world's languages. These functions allow conversion from UTF8, which Burgerlib is based on, to UTF16, which some foreign APIs require for internationalization. Please note that these functions operate on strings that are native endian.

Member Function Documentation

◆ is_valid() [1/3]

uint_t BURGER_API Burger::UTF16::is_valid ( const uint16_t * pInput)
static noexcept

Check a UTF16 "C" string for validity.


Check whether a "C" string is a valid UTF16 stream. Return FALSE if there was an error, or TRUE if the elements represent a valid UTF16 pattern.

Parameters
pInput    Pointer to a zero terminated string; nullptr will page fault.
Returns
TRUE if the entire string is a valid UTF16 stream, FALSE if not.

◆ is_valid() [2/3]

uint_t BURGER_API Burger::UTF16::is_valid ( const uint16_t * pInput,
uintptr_t uElementCount )
static noexcept

Check a UTF16 uint16_t array for validity.


Check whether a uint16_t array is a valid UTF16 stream. Return FALSE if there was an error, or TRUE if the elements represent a valid UTF16 pattern.

Parameters
pInput    Pointer to UTF16 data. Can be nullptr if uElementCount is zero, otherwise it will page fault.
uElementCount    Size of the data in elements. If zero, the function will return TRUE.
Returns
TRUE if the entire array is a valid UTF16 stream, FALSE if not.
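
As a rough illustration of what a check of this kind involves, the sketch below walks a uint16_t array and verifies that every surrogate belongs to a well formed high/low pair. It assumes, as the note on the single value overload suggests, that the string and array checks accept properly paired surrogates. The name is_well_formed_utf16 is hypothetical and this is not Burgerlib's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of Burgerlib: returns true if every
// surrogate in the array is part of a well formed high/low pair.
bool is_well_formed_utf16(
    const std::uint16_t* pInput, std::size_t uElementCount) noexcept
{
    // A null pointer is only acceptable for an empty array.
    assert(pInput != nullptr || uElementCount == 0);

    std::size_t i = 0;
    while (i < uElementCount) {
        const std::uint16_t uUnit = pInput[i++];
        if (uUnit >= 0xD800U && uUnit <= 0xDBFFU) {
            // High surrogate: the next element must be a low surrogate.
            if (i >= uElementCount || pInput[i] < 0xDC00U ||
                pInput[i] > 0xDFFFU) {
                return false;
            }
            ++i; // Consume the low surrogate.
        } else if (uUnit >= 0xDC00U && uUnit <= 0xDFFFU) {
            // Low surrogate with no high surrogate in front of it.
            return false;
        }
    }
    // An empty array falls through to here and counts as valid.
    return true;
}
```

An empty array is reported as valid, which matches the documented behavior for a zero element count.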

◆ is_valid() [3/3]

uint_t BURGER_API Burger::UTF16::is_valid ( uint16_t uInput)
static noexcept

Validate a UTF16 value.


Note
Use of this function is not recommended because it considers escape values as invalid. Use is_valid(const uint16_t*) instead.

Return TRUE if a UTF16 character is within the valid bounds, (0-0xD7FF) or (0xE000-0xFFFF).

Parameters
uInput    UTF16 encoded character value.
Returns
TRUE if in bounds, FALSE if not.
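
The bounds test described above amounts to rejecting the surrogate range. A minimal sketch, using the hypothetical name is_standalone_utf16 rather than the library call, could look like this:

```cpp
#include <cstdint>

// Hypothetical stand-in for the documented bounds test; not Burgerlib code.
constexpr bool is_standalone_utf16(std::uint16_t uInput) noexcept
{
    // Surrogates occupy 0xD800-0xDFFF. Every other 16 bit value is a
    // complete character on its own.
    return (uInput < 0xD800U) || (uInput > 0xDFFFU);
}

static_assert(is_standalone_utf16(0x0041U), "'A' is a complete character");
static_assert(!is_standalone_utf16(0xD83DU), "a high surrogate is rejected");
```

Because both halves of a surrogate pair fail this test, text that contains characters above 0xFFFF needs the string or array overloads instead.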

◆ translate_from_UTF8() [1/3]

uint16_t BURGER_API Burger::UTF16::translate_from_UTF8 ( const char * pInput)
static noexcept

Return a UTF16 code from a UTF8 stream.


Convert from a UTF8 stream into a 16 bit Unicode value (0x00 to 0xFFFF). This function will perform validation on the incoming stream and will flag any data that's invalid. It will not parse Unicode values in the range of 0xD800-0xDFFF or greater than 0xFFFF. These do not fit in a single 16 bit quantity, so the function will return an error for them.

Note
This function will not move the pointer forward. Use Burger::UTF8::NextToken(const char *) to advance to the next character.
Parameters
pInput    Pointer to a valid UTF8 "C" string.
Returns
The UTF16 code or Burger::UTF16::kInvalid if invalid. 0x00 is not invalid.
See also
Burger::UTF8::GetTokenSize(const char *) or Burger::UTF8::NextToken(const char *).
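
To make the rules above concrete, here is a minimal sketch of decoding one UTF8 sequence into a single 16 bit value. The names decode_one_utf8 and kSketchInvalid are hypothetical; kSketchInvalid mirrors the documented kInvalid value of UINT16_MAX. The exact validation Burgerlib performs may differ, so treat this as an illustration of the documented contract only.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the documented kInvalid (UINT16_MAX); the name is hypothetical.
constexpr std::uint16_t kSketchInvalid = 0xFFFFU;

// Hypothetical sketch: decode the sequence at pInput without advancing it.
std::uint16_t decode_one_utf8(const char* pInput) noexcept
{
    assert(pInput != nullptr);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    const unsigned uFirst = p[0];

    if (uFirst < 0x80U) {
        // One byte: plain ASCII, and 0x00 is a legal result.
        return static_cast<std::uint16_t>(uFirst);
    }
    if (uFirst >= 0xC2U && uFirst <= 0xDFU) {
        // Two bytes. Leads 0xC0 and 0xC1 are overlong and fall through.
        if ((p[1] & 0xC0U) != 0x80U) {
            return kSketchInvalid;
        }
        return static_cast<std::uint16_t>(
            ((uFirst & 0x1FU) << 6U) | (p[1] & 0x3FU));
    }
    if ((uFirst & 0xF0U) == 0xE0U) {
        // Three bytes. Check p[1] before touching p[2] so a terminating
        // zero is never read past.
        if ((p[1] & 0xC0U) != 0x80U) {
            return kSketchInvalid;
        }
        if ((p[2] & 0xC0U) != 0x80U) {
            return kSketchInvalid;
        }
        const unsigned uCode = ((uFirst & 0x0FU) << 12U) |
            ((p[1] & 0x3FU) << 6U) | (p[2] & 0x3FU);
        // Reject overlong encodings and the surrogate range.
        if (uCode < 0x800U || (uCode >= 0xD800U && uCode <= 0xDFFFU)) {
            return kSketchInvalid;
        }
        return static_cast<std::uint16_t>(uCode);
    }
    // Stray continuation bytes, overlong leads and four byte sequences
    // (values above 0xFFFF) cannot be returned as one 16 bit value.
    return kSketchInvalid;
}
```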

◆ translate_from_UTF8() [2/3]

uintptr_t BURGER_API Burger::UTF16::translate_from_UTF8 ( uint16_t * pOutput,
uintptr_t uOutputSize,
const char * pInput )
static noexcept

Convert a UTF8 "C" string into a UTF16 stream.


Take a "C" string that is using UTF8 encoding and convert it to a UTF16 encoded "C" string. The function will return the size of the string after encoding. This size is valid, even if it exceeded the output buffer size. The output pointer can be nullptr and the output size zero to have this routine calculate the size of the possible output so the application can allocate a buffer large enough to hold it.

Note
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
If invalid UTF8 data is found, it will be skipped.
Parameters
pOutput    Pointer to a UTF16 buffer to receive the converted string. nullptr is okay if uOutputSize is zero, otherwise it will page fault.
uOutputSize    Size of the output buffer in elements.
pInput    UTF8 encoded "C" string; nullptr will page fault.
Returns
The number of elements of the potential output without the trailing uint16_t zero. It is valid, even if the output buffer wasn't large enough to contain everything.
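
The measure, allocate, convert pattern that this contract enables can be sketched as follows. The function sketch_utf8_to_utf16 is a hypothetical stand-in written to follow the documented contract (length returned even on truncation, output always zero terminated, invalid UTF8 skipped); it is not Burgerlib's implementation. Refusing to split a surrogate pair on truncation is a design choice of this sketch, not something the documentation states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that follows the documented contract.
std::size_t sketch_utf8_to_utf16(std::uint16_t* pOutput,
    std::size_t uOutputSize, const char* pInput) noexcept
{
    assert(pInput != nullptr);
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t kMinimum[4] = {0x0U, 0x80U, 0x800U, 0x10000U};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(pInput);
    // One element is reserved for the terminating zero.
    const std::size_t uCapacity = uOutputSize ? uOutputSize - 1U : 0U;
    std::size_t uNeeded = 0;  // Elements a complete conversion requires
    std::size_t uWritten = 0; // Elements actually stored
    bool bRoom = true;        // Cleared once a character fails to fit

    while (*p) {
        std::uint32_t uCode = *p++;
        unsigned uExtra;
        if (uCode < 0x80U) {
            uExtra = 0;
        } else if ((uCode & 0xE0U) == 0xC0U) {
            uCode &= 0x1FU;
            uExtra = 1;
        } else if ((uCode & 0xF0U) == 0xE0U) {
            uCode &= 0x0FU;
            uExtra = 2;
        } else if ((uCode & 0xF8U) == 0xF0U) {
            uCode &= 0x07U;
            uExtra = 3;
        } else {
            continue; // Stray continuation or illegal lead byte: skip it
        }
        bool bValid = true;
        for (unsigned i = 0; i < uExtra; ++i) {
            if ((*p & 0xC0U) != 0x80U) {
                bValid = false; // Truncated sequence: drop what was read
                break;
            }
            uCode = (uCode << 6U) | (*p++ & 0x3FU);
        }
        // Skip overlong forms, surrogates and values past U+10FFFF.
        if (!bValid || uCode < kMinimum[uExtra] || uCode > 0x10FFFFU ||
            (uCode >= 0xD800U && uCode <= 0xDFFFU)) {
            continue;
        }
        std::uint16_t Units[2] = {0, 0};
        std::size_t uUnits = 1;
        if (uCode < 0x10000U) {
            Units[0] = static_cast<std::uint16_t>(uCode);
        } else {
            uCode -= 0x10000U;
            Units[0] = static_cast<std::uint16_t>(0xD800U | (uCode >> 10U));
            Units[1] = static_cast<std::uint16_t>(0xDC00U | (uCode & 0x3FFU));
            uUnits = 2;
        }
        // Store whole characters only, so a surrogate pair is never split.
        if (bRoom && (uWritten + uUnits) <= uCapacity) {
            for (std::size_t i = 0; i < uUnits; ++i) {
                pOutput[uWritten++] = Units[i];
            }
        } else {
            bRoom = false;
        }
        uNeeded += uUnits; // Keep counting even when nothing more fits
    }
    if (uOutputSize) {
        pOutput[uWritten] = 0; // Always terminate when a buffer exists
    }
    return uNeeded;
}

// Measure first, then allocate exactly enough and convert for real.
std::vector<std::uint16_t> to_utf16(const char* pInput)
{
    const std::size_t uLength = sketch_utf8_to_utf16(nullptr, 0, pInput);
    std::vector<std::uint16_t> Buffer(uLength + 1U); // +1 for the terminator
    sketch_utf8_to_utf16(Buffer.data(), Buffer.size(), pInput);
    return Buffer;
}
```

Since the returned length excludes the trailing zero, the caller adds one element when sizing the buffer. With the real library, the same two calls would presumably be made to Burger::UTF16::translate_from_UTF8 with identical arguments.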

◆ translate_from_UTF8() [3/3]

uintptr_t BURGER_API Burger::UTF16::translate_from_UTF8 ( uint16_t * pOutput,
uintptr_t uOutputSize,
const char * pInput,
uintptr_t uInputSize )
static noexcept

Convert a UTF8 stream into a UTF16 uint16_t array.


Take a byte array that is using UTF8 encoding and convert it to a UTF16 uint16_t encoded "C" string. The function will return the size of the string after encoding. This size is valid, even if it exceeded the output buffer size. The output pointer can be nullptr and the output size zero to have this routine calculate the size of the possible output so the application can allocate a buffer large enough to hold it.

Note
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
Zeros can be encoded into the stream. This function will not early out if a zero was parsed. Zeros will be placed in the UTF16 stream as is.
Parameters
pOutput    Pointer to a uint16_t buffer to receive the UTF16 string. nullptr is okay if uOutputSize is zero, otherwise a page fault will occur.
uOutputSize    Size of the output buffer in elements.
pInput    UTF8 encoded byte array; nullptr is okay if uInputSize is zero.
uInputSize    Size of the input byte array.
Returns
Byte count of the potential output. It is valid, even if the output buffer wasn't large enough to contain everything.

Member Data Documentation

◆ kBigEndianMark

const uint16_t Burger::UTF16::kBigEndianMark = 0xFEFFU
static

16 bit Byte Order Mark (BOM) for Big Endian UTF16 data.


If a token read from the stream matches this constant, you must assume that all of the following data is Big Endian. This adheres to the Unicode standard for UTF-16.

◆ kEndianMark

const uint16_t Burger::UTF16::kEndianMark = 0xFEFFU
static

Byte stream token for native endian.


When writing a text file using UTF16, you may need to write this value as the first character to mark the endianness in which the data was saved. This value is the correct value for the native endian of the machine. Use Burger::UTF16::kBigEndianMark or Burger::UTF16::kLittleEndianMark to test incoming data whose endianness is unknown.
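
One way to test incoming data without depending on the host's byte order is to inspect the first two raw bytes directly. The sketch below assumes the data arrives as a byte buffer; the names ByteOrder, detect_utf16_bom and load_utf16_unit are hypothetical and not part of Burgerlib. Per the Unicode standard, the bytes FE FF mark big endian data and FF FE mark little endian data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class ByteOrder { kUnknown, kBig, kLittle };

// Hypothetical helper, not a Burgerlib call: inspect the first two bytes.
ByteOrder detect_utf16_bom(
    const unsigned char* pBytes, std::size_t uSize) noexcept
{
    assert(pBytes != nullptr || uSize == 0);
    if (uSize >= 2U) {
        if (pBytes[0] == 0xFEU && pBytes[1] == 0xFFU) {
            return ByteOrder::kBig; // U+FEFF stored high byte first
        }
        if (pBytes[0] == 0xFFU && pBytes[1] == 0xFEU) {
            return ByteOrder::kLittle; // U+FEFF stored low byte first
        }
    }
    return ByteOrder::kUnknown; // No mark: the caller must pick a default
}

// Assemble one code unit from two raw bytes in the given order. The result
// is a native value regardless of the endianness of the host machine.
std::uint16_t load_utf16_unit(
    const unsigned char* pBytes, ByteOrder eOrder) noexcept
{
    const unsigned uFirst = pBytes[0];
    const unsigned uSecond = pBytes[1];
    return static_cast<std::uint16_t>((eOrder == ByteOrder::kLittle)
            ? ((uSecond << 8U) | uFirst)
            : ((uFirst << 8U) | uSecond));
}
```

Once every element has been loaded this way, the resulting array is native endian and can be handed to the functions in this class.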

◆ kInvalid

const uint16_t Burger::UTF16::kInvalid = UINT16_MAX
static

Value returned if a routine failed.


If a function doesn't return TRUE or FALSE for failure, it will return this value instead. Please see the documentation for each function to know which ones use true/false pairs or this value.

◆ kLittleEndianMark

const uint16_t Burger::UTF16::kLittleEndianMark = 0xFFFEU
static

16 bit Byte Order Mark (BOM) for Little Endian UTF16 data.


If a token read from the stream matches this constant, you must assume that all of the following data is Little Endian. This adheres to the Unicode standard for UTF-16.