BurgerLib
Burger::UTF32 Class Reference

Conversion routines to the UTF32 format.

#include <ststring.h>


Public Types

enum  { BAD = -1, ENDIANMARK = 0xFEFF, BE = 0xFFFE0000, LE = 0xFEFF }

Static Public Member Functions

static Word BURGER_API IsValid (Word32 Input)
 Validate a UTF32 value.
static Word BURGER_API IsValid (const Word32 *pInput)
 Check a UTF32 "C" string for validity.
static Word BURGER_API IsValid (const Word32 *pInput, WordPtr uInputSize)
 Check a UTF32 Word32 array for validity.
static Word32 BURGER_API FromUTF8 (const char *pInput)
 Return a UTF32 code from a UTF8 stream.
static Word BURGER_API FromUTF8 (Word32 *pOutput, WordPtr uOutputSize, const char *pInput)
 Convert a UTF8 "C" string into a UTF32 stream.
static Word BURGER_API FromUTF8 (Word32 *pOutput, WordPtr uOutputSize, const char *pInput, WordPtr uInputSize)
 Convert a UTF8 stream into a UTF32 Word32 array.

Detailed Description

Conversion routines to the UTF32 format.

UTF32 is the simplest format for storing Unicode data in a "C" string of 32 bit wide characters. It can contain every character of the world's languages. These functions convert from UTF8, the encoding Burgerlib is based on, to UTF32, which some foreign APIs require for internationalization. These functions operate on native endian strings.


Member Enumeration Documentation

anonymous enum
Enumerator:
BAD 

Value returned if a routine failed.

If a function does not return true or false to indicate failure, it returns this value instead. See the documentation for each function to learn whether it uses a true/false result or this value.

ENDIANMARK 

Byte stream token for native endian.

When writing a UTF32 text file, you may need to write this value as the first character to mark the endianness of the saved data. This value is correct for the native endian of the machine. To determine the endianness of incoming data of unknown origin, compare its first value against Burger::UTF32::BE or Burger::UTF32::LE.

BE 

32 bit token for Big Endian UTF32 data.

If the first token read in matches this constant, assume that all of the following data is Big Endian.

LE 

32 bit token for Little Endian UTF32 data.

If the first token read in matches this constant, assume that all of the following data is Little Endian.
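As a sketch of how the byte order mark is used when loading a file, the code below checks the first 32 bit value read in native order and byte swaps the remaining data when needed. The BE and LE values listed above match what a little-endian machine reads: big endian data starting with U+FEFF (bytes 00 00 FE FF) reads there as 0xFFFE0000. The helper names SwapEndian32, DetectByteOrder and NormalizeUTF32 are hypothetical and not part of BurgerLib, and the sketch uses raw constants instead of the header's enum so it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of BurgerLib: reverse the byte order of one
// 32 bit value.
inline std::uint32_t SwapEndian32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00U) | ((v << 8) & 0x00FF0000U) | (v << 24);
}

// Result of inspecting the first value read from a UTF32 file.
enum class ByteOrder { Native, Swapped, Unknown };

// Compare the first value, as read in native order, against the mark
// U+FEFF. 0x0000FEFF means the file uses this machine's byte order;
// 0xFFFE0000 means it was written in the opposite order.
inline ByteOrder DetectByteOrder(std::uint32_t uFirst) {
    if (uFirst == 0x0000FEFFU) return ByteOrder::Native;
    if (uFirst == 0xFFFE0000U) return ByteOrder::Swapped;
    return ByteOrder::Unknown;
}

// Convert the buffer to native order in place when a mark is present.
// Returns the index of the first character after the mark (0 if no mark
// was found, in which case the data is assumed to be native endian).
inline std::size_t NormalizeUTF32(std::uint32_t *pData, std::size_t uCount) {
    if (uCount == 0) return 0;
    ByteOrder eOrder = DetectByteOrder(pData[0]);
    if (eOrder == ByteOrder::Unknown) return 0;
    if (eOrder == ByteOrder::Swapped) {
        for (std::size_t i = 1; i < uCount; ++i) pData[i] = SwapEndian32(pData[i]);
    }
    return 1;
}
```

Treating a missing mark as native endian is a design choice of this sketch; an application could just as well reject such data or apply a heuristic.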


Member Function Documentation

Word32 BURGER_API Burger::UTF32::FromUTF8 ( const char *  pInput) [static]

Return a UTF32 code from a UTF8 stream.

Convert the UTF8 sequence at the start of a stream into a 32 bit Unicode value (0x00 to 0x10FFFF). This function validates the incoming data and flags any invalid sequence by returning Burger::UTF32::BAD.

Note:
This function does not advance the pointer; use Burger::UTF8::NextToken(const char *) to do that.
Parameters:
pInput: Pointer to a valid UTF8 "C" string. NULL will page fault.
Returns:
The UTF32 code, or Burger::UTF32::BAD if the data is invalid. 0x00 is a valid code, not an error.
See also:
Burger::UTF8::GetTokenSize(const char *) or Burger::UTF8::NextToken(const char *).
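To illustrate the kind of validation described above, here is a minimal stand-alone decoder for a single UTF8 sequence. It is a sketch, not BurgerLib's implementation: DecodeUTF8 and kBad are hypothetical names, kBad stands in for Burger::UTF32::BAD (-1 stored in 32 bits), and the sketch assumes the library rejects overlong forms, surrogates and values above 0x10FFFF, as the UTF8 standard requires. Like the documented function, it does not advance the pointer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Burger::UTF32::BAD (-1 stored in a 32 bit value).
constexpr std::uint32_t kBad = 0xFFFFFFFFU;

// Hypothetical sketch: decode the UTF8 sequence at pInput into one code
// point without moving the pointer. Overlong forms, surrogates, values
// above 0x10FFFF and truncated sequences return kBad.
std::uint32_t DecodeUTF8(const char *pInput) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(pInput);
    std::uint32_t uFirst = p[0];
    if (uFirst < 0x80U) return uFirst;          // ASCII, including 0x00
    unsigned int uExtra;                        // continuation bytes expected
    std::uint32_t uMin;                         // smallest non-overlong value
    std::uint32_t uCode;
    if ((uFirst & 0xE0U) == 0xC0U) { uExtra = 1; uMin = 0x80U; uCode = uFirst & 0x1FU; }
    else if ((uFirst & 0xF0U) == 0xE0U) { uExtra = 2; uMin = 0x800U; uCode = uFirst & 0x0FU; }
    else if ((uFirst & 0xF8U) == 0xF0U) { uExtra = 3; uMin = 0x10000U; uCode = uFirst & 0x07U; }
    else return kBad;                           // stray continuation or invalid lead byte
    for (unsigned int i = 1; i <= uExtra; ++i) {
        std::uint32_t uNext = p[i];
        // A terminating zero also fails this test, so no byte past it is read.
        if ((uNext & 0xC0U) != 0x80U) return kBad;
        uCode = (uCode << 6) | (uNext & 0x3FU);
    }
    if (uCode < uMin || uCode > 0x10FFFFU || (uCode >= 0xD800U && uCode <= 0xDFFFU)) return kBad;
    return uCode;
}
```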
Word BURGER_API Burger::UTF32::FromUTF8 ( Word32 *  pOutput,
WordPtr  uOutputSize,
const char *  pInput 
) [static]

Convert a UTF8 "C" string into a UTF32 stream.

Take a UTF8 encoded "C" string and convert it into a UTF32 encoded "C" string. The function returns the size of the string after encoding; this size is valid even if it exceeds the output buffer size. The output pointer and size can be NULL and zero to have this routine only calculate the size of the output, so the application can allocate a buffer large enough to hold it.

Note:
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
If invalid UTF8 data is found, it will be skipped.
Parameters:
pOutput: Pointer to a UTF32 buffer to receive the converted string. NULL is okay if uOutputSize is zero, otherwise it will page fault.
uOutputSize: Size of the output buffer in bytes.
pInput: UTF8 encoded "C" string. NULL will page fault.
Returns:
The number of bytes of the potential output, excluding the trailing Word32 zero. It is valid even if the output buffer wasn't large enough to contain everything.
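The size-query pattern described above can be sketched as follows. UTF8ToUTF32 is a hypothetical stand-in for this overload, written to the contract stated here: the output size is in bytes, the output is always zero terminated, invalid bytes are skipped one at a time (an assumption about how "skipped" is implemented), and the return value is the byte size of the full conversion without the terminator. ToUTF32 shows the two pass use: query, allocate, convert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one UTF8 sequence starting at a nonzero byte. On success store the
// code point in uCode and return the bytes used; return 0 if invalid.
static std::size_t DecodeOne(const unsigned char *p, std::uint32_t &uCode) {
    unsigned int uExtra;
    std::uint32_t uMin;
    if (p[0] < 0x80U) { uCode = p[0]; return 1; }
    if ((p[0] & 0xE0U) == 0xC0U) { uExtra = 1; uMin = 0x80U; uCode = p[0] & 0x1FU; }
    else if ((p[0] & 0xF0U) == 0xE0U) { uExtra = 2; uMin = 0x800U; uCode = p[0] & 0x0FU; }
    else if ((p[0] & 0xF8U) == 0xF0U) { uExtra = 3; uMin = 0x10000U; uCode = p[0] & 0x07U; }
    else return 0;
    for (unsigned int i = 1; i <= uExtra; ++i) {
        if ((p[i] & 0xC0U) != 0x80U) return 0;   // also stops at the terminator
        uCode = (uCode << 6) | (p[i] & 0x3FU);
    }
    if (uCode < uMin || uCode > 0x10FFFFU || (uCode >= 0xD800U && uCode <= 0xDFFFU)) return 0;
    return uExtra + 1;
}

// Hypothetical stand-in for FromUTF8(Word32 *, WordPtr, const char *).
std::size_t UTF8ToUTF32(std::uint32_t *pOutput, std::size_t uOutputSize, const char *pInput) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(pInput);
    std::size_t uCapacity = uOutputSize / 4;   // whole Word32 slots available
    std::size_t uCount = 0;                    // code points produced so far
    while (*p) {
        std::uint32_t uCode;
        std::size_t uUsed = DecodeOne(p, uCode);
        if (!uUsed) { ++p; continue; }         // skip one invalid byte
        p += uUsed;
        if (uCount + 1 < uCapacity) pOutput[uCount] = uCode;  // keep a slot for the zero
        ++uCount;
    }
    if (uCapacity) pOutput[(uCount < uCapacity) ? uCount : uCapacity - 1] = 0;
    return uCount * 4;
}

// Two pass use: query the size, allocate, then convert.
std::vector<std::uint32_t> ToUTF32(const char *pInput) {
    std::size_t uBytes = UTF8ToUTF32(nullptr, 0, pInput);
    std::vector<std::uint32_t> Buffer(uBytes / 4 + 1);   // +1 for the terminating zero
    UTF8ToUTF32(Buffer.data(), Buffer.size() * 4, pInput);
    return Buffer;
}
```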
Word BURGER_API Burger::UTF32::FromUTF8 ( Word32 *  pOutput,
WordPtr  uOutputSize,
const char *  pInput,
WordPtr  uInputSize 
) [static]

Convert a UTF8 stream into a UTF32 Word32 array.

Take a byte array that uses UTF8 encoding and convert it into a zero terminated string of UTF32 Word32 values. The function returns the size of the string after encoding; this size is valid even if it exceeds the output buffer size. The output pointer and size can be NULL and zero to have this routine only calculate the size of the output, so the application can allocate a buffer large enough to hold it.

Note:
This function will ensure that the string is always zero terminated, even if truncation is necessary to get it to fit in the output buffer. Under no circumstances will the output buffer be overrun.
The input may contain zeros. This function does not stop early when it parses a zero; zeros are placed in the UTF32 stream as is.
Parameters:
pOutput: Pointer to a buffer to receive the UTF32 string. NULL is okay if uOutputSize is zero, otherwise a page fault will occur.
uOutputSize: Size of the output buffer in bytes.
pInput: UTF8 encoded byte array. NULL is okay if uInputSize is zero.
uInputSize: Size of the input byte array in bytes.
Returns:
strlen() of the potential output. It is valid, even if the output buffer wasn't large enough to contain everything.
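The sized overload differs from the "C" string version in two ways worth sketching: decoding must never read past uInputSize, and a zero byte is converted like any other character instead of ending the loop. UTF8ArrayToUTF32 and DecodeBounded are hypothetical names. Because the reference describes the return value as "strlen() of the potential output", this sketch returns the number of Word32 characters rather than bytes; that reading is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode one UTF8 sequence from p, reading at most uRemaining bytes.
// Returns the bytes used, or 0 if the sequence is invalid or truncated.
static std::size_t DecodeBounded(const unsigned char *p, std::size_t uRemaining,
                                 std::uint32_t &uCode) {
    unsigned int uExtra;
    std::uint32_t uMin;
    if (p[0] < 0x80U) { uCode = p[0]; return 1; }      // includes embedded zeros
    if ((p[0] & 0xE0U) == 0xC0U) { uExtra = 1; uMin = 0x80U; uCode = p[0] & 0x1FU; }
    else if ((p[0] & 0xF0U) == 0xE0U) { uExtra = 2; uMin = 0x800U; uCode = p[0] & 0x0FU; }
    else if ((p[0] & 0xF8U) == 0xF0U) { uExtra = 3; uMin = 0x10000U; uCode = p[0] & 0x07U; }
    else return 0;
    if (uExtra >= uRemaining) return 0;                 // sequence runs past the end
    for (unsigned int i = 1; i <= uExtra; ++i) {
        if ((p[i] & 0xC0U) != 0x80U) return 0;
        uCode = (uCode << 6) | (p[i] & 0x3FU);
    }
    if (uCode < uMin || uCode > 0x10FFFFU || (uCode >= 0xD800U && uCode <= 0xDFFFU)) return 0;
    return uExtra + 1;
}

// Hypothetical stand-in for FromUTF8(Word32 *, WordPtr, const char *, WordPtr).
// Returns the count of Word32 characters the full conversion would produce.
std::size_t UTF8ArrayToUTF32(std::uint32_t *pOutput, std::size_t uOutputSize,
                             const char *pInput, std::size_t uInputSize) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(pInput);
    std::size_t uCapacity = uOutputSize / 4;
    std::size_t uCount = 0;
    while (uInputSize) {
        std::uint32_t uCode;
        std::size_t uUsed = DecodeBounded(p, uInputSize, uCode);
        if (!uUsed) {
            uUsed = 1;                                   // skip one invalid byte
        } else {
            if (uCount + 1 < uCapacity) pOutput[uCount] = uCode;
            ++uCount;
        }
        p += uUsed;
        uInputSize -= uUsed;
    }
    if (uCapacity) pOutput[(uCount < uCapacity) ? uCount : uCapacity - 1] = 0;
    return uCount;
}
```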

Word BURGER_API Burger::UTF32::IsValid ( Word32  Input) [static]

Validate a UTF32 value.

Return TRUE if a UTF32 character is within the valid bounds: 0x0 to 0xD7FF, or 0xE000 to 0x10FFFF.

Parameters:
Input: UTF32 encoded character value.
Returns:
TRUE if in bounds, FALSE if not.
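The bounds check above reduces to a single expression. This is a sketch using standard types; the name IsValidUTF32 is hypothetical, and the library's own function returns a Word rather than bool.

```cpp
#include <cassert>
#include <cstdint>

// Valid code points are 0x0 to 0xD7FF and 0xE000 to 0x10FFFF; the UTF16
// surrogate range 0xD800 to 0xDFFF and anything above 0x10FFFF are rejected.
bool IsValidUTF32(std::uint32_t uInput) {
    return (uInput < 0xD800U) || (uInput >= 0xE000U && uInput <= 0x10FFFFU);
}
```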
Word BURGER_API Burger::UTF32::IsValid ( const Word32 *  pInput) [static]

Check a UTF32 "C" string for validity.

Check whether a "C" string is a valid UTF32 stream. Return false if there was an error, or true if every value represents a valid UTF32 character.

Parameters:
pInput: Pointer to a zero terminated string. NULL will page fault.
Returns:
true if the entire string is a valid UTF32 stream, false if not.
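A zero terminated check walks the string until the terminator and applies the same bounds test to each value. This is a minimal sketch with a hypothetical name, IsValidUTF32String; as in the documented function, a NULL pointer is not handled.

```cpp
#include <cassert>
#include <cstdint>

// Walk a zero terminated UTF32 string and reject the first value in the
// surrogate range or above 0x10FFFF. An empty string is valid.
bool IsValidUTF32String(const std::uint32_t *pInput) {
    for (std::uint32_t uChar = *pInput; uChar != 0; uChar = *++pInput) {
        if (uChar >= 0xD800U && (uChar < 0xE000U || uChar > 0x10FFFFU)) return false;
    }
    return true;
}
```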
Word BURGER_API Burger::UTF32::IsValid ( const Word32 *  pInput,
WordPtr  uInputSize 
) [static]

Check a UTF32 Word32 array for validity.

Check whether a Word32 array is a valid UTF32 stream. Return false if there was an error, or true if every value represents a valid UTF32 character.

Parameters:
pInput: Pointer to UTF32 data. Can be NULL if uInputSize is zero, otherwise a page fault will occur.
uInputSize: Length of the data in bytes. If it is zero, the function returns true. If the length is odd, the low bit is masked off to force it even.
Returns:
true if the entire string is a valid UTF32 stream, false if not.
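A length-limited check differs from the "C" string version in that zeros are ordinary characters and the loop stops at the end of the data instead. In this sketch (IsValidUTF32Array is a hypothetical name) uInputSize is taken in bytes, as stated above, and any trailing partial Word32 of fewer than four bytes is ignored. That rounding is an assumption of the sketch; the reference describes masking only the low bit, so the library's exact handling of sizes that are not a multiple of four may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Check uInputSize bytes of UTF32 data. A size of zero returns true and
// never touches pInput, so NULL is safe in that case.
bool IsValidUTF32Array(const std::uint32_t *pInput, std::size_t uInputSize) {
    std::size_t uCount = uInputSize / 4;        // whole Word32 values only
    for (std::size_t i = 0; i < uCount; ++i) {
        std::uint32_t uChar = pInput[i];
        if (uChar >= 0xD800U && (uChar < 0xE000U || uChar > 0x10FFFFU)) return false;
    }
    return true;
}
```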