Intermec PC43d Fingerprint Developer's Guide (PC23d, PC43d/t, PM23c, PM43, PM4 - Page 147



Appendix A — Character Sets and Keywords
About the UTF-8 Character Set
The UTF-8 character set was created to encode all Unicode characters while
maintaining compatibility with the US-ASCII (0 to 127 dec.) range of characters.
Data is encoded with 1, 2, 3 or 4 bytes, depending on the character number range;
4-byte sequences are not currently supported. The encoding table in this section
shows the UTF-8 binary byte sequence for each Unicode character number range.
Follow the procedure below to convert a Unicode character code in hex format to the
decimal UTF-8 byte values needed to print the character.
To convert a hex format Unicode character code to a decimal value
1 Determine the Unicode hex value for the character. For example, the hex value
  for the Cyrillic capital letter ZHE (Ж) is 0416.
2 Based on the hex value, determine the number of bytes required for UTF-8
  encoding, using the table of hex value ranges and byte counts in this section.
  Using the same example, a hex value of 0416 falls in the range 0080 to 07FF
  and requires two bytes for UTF-8 encoding.
3 Convert the hex value to binary. Using the same example, a hex value of 0416
  equals the binary value 10000010110.
4 Identify x, y, and z bits as applicable. Start with the least significant digits to the
  right and pad with zeros to the left if necessary.
  In this example, the first five digits of the binary value 10000010110 (10000)
  correspond to the y bits, and the remaining six digits (010110) correspond to the
  x bits. No padding zeros are necessary.
  The first byte is 11010000 (prefix 110 followed by the y bits).
  The second byte is 10010110 (prefix 10 followed by the x bits).
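The steps above can be sketched in Python for the two-byte case. This is an illustrative sketch only, not part of Fingerprint: the function name `two_byte_utf8_from_hex` is hypothetical, and the final conversion of each byte to decimal is an assumption based on the procedure's title, since the page ends at the binary bytes.

```python
def two_byte_utf8_from_hex(hex_code: str) -> tuple[str, str]:
    """Follow steps 3 and 4 for a code in the two-byte range 0080 to 07FF."""
    value = int(hex_code, 16)
    if not 0x0080 <= value <= 0x07FF:
        raise ValueError("this sketch only handles the two-byte range")
    # Step 3: hex to binary, padded on the left to 11 bits (5 y bits + 6 x bits).
    bits = format(value, "011b")
    # Step 4: the first five digits are the y bits, the last six are the x bits.
    y_bits, x_bits = bits[:5], bits[5:]
    # Prepend the fixed prefixes from the two-byte row of the encoding table.
    return "110" + y_bits, "10" + x_bits


first, second = two_byte_utf8_from_hex("0416")
print(first, second)                  # 11010000 10010110
print(int(first, 2), int(second, 2))  # 208 150
```

For the ZHE example, the two binary bytes correspond to the decimal values 208 and 150.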
Unicode character number range                      UTF-8 byte sequence
Hex             Binary                              Binary
0000-007F       x7x6x5x4x3x2x1                      One byte: 0x7x6x5x4x3x2x1
0080-07FF       y5y4y3y2y1x6x5x4x3x2x1              Two bytes: 110y5y4y3y2y1 10x6x5x4x3x2x1
0800-FFFF       z4z3z2z1y6y5y4y3y2y1x6x5x4x3x2x1    Three bytes: 1110z4z3z2z1 10y6y5y4y3y2y1 10x6x5x4x3x2x1
010000-10FFFF                                       Four bytes: Not currently supported.
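The rows of the encoding table translate directly into bit shifts and masks. The following is a minimal Python sketch under that reading of the table; `encode_utf8` is a hypothetical helper name, and it rejects code points above FFFF only to mirror the "not currently supported" row.

```python
def encode_utf8(code_point: int) -> list[int]:
    """Return the UTF-8 bytes (as decimal integers) for one code point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x7F:
        # One byte: 0xxxxxxx
        return [code_point]
    if code_point <= 0x7FF:
        # Two bytes: 110yyyyy 10xxxxxx
        return [0b11000000 | (code_point >> 6),
                0b10000000 | (code_point & 0b111111)]
    if code_point <= 0xFFFF:
        # Three bytes: 1110zzzz 10yyyyyy 10xxxxxx
        return [0b11100000 | (code_point >> 12),
                0b10000000 | ((code_point >> 6) & 0b111111),
                0b10000000 | (code_point & 0b111111)]
    # The table lists four-byte sequences as not currently supported.
    raise ValueError("four-byte sequences are not handled in this sketch")


print(encode_utf8(0x0416))  # [208, 150]
print(encode_utf8(0x20AC))  # [226, 130, 172]
```

The results can be cross-checked against Python's built-in encoder, for example `list(chr(0x0416).encode("utf-8"))`.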
Hex value of character    Number of bytes required
0000 to 007F              One
0080 to 07FF              Two
0800 to FFFF              Three
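The byte-count lookup can also be expressed as a small table-driven function. This Python sketch assumes the three ranges listed above; the names `BYTE_COUNT_RANGES` and `utf8_byte_count` are hypothetical.

```python
# (lowest hex value, highest hex value, number of bytes required)
BYTE_COUNT_RANGES = [
    (0x0000, 0x007F, 1),
    (0x0080, 0x07FF, 2),
    (0x0800, 0xFFFF, 3),
]


def utf8_byte_count(hex_code: str) -> int:
    """Return how many UTF-8 bytes a hex Unicode value needs, per the table."""
    value = int(hex_code, 16)
    for low, high, count in BYTE_COUNT_RANGES:
        if low <= value <= high:
            return count
    raise ValueError("value is outside the ranges listed in the table")


print(utf8_byte_count("0416"))  # 2
```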