Intermec PB51 Fingerprint Developer's Guide (old) - Page 171



Appendix B — Character Sets and Keywords
About the UTF-8 Character Set
The UTF-8 character set was devised to encode all Unicode characters while maintaining
compatibility with the US-ASCII range of characters (0 to 127 decimal). Each character
is encoded with 1, 2, 3, or 4 bytes, depending on its character number range. The table
at the end of this section shows the UTF-8 binary sequences corresponding to each
Unicode character number range.

Use the following procedure to convert a Unicode character code in hex format to the
decimal UTF-8 byte values needed to print the character.

To convert a hex format Unicode character code to decimal byte values
1. Determine the Unicode hex value for the character. For example, the hex value
   for the Cyrillic capital letter ZHE (Ж) is 0416.
2. Based on the hex value, determine the number of bytes required for UTF-8
   encoding (see the table of hex values and byte counts at the end of this
   section). Using the same example, a hex value of 0416 falls in the range
   0080 to 07FF and requires two bytes for UTF-8 encoding.
3. Convert the hex value to binary. Using the same example, a hex value of 0416
   equals the binary value 10000010110.
4. Identify the x, y, and z bits as applicable. Start with the least significant
   digits on the right and pad with zeros on the left if necessary.
   In this example, the first five digits of the binary value 10000010110 (10000)
   correspond to the y bits, and the remaining six digits (010110) correspond to
   the x bits. No padding zeros are necessary.
   Prefixing the y bits with 110 and the x bits with 10, as in the two-byte row
   of the byte sequence table, gives the two bytes:
   The first byte is 11010000.
   The second byte is 10010110.
5. Convert the bytes to decimal format. Using this example, the byte value
   11010000 equals a decimal value of 208, and the byte value 10010110 equals a
   decimal value of 150.
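
The five steps above can be sketched in Python as follows. This is only an illustration of the procedure, not Fingerprint code: the function name utf8_decimal_bytes is made up for this example, and the sketch assumes the input is a plain hex string such as "0416". It deliberately works on strings of binary digits so that each line matches a step of the procedure.

```python
def utf8_decimal_bytes(hex_code: str) -> list:
    """Convert a Unicode hex code to its UTF-8 bytes as decimal values."""
    # Step 1: the Unicode hex value of the character.
    code_point = int(hex_code, 16)

    # Step 2: choose the number of bytes, with the fixed marker bits of each
    # byte and the number of variable bits (z, y, x) that each byte carries.
    if code_point <= 0x7F:
        markers, widths = ["0"], [7]
    elif code_point <= 0x7FF:
        markers, widths = ["110", "10"], [5, 6]
    elif code_point <= 0xFFFF:
        markers, widths = ["1110", "10", "10"], [4, 6, 6]
    else:
        raise ValueError("four-byte sequences are not covered by this sketch")

    # Step 3: convert to binary. Step 4: pad with zeros on the left.
    bits = format(code_point, "b").zfill(sum(widths))

    # Step 4 (continued): split into bit groups and prepend the marker bits.
    byte_strings = []
    position = 0
    for marker, width in zip(markers, widths):
        byte_strings.append(marker + bits[position:position + width])
        position += width

    # Step 5: convert each 8-bit string to a decimal value.
    return [int(byte, 2) for byte in byte_strings]


print(utf8_decimal_bytes("0416"))  # [208, 150]
```

For the ZHE example this reproduces the values 208 and 150 derived in step 5.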

Unicode character number range                     UTF-8 byte sequence
Hex            Binary                              Binary
0000-007F      x7x6x5x4x3x2x1                      One byte:    0x7x6x5x4x3x2x1
0080-07FF      y5y4y3y2y1 x6x5x4x3x2x1             Two bytes:   110y5y4y3y2y1 10x6x5x4x3x2x1
0800-FFFF      z4z3z2z1 y6y5y4y3y2y1 x6x5x4x3x2x1  Three bytes: 1110z4z3z2z1 10y6y5y4y3y2y1 10x6x5x4x3x2x1
010000-10FFFF                                      Four bytes:  Not currently supported.
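
The rows of this table can also be read as shift-and-mask operations on the character number. The following Python sketch is one way to express that reading; it is not taken from the guide, the function name encode_by_table is hypothetical, and the result is cross-checked against Python's built-in UTF-8 encoder for three sample characters (one per supported row).

```python
def encode_by_table(code_point: int) -> bytes:
    """Build the UTF-8 bytes for one code point with shifts and masks."""
    if code_point <= 0x7F:
        # One byte: 0 followed by x7..x1.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Two bytes: 110 + y5..y1, then 10 + x6..x1.
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point <= 0xFFFF:
        # Three bytes: 1110 + z4..z1, then 10 + y6..y1, then 10 + x6..x1.
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    raise ValueError("four-byte sequences are outside this sketch")


# Sample characters: Latin A, Cyrillic ZHE, and the euro sign.
for cp in (0x0041, 0x0416, 0x20AC):
    encoded = encode_by_table(cp)
    # Cross-check against Python's own encoder.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", [format(b, "08b") for b in encoded], list(encoded))
```

The printed binary strings show the fixed marker bits at the start of each byte, followed by the z, y, and x bits from the table.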

Hex value of character    Number of bytes required
0000 to 007F              One
0080 to 07FF              Two
0800 to FFFF              Three
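
The byte-count lookup in this table amounts to a range check. A minimal Python sketch, assuming the hex value arrives as a string and using the hypothetical name utf8_bytes_required, might look like this:

```python
def utf8_bytes_required(hex_code: str) -> int:
    """Return how many UTF-8 bytes the table assigns to a Unicode hex value."""
    code_point = int(hex_code, 16)
    if code_point < 0:
        raise ValueError("code point must not be negative")
    if code_point <= 0x007F:
        return 1
    if code_point <= 0x07FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    # 010000-10FFFF would need four bytes, which the guide lists as unsupported.
    raise ValueError("four-byte characters are not covered by the table")


for value in ("0041", "0416", "20AC"):
    print(value, utf8_bytes_required(value))
```

Raising an error for values above FFFF is a design choice of this sketch; it mirrors the "Not currently supported" entry for four-byte sequences.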