Serial Communication Data Formats
The data bits in a serial transmission can carry any of the following types of information:
- Sensor readings
- Status information
- Error codes
- Configuration data
- Files that contain text
- Executable code or other information
In general, software that manages serial-port communications treats the data being transmitted as either binary or text data.
With binary data, each transmitted byte is a value from 00h to FFh. The bits in each byte are numbered 0 through 7, and bit n contributes its value (0 or 1) multiplied by 2^n. For example, a byte of 11111111b translates to FFh, or 255. A byte of 00010001b translates to 11h, or 17.
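The bit-weight rule can be checked with a short Python sketch. The function name `byte_value` is hypothetical, chosen for illustration; Python's built-in `int(bits, 2)` gives the same result.

```python
def byte_value(bits: str) -> int:
    """Sum each bit's value (0 or 1) times 2**n, where n is the bit number.

    `bits` is written most-significant bit first, as in 11111111b.
    """
    if len(bits) != 8 or any(b not in "01" for b in bits):
        raise ValueError("expected exactly eight 0/1 characters")
    total = 0
    for n, bit in enumerate(reversed(bits)):  # bit 0 is the rightmost character
        total += int(bit) * 2 ** n
    return total


for bits in ("11111111", "00010001"):
    value = byte_value(bits)
    print(f"{bits}b = {value:02X}h = {value}")  # FFh = 255, then 11h = 17
```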
In asynchronous links, bit zero, the least-significant bit (LSb), arrives first. A serial port can transmit values that have 16, 32, or any other number of bits by dividing each value into bytes and transmitting the bytes in sequence. For multi-byte values, the transmitting and receiving computers must agree on the order in which the bytes are transmitted.
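A minimal Python sketch of both ordering rules follows. The serial port hardware shifts bits out on its own, so the hypothetical helper `bits_in_transmit_order` only models the LSb-first order. The 32-bit value 12345678h is an arbitrary example, and the byte splitting uses Python's built-in `int.to_bytes` and `int.from_bytes`.

```python
def bits_in_transmit_order(byte: int) -> list[int]:
    """Return the 8 data bits of `byte` in the order an asynchronous link sends them: bit 0 (LSb) first."""
    return [(byte >> n) & 1 for n in range(8)]


value = 0x12345678  # example 32-bit value

# The transmitter splits the value into 4 bytes; both ends must agree on the order.
little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first
print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# A receiver that assumes the wrong order reconstructs a different number.
print(hex(int.from_bytes(little, "little")))  # 0x12345678
print(hex(int.from_bytes(little, "big")))     # 0x78563412

print(bits_in_transmit_order(0x11))  # [1, 0, 0, 0, 1, 0, 0, 0]
```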
Some software treats data as text, with each text character expressed as a code. Treating data as text is convenient for programmers. Most programming languages can store text in two ways:
- As strings: A string can contain one or more characters. Source code can request to transmit a string such as “hello”, and the compiler or interpreter generates code that causes the individual character codes to be transmitted.
- As arrays: An array is made up of individual characters. Source code can request to transmit an array, with each item in the array containing a character code.
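In Python terms, the two approaches look roughly like this sketch. The call that actually writes to a serial port is omitted; the resulting bytes are what such a call would send. ASCII is assumed as the encoding because the example text is basic U.S. English.

```python
message = "hello"

# As a string: encoding converts the whole string into the character codes to transmit.
as_string = message.encode("ascii")

# As an array: each item holds one character code.
as_array = [ord(ch) for ch in message]

print(as_string.hex(" "))                      # 68 65 6c 6c 6f
print([f"{code:02X}h" for code in as_array])   # ['68h', '65h', '6Ch', '6Ch', '6Fh']

# Both forms produce the same bytes on the wire.
assert bytes(as_array) == as_string
```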
At the receiving computer, software can store received character codes in a string or array. For applications that send and receive only basic U.S. English text, encoding generally isn’t an issue. When sending or receiving special characters or characters in other alphabets or scripts, the computers must agree on an encoding method.
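The following sketch shows why the two ends must agree once text goes beyond basic U.S. English. The word “café” and the choice of UTF-8 on the sending side are illustrative assumptions; the receiver then decodes the same bytes three different ways.

```python
text = "café"
sent = text.encode("utf-8")  # é (U+00E9) becomes two bytes: C3 A9
print(sent.hex(" "))         # 63 61 66 c3 a9

print(sent.decode("utf-8"))    # café   (receiver agrees on UTF-8)
print(sent.decode("latin-1"))  # cafÃ©  (receiver assumes ISO 8859-1: wrong characters)

try:
    sent.decode("ascii")       # receiver assumes 7-bit ASCII: bytes above 7Fh are invalid
except UnicodeDecodeError as err:
    print("ASCII decode failed:", err.reason)
```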
The .NET Framework and other recent software use encoding methods defined in The Unicode Standard (a publication of the Unicode Consortium). Unicode’s encoding methods can represent more than a million characters in dozens of alphabets and other scripts, plus punctuation marks, math and technical symbols, geometric shapes, and other symbols. Unicode encodes text using the following concepts:
- Code points: A code point is a value that identifies a character. Unicode code charts assign a code point to each character. The conventional notation for code points is the form U+code_point, where code_point is a hexadecimal value. For example, the code point for “A” is U+0041. Code points range from U+0000 to U+10FFFF. Each character has one and only one code point. Software uses the code point to obtain the encoded character, which represents a character using a specific encoding method. The code point and encoded character can have the same value or different values depending on the encoding method.
- Code units: An encoded character that represents a character in software consists of one or more values called code units. A character’s code point never changes, but the code units that make up an encoded character vary with the encoding method. The number of code units that represent a character, their value(s), and the number of bits in the code units vary with the character and encoding method.
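To make the distinction concrete, this Python sketch prints each character's code point in U+ notation and the code units that UTF-8, UTF-16, and UTF-32 produce for it. The helper names are hypothetical. The big-endian codecs (`utf-16-be`, `utf-32-be`) are used so that no byte-order mark is added and each code unit reads in conventional order.

```python
CODECS = (("UTF-8", "utf-8", 1), ("UTF-16", "utf-16-be", 2), ("UTF-32", "utf-32-be", 4))


def code_point(ch: str) -> str:
    """Return the conventional U+ notation for a single character's code point."""
    return f"U+{ord(ch):04X}"


def code_units(ch: str, codec: str, unit_size: int) -> list[str]:
    """Encode one character and split the bytes into code units of `unit_size` bytes (shown in hex)."""
    data = ch.encode(codec)  # big-endian codecs add no byte-order mark
    return [data[i:i + unit_size].hex().upper() for i in range(0, len(data), unit_size)]


for ch in ("A", "é", "€", "😀"):
    print(ch, code_point(ch))
    for label, codec, size in CODECS:
        print(f"  {label:<6} {' '.join(code_units(ch, codec, size))}")
```

The code point never changes, but the code units do. For “😀” (U+1F600), the sketch prints F0 9F 98 80 for UTF-8 (four code units), D83D DE00 for UTF-16 (two), and 0001F600 for UTF-32 (one).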
Unicode encoding methods
The Unicode Standard defines three basic encoding methods: UTF-8, UTF-16, and UTF-32. Each can encode any character that has a defined code point, but they use different algorithms to convert code points into code units.
UTF-8 encoding uses 8-bit code units, and a UTF-8 encoded character is 1 to 4 code units wide. Basic U.S. English text (code points U+0000 through U+007F) can use UTF-8 encoding with each character encoded as a single code unit whose value equals the character’s code point.
UTF-16 encoding uses 16-bit code units, and UTF-16 encoded characters are 1 or 2 code units each. UTF-16 encoding represents more than 60,000 characters as single code units whose values equal the characters’ code points.
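The characters that fit in a single UTF-16 code unit are those with code points up to U+FFFF, excluding the 2,048 code points D800h–DFFFh that are reserved for surrogates, which leaves 63,488. Characters above U+FFFF use a pair of surrogate code units. The Python sketch below applies that algorithm by hand (the function name `utf16_code_units` is hypothetical) and cross-checks it against Python's built-in `utf-16-be` codec.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Convert a Unicode code point to UTF-16 code units (one unit, or a surrogate pair)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        return [code_point]             # single code unit equal to the code point
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # high (leading) surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # low (trailing) surrogate carries the bottom 10 bits
    return [high, low]


print([f"{u:04X}" for u in utf16_code_units(0x0041)])   # ['0041']
print([f"{u:04X}" for u in utf16_code_units(0x1F600)])  # ['D83D', 'DE00']

# Cross-check against Python's built-in big-endian UTF-16 codec.
data = chr(0x1F600).encode("utf-16-be")
builtin_units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
assert builtin_units == utf16_code_units(0x1F600)
```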
UTF-32 encoding uses 32-bit code units. A UTF-32 encoded character is always a single code unit. A UTF-32 code unit always has the same value as the character’s code point.
The UTF-16 and UTF-32 methods have alternate forms (UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE) for storing code units as big endian (most significant byte first in memory) or little endian (least significant byte first in memory). The unmarked forms (UTF-16, UTF-32) are treated as big endian unless the data begins with a byte-order mark (BOM), the code point U+FEFF, whose byte order then indicates the endianness of the data that follows.
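The byte-order rule for unmarked UTF-16 can be sketched in Python as follows. The function `decode_utf16` is hypothetical; it spells out the rule explicitly (BOM first, otherwise big endian) rather than relying on any library's default behavior, and it handles UTF-16 only.

```python
def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 data, honoring a leading byte-order mark; default to big endian if none."""
    if data[:2] == b"\xfe\xff":       # BOM serialized big endian
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":       # BOM serialized little endian
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")   # unmarked: treat as big endian


print("A".encode("utf-16-be").hex(" "))  # 00 41
print("A".encode("utf-16-le").hex(" "))  # 41 00

print(decode_utf16(b"\xfe\xff\x00\x41"))  # A (BOM says big endian)
print(decode_utf16(b"\xff\xfe\x41\x00"))  # A (BOM says little endian)
print(decode_utf16(b"\x00\x41"))          # A (no BOM, so big endian)
```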
Treating data as text is the obvious choice for transferring strings or files that contain text. But text can also carry binary data if the data is expressed in ASCII Hex format. Each byte is represented by a pair of ASCII codes that represent the byte’s two hexadecimal characters, which are in the range 0–9 and A–F. ASCII Hex can represent any numeric value using only the ASCII codes 30h–39h (to represent values 00h–09h) and 41h–46h (to represent values 0Ah–0Fh). Code that allows lower-case letters might also use 61h–66h (to represent 0ah–0fh). Instead of sending one byte to represent a value from 0 to 255, the transmitting computer sends two bytes, one for each character in the hex number that represents the byte. The receiving computer can convert the characters back to numeric values or use the data in any other way.
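A minimal Python sketch of ASCII Hex encoding and decoding follows. The function names are hypothetical. The encoder emits upper-case letters (41h–46h) and the decoder also accepts lower-case ones (61h–66h). Python's built-in `bytes.hex()` and `bytes.fromhex()` do much the same job; the explicit version shows the nibble-to-character mapping.

```python
HEX_DIGITS = "0123456789ABCDEF"
VALID = b"0123456789ABCDEFabcdef"


def to_ascii_hex(data: bytes) -> bytes:
    """Encode each byte as two ASCII characters (high nibble first) from 0-9 and A-F."""
    out = bytearray()
    for byte in data:
        out.append(ord(HEX_DIGITS[byte >> 4]))    # high nibble -> 30h-39h or 41h-46h
        out.append(ord(HEX_DIGITS[byte & 0x0F]))  # low nibble
    return bytes(out)


def from_ascii_hex(text: bytes) -> bytes:
    """Decode pairs of ASCII hex characters back to bytes; also accepts a-f (61h-66h)."""
    if len(text) % 2:
        raise ValueError("ASCII Hex data must have an even number of characters")
    if any(c not in VALID for c in text):
        raise ValueError("non-hex character in ASCII Hex data")
    return bytes(int(text[i:i + 2].decode("ascii"), 16) for i in range(0, len(text), 2))


encoded = to_ascii_hex(bytes([225]))
print(encoded, encoded.hex(" "))  # b'E1' 45 31
print(from_ascii_hex(encoded)[0])  # 225
```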
For example, consider the decimal number 225. Expressed as a binary number, it’s 11100001.
In hexadecimal, it’s E1.
The ASCII codes for “E” and “1” are 45h and 31h.
So the binary representation of this value in ASCII Hex consists of these two bytes: 01000101 00110001.
A serial link using ASCII Hex format would send the decimal value 225 by transmitting these two bytes. The format has both costs and benefits:
- Each byte value requires two characters, so data takes twice as long to transfer.
- Also, in most cases the application at each end must convert between ASCII Hex and binary.
- A major reason to use ASCII Hex is that it frees all of the other codes for other uses, such as flow-control codes, an end-of-file indicator, or network addresses.
- ASCII Hex also allows protocols that support only seven data bits to transmit any numeric value.
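The last two points can be illustrated with a small Python sketch. The framing scheme is a hypothetical example, not a defined protocol: it marks the start and end of each message with the ASCII control codes STX (02h) and ETX (03h). Because the payload travels as ASCII Hex, those codes, and flow-control codes such as XON (11h) and XOFF (13h), never appear inside the payload, and every transmitted byte fits in seven data bits.

```python
STX, ETX = 0x02, 0x03  # hypothetical start/end-of-frame markers for this sketch
XON, XOFF = 0x11, 0x13


def frame(payload: bytes) -> bytes:
    """Wrap a payload as STX + ASCII Hex text + ETX."""
    return bytes([STX]) + payload.hex().upper().encode("ascii") + bytes([ETX])


def unframe(framed: bytes) -> bytes:
    """Check the STX/ETX markers and decode the ASCII Hex payload between them."""
    if len(framed) < 2 or framed[0] != STX or framed[-1] != ETX:
        raise ValueError("missing STX/ETX")
    return bytes.fromhex(framed[1:-1].decode("ascii"))


# Raw payload bytes that would collide with the control codes, plus a value above 7Fh.
payload = bytes([STX, ETX, XON, XOFF, 0xFF])
framed = frame(payload)

assert all(b < 0x80 for b in framed)                                # fits in 7 data bits
assert not any(b in (STX, ETX, XON, XOFF) for b in framed[1:-1])    # control codes only as markers
assert unframe(framed) == payload
```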