A-level Computing/AQA/Fundamentals of data representation/ASCII and unicode

The 104-key PC US English QWERTY keyboard layout evolved from the standard typewriter keyboard, with extra keys for computing.

ASCII is a 7-bit character code. Each character is normally stored in 8 bits (1 byte), but the 8th bit is used as a check (parity) bit, meaning that only 7 bits are available to represent the character itself. This gives ASCII a total of

2^7 = 128 different values.
The 95 printable ASCII characters, numbered from 32 to 126 (decimal)
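
As a quick check of this arithmetic, the short Python sketch below counts the values available in 7 bits and the printable codes in the range 32 to 126. Python is used here purely for illustration, and the variable names are our own.

```python
# Number of distinct values that fit in 7 bits
total_codes = 2 ** 7

# Printable ASCII characters occupy codes 32 to 126 inclusive
printable = [chr(code) for code in range(32, 127)]

print(total_codes)       # 128
print(len(printable))    # 95
print(printable[-1])     # ~
```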

ASCII values can represent several kinds of character:

  • Numbers (the digits 0 to 9)
  • Letters (capitals and lower case are separate)
  • Punctuation (?/|\$ etc.)
  • Non-printing control codes (carriage return, escape, tab)
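
The grouping in this list can be sketched in Python. The function name classify and the category labels are our own choices for illustration; the sketch assumes that codes 0 to 31 and 127 are the non-printing ones, as in the tables on this page.

```python
import string

def classify(code):
    """Return a rough category name for an ASCII code (0 to 127)."""
    if code < 32 or code == 127:
        return "control"          # non-printing codes such as CR or ESC
    ch = chr(code)
    if ch in string.digits:
        return "number"
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.punctuation:
        return "punctuation"
    return "space"                # only code 32 is left

for ch in ["7", "A", "a", "?", "\r"]:
    print(repr(ch), ord(ch), classify(ord(ch)))
```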

Take a look at your keyboard and see how many different keys you have. The number should be 104 for a Windows keyboard, or 101 for a traditional keyboard. Allowing for the shifted values (a, A; b, B etc.) and recognising that some keys have repeated functionality (two shift keys, the num pad), a keyboard can perform roughly 128 functions. This is only a rough comparison: some keys, such as F1, have no ASCII code of their own.

Binary    Dec  Hex  Abbr
000 0000    0  00   NUL
000 0001    1  01   SOH
000 0010    2  02   STX
000 0011    3  03   ETX
000 0100    4  04   EOT
000 0101    5  05   ENQ
000 0110    6  06   ACK
000 0111    7  07   BEL
000 1000    8  08   BS
000 1001    9  09   HT
000 1010   10  0A   LF
000 1011   11  0B   VT
000 1100   12  0C   FF
000 1101   13  0D   CR
000 1110   14  0E   SO
000 1111   15  0F   SI
001 0000   16  10   DLE
001 0001   17  11   DC1
001 0010   18  12   DC2
001 0011   19  13   DC3
001 0100   20  14   DC4
001 0101   21  15   NAK
001 0110   22  16   SYN
001 0111   23  17   ETB
001 1000   24  18   CAN
001 1001   25  19   EM
001 1010   26  1A   SUB
001 1011   27  1B   ESC
001 1100   28  1C   FS
001 1101   29  1D   GS
001 1110   30  1E   RS
001 1111   31  1F   US
111 1111  127  7F   DEL

Binary    Dec  Hex  Glyph
010 0000   32  20   (space)
010 0001   33  21   !
010 0010   34  22   "
010 0011   35  23   #
010 0100   36  24   $
010 0101   37  25   %
010 0110   38  26   &
010 0111   39  27   '
010 1000   40  28   (
010 1001   41  29   )
010 1010   42  2A   *
010 1011   43  2B   +
010 1100   44  2C   ,
010 1101   45  2D   -
010 1110   46  2E   .
010 1111   47  2F   /
011 0000   48  30   0
011 0001   49  31   1
011 0010   50  32   2
011 0011   51  33   3
011 0100   52  34   4
011 0101   53  35   5
011 0110   54  36   6
011 0111   55  37   7
011 1000   56  38   8
011 1001   57  39   9
011 1010   58  3A   :
011 1011   59  3B   ;
011 1100   60  3C   <
011 1101   61  3D   =
011 1110   62  3E   >
011 1111   63  3F   ?

Binary    Dec  Hex  Glyph
100 0000   64  40   @
100 0001   65  41   A
100 0010   66  42   B
100 0011   67  43   C
100 0100   68  44   D
100 0101   69  45   E
100 0110   70  46   F
100 0111   71  47   G
100 1000   72  48   H
100 1001   73  49   I
100 1010   74  4A   J
100 1011   75  4B   K
100 1100   76  4C   L
100 1101   77  4D   M
100 1110   78  4E   N
100 1111   79  4F   O
101 0000   80  50   P
101 0001   81  51   Q
101 0010   82  52   R
101 0011   83  53   S
101 0100   84  54   T
101 0101   85  55   U
101 0110   86  56   V
101 0111   87  57   W
101 1000   88  58   X
101 1001   89  59   Y
101 1010   90  5A   Z
101 1011   91  5B   [
101 1100   92  5C   \
101 1101   93  5D   ]
101 1110   94  5E   ^
101 1111   95  5F   _

Binary    Dec  Hex  Glyph
110 0000   96  60   `
110 0001   97  61   a
110 0010   98  62   b
110 0011   99  63   c
110 0100  100  64   d
110 0101  101  65   e
110 0110  102  66   f
110 0111  103  67   g
110 1000  104  68   h
110 1001  105  69   i
110 1010  106  6A   j
110 1011  107  6B   k
110 1100  108  6C   l
110 1101  109  6D   m
110 1110  110  6E   n
110 1111  111  6F   o
111 0000  112  70   p
111 0001  113  71   q
111 0010  114  72   r
111 0011  115  73   s
111 0100  116  74   t
111 0101  117  75   u
111 0110  118  76   v
111 0111  119  77   w
111 1000  120  78   x
111 1001  121  79   y
111 1010  122  7A   z
111 1011  123  7B   {
111 1100  124  7C   |
111 1101  125  7D   }
111 1110  126  7E   ~
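
The Binary, Dec and Hex columns above are three ways of writing the same number. As a rough illustration, the Python sketch below rebuilds a few rows from the character code alone. The helper name table_row and the exact spacing are our own assumptions, not part of any standard.

```python
def table_row(code):
    """Format one ASCII code as 'binary  dec  hex  glyph'."""
    bits = format(code, "07b")             # 7-bit binary, zero padded
    binary = bits[:3] + " " + bits[3:]     # grouped 3 + 4 as in the tables
    return f"{binary}  {code:>3}  {code:02X}  {chr(code)}"

for code in range(65, 70):                 # 'A' to 'E'
    print(table_row(code))
```

The first line printed is the row for 'A': 100 0001, 65, 41.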

If you look carefully at the ASCII representation of each character you might notice some patterns. For example:

Binary    Dec  Hex  Glyph
110 0001   97  61   a
110 0010   98  62   b
110 0011   99  63   c

As you can see, a = 97, b = 98, c = 99. This means that if we are told what value a character is we can easily work out the value of subsequent or prior characters.
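
This pattern is easy to try out in code. The following is a minimal Python sketch, using the built-in functions ord (character to code) and chr (code to character); the helper name shift is our own.

```python
def shift(ch, steps):
    """Return the character a given number of codes after (or before) ch."""
    return chr(ord(ch) + steps)

print(ord("a"), ord("b"), ord("c"))   # 97 98 99
print(shift("a", 3))                  # d
print(shift("m", -1))                 # l
```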

Example: ASCII characters

Without looking at the ASCII table above: if we are told that the ASCII value for the character '5' is 011 0101, what is the ASCII value for '8'?

We know that '8' is three characters after '5', as 5,6,7,8. This means that the ASCII value of '8' will be three bigger than that for '5':

  011 0101  ASCII '5'
+      011
  --------  
  011 1000  ASCII '8'

Checking against the table above confirms that this is the correct value.
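
The same sum can be checked in Python. This is only a sketch of the worked example, with variable names of our own choosing.

```python
five = int("0110101", 2)     # ASCII '5' as given in the question (53)
eight = five + 0b011         # '8' is three characters after '5'

print(format(eight, "07b"))  # 0111000
print(chr(eight))            # 8
```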

If you are worried about making mistakes with binary addition, you can work with the decimal numbers instead. For example, suppose you are given the ASCII value of 'g' as 110 0111. What is the ASCII value of 'e'?

We know that 'e' is two characters before 'g', as e, f, g. This means that the ASCII value of 'e' will be two smaller than that for 'g'.

64 32 16  8  4  2  1
 1  1  0  0  1  1  1 = 103 (base 10) = ASCII value of 'g'

103 - 2 = 101 (base 10)

64 32 16  8  4  2  1
 1  1  0  0  1  0  1 = 101 (base 10) = ASCII value of 'e'
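
The place-value method above can be sketched in Python as follows. The function names to_decimal and to_binary are hypothetical helpers written for this example; they spell out the steps rather than relying on built-in conversions.

```python
PLACE_VALUES = [64, 32, 16, 8, 4, 2, 1]

def to_decimal(bits):
    """Add up the place values of every bit that is 1."""
    return sum(value for value, bit in zip(PLACE_VALUES, bits) if bit == "1")

def to_binary(number):
    """Build the 7-bit pattern by taking out each place value in turn."""
    bits = ""
    for value in PLACE_VALUES:
        if number >= value:
            bits += "1"
            number -= value
        else:
            bits += "0"
    return bits

g = to_decimal("1100111")    # 103
e = g - 2                    # 101
print(g, e, to_binary(e))    # 103 101 1100101
```
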
Exercise: ASCII

Without using the crib table (you won't get it in the exam!) answer the following questions:

The ASCII code for the letter 'Z' is 90 (base 10). What is the letter 'X' stored as?

Answer:

88, as 'X' is two characters before 'Z' in the alphabet.

How many ASCII 'characters' does the following piece of text use:

Hello Pete,
ASCII rocks!

Answer:

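27 or 26. If you said 23 you'd be wrong, because you must include the non-printing characters at the end of each line. The 23 visible characters include the two spaces. Each end of line needs an end-of-line (EOL) code, which in ASCII is the line feed (LF), and starting a new line also needs a carriage return (CR). The layout below gives 26 characters; counting a carriage return after the last line as well gives 27.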

Hello Pete,[EOL][CR]
ASCII rocks![EOL]
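
The count can be checked with a short Python sketch. It assumes that [EOL] corresponds to the line feed character (code 10, written "\n") and [CR] to the carriage return (code 13, written "\r"); the variable names are our own. Windows text files usually store CR before LF, but the order does not change the count.

```python
visible = "Hello Pete," + "ASCII rocks!"
print(len(visible))                            # 23 visible characters

# As laid out in the answer: EOL + CR after line 1, EOL after line 2
as_shown = "Hello Pete,\n\rASCII rocks!\n"
print(len(as_shown))                           # 26

# With a carriage return after the last line as well
with_final_cr = "Hello Pete,\n\rASCII rocks!\n\r"
print(len(with_final_cr))                      # 27
```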

For the Latin alphabet ASCII is generally fine, but what if you wanted to write something in Mandarin, or Hindi? We need another coding scheme!

Extension: Coding ASCII

You might have to use ASCII codes when reading from text files. To see which character each ASCII code stands for, we can use the following function, ChrW(x), which returns the character whose code has the denary value x. Try out the following code to see the first 128 characters. What is special about character 10?

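' Print each of the 128 ASCII codes alongside its character
For x = 0 To 127
  Console.WriteLine("ASCII for " & x & " = " & ChrW(x))
Next
Console.ReadLine() ' wait for Enter so the console window stays open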