Representing Data
IB Syllabus: A1.2.2 — Explain how binary can be used to represent different types of data, including integers, characters, images, audio, and video.
Table of Contents
- Key Concepts
- Worked Examples
- Quick Code Check
- Trace Exercise
- Spot the Error
- Fill in the Blanks
- Predict the Output
- Practice Exercises
- Connections
Key Concepts
Characters (A1.2.2)
Every character you type on a keyboard — letters, digits, punctuation, even the space bar — must be stored as a binary number inside the computer. A character encoding is the agreed-upon mapping between characters and numbers.
ASCII (American Standard Code for Information Interchange)
ASCII is one of the oldest and most influential character encodings. It uses 7 bits to represent 128 different characters (2^7 = 128).
Key ASCII values to remember:
| Character | Decimal | Binary (7-bit) |
|---|---|---|
| '0' (digit zero) | 48 | 0110000 |
| 'A' (uppercase A) | 65 | 1000001 |
| 'Z' (uppercase Z) | 90 | 1011010 |
| 'a' (lowercase a) | 97 | 1100001 |
| 'z' (lowercase z) | 122 | 1111010 |
| Space | 32 | 0100000 |
Important patterns:
- Uppercase letters A-Z occupy decimal values 65-90
- Lowercase letters a-z occupy decimal values 97-122
- The difference between uppercase and lowercase of the same letter is always 32 (e.g., ‘A’=65, ‘a’=97)
- Digit characters ‘0’-‘9’ occupy decimal values 48-57
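These patterns can be checked in Python, whose built-in ord() and chr() functions convert between a character and its code value. The sketch below is only an illustration; the helper name to_upper_ascii is made up for this page.

```python
# ord() gives a character's code value; chr() converts a value back.
print(ord('A'), ord('a'), ord('0'))   # 65 97 48

def to_upper_ascii(letter):
    """Convert a lowercase ASCII letter to uppercase by subtracting 32.

    Hypothetical helper: it relies on the fixed gap of 32 between
    'a'..'z' (97-122) and 'A'..'Z' (65-90).
    """
    if 'a' <= letter <= 'z':
        return chr(ord(letter) - 32)
    return letter  # anything else is returned unchanged

print(to_upper_ascii('g'))   # G
```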
Extended ASCII uses 8 bits (1 full byte) to represent 256 characters (2^8 = 256). The extra 128 characters include accented letters, line-drawing symbols, and other special characters. Extended ASCII is backward-compatible with standard ASCII — the first 128 characters are identical.
Unicode
ASCII was designed for English and cannot represent characters from other writing systems — Chinese, Arabic, Hindi, Japanese, Korean, and many more. Unicode was created to solve this problem.
- Unicode assigns a unique code point to every character in every writing system
- Supports over 143,000 characters across 150+ scripts, plus emoji
- Three main encoding formats:
| Encoding | Bytes per Character | Notes |
|---|---|---|
| UTF-8 | 1 to 4 bytes (variable) | Backward-compatible with ASCII; most common on the web |
| UTF-16 | 2 or 4 bytes | Used internally by Java and Windows |
| UTF-32 | 4 bytes (fixed) | Simplest but most wasteful of storage |
UTF-8 is the dominant encoding on the internet because:
- ASCII characters (English text) use just 1 byte each — identical to ASCII
- Characters from other scripts use 2-4 bytes as needed
- This makes it storage-efficient for text that is mostly English, while still supporting every Unicode character
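To see the byte counts from the table in practice, the sketch below encodes four sample characters with Python's str.encode(). The function name byte_lengths and the choice of sample characters are assumptions made for illustration. The "-be" codec variants are used so that no byte-order mark is added to the count.

```python
def byte_lengths(ch):
    """Return the number of bytes needed for ch in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode('utf-8')),
        len(ch.encode('utf-16-be')),   # "-be" avoids adding a byte-order mark
        len(ch.encode('utf-32-be')),
    )

samples = [
    'A',            # U+0041, plain ASCII letter
    '\u00e9',       # U+00E9, e with acute accent
    '\u20ac',       # U+20AC, euro sign
    '\U0001F600',   # U+1F600, an emoji
]

for ch in samples:
    utf8, utf16, utf32 = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8={utf8}  UTF-16={utf16}  UTF-32={utf32}")
```

The ASCII letter needs 1 byte in UTF-8, while the emoji needs 4 bytes in all three encodings.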
Converting a Character to Binary
To convert a character to binary:
- Look up the character’s decimal value in the ASCII table
- Convert the decimal value to binary
- Pad with leading zeros to reach the required number of bits (typically 8 for extended ASCII)
Example: Convert ‘B’ to 8-bit binary.
- ‘B’ = 66 in ASCII
- 66 in binary: 64 + 2 = 1000010
- Pad to 8 bits: 01000010
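The same three steps can be sketched in Python. The function name char_to_binary is hypothetical; ord() performs the table lookup and format() handles both the conversion and the zero padding.

```python
def char_to_binary(ch, bits=8):
    """Look up the code value of ch and return it as a zero-padded binary string."""
    code = ord(ch)                     # step 1: character -> decimal value
    return format(code, f'0{bits}b')   # steps 2 and 3: to binary, padded to `bits`

print(char_to_binary('B'))           # 01000010
print(char_to_binary('B', bits=7))   # 1000010
```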
Integers (A1.2.2)
Computers represent whole numbers (integers) in binary. There are several methods depending on whether negative numbers need to be represented.
Unsigned Binary
Unsigned binary represents only non-negative integers (0 and positive). With n bits, you can represent values from 0 to 2^n - 1.
- 8-bit unsigned: 0 to 255
- 16-bit unsigned: 0 to 65,535
This is covered in detail on the Number Systems page.
Sign-and-Magnitude
Sign-and-magnitude uses the leftmost bit (most significant bit) as a sign bit:
- 0 = positive
- 1 = negative
The remaining bits represent the magnitude (absolute value).
Example (8-bit):
- 00000101 = +5 (sign bit 0 = positive, magnitude = 5)
- 10000101 = -5 (sign bit 1 = negative, magnitude = 5)
Limitation: Sign-and-magnitude has two representations for zero (00000000 = +0 and 10000000 = -0), which complicates arithmetic circuits.
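A minimal Python sketch of sign-and-magnitude is shown below, assuming bit patterns are handled as strings of '0' and '1'. The function names are made up for illustration. Decoding both zero patterns shows the duplicate-zero problem.

```python
def sign_magnitude(value, bits=8):
    """Encode an integer as a sign bit followed by its magnitude."""
    max_magnitude = 2 ** (bits - 1) - 1
    if abs(value) > max_magnitude:
        raise ValueError(f"{value} does not fit in {bits} bits")
    sign = '1' if value < 0 else '0'
    return sign + format(abs(value), f'0{bits - 1}b')

def decode_sign_magnitude(pattern):
    """Decode a sign-and-magnitude bit string back to an integer."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == '1' else magnitude

print(sign_magnitude(5))    # 00000101
print(sign_magnitude(-5))   # 10000101

# Two different bit patterns decode to the same value, zero.
print(decode_sign_magnitude('00000000'), decode_sign_magnitude('10000000'))   # 0 0
```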
Two’s Complement
Two’s complement is the most widely used method for representing signed integers in computers. It solves the problems of sign-and-magnitude.
- Only one representation for zero (00000000)
- Arithmetic operations (addition, subtraction) work without special sign-handling logic
- With n bits, range is -2^(n-1) to 2^(n-1) - 1
- 8-bit two’s complement: -128 to +127
To find the two’s complement (negate a number):
- Write the positive number in binary
- Flip all bits (0 becomes 1, 1 becomes 0) — this is the “one’s complement”
- Add 1 to the result
Example: Represent -6 in 8-bit two’s complement.
- +6 in binary: 00000110
- Flip all bits: 11111001
- Add 1: 11111010
- So -6 = 11111010
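The flip-and-add-one method can be sketched in Python as follows. The function name twos_complement is hypothetical, and the code follows the manual steps on purpose rather than using a shortcut.

```python
def twos_complement(value, bits=8):
    """Return the two's complement bit string for value, using the manual method."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {bits}-bit range {low}..{high}")
    if value >= 0:
        return format(value, f'0{bits}b')
    positive = format(-value, f'0{bits}b')                           # write the positive number
    flipped = ''.join('1' if b == '0' else '0' for b in positive)    # flip all bits
    return format(int(flipped, 2) + 1, f'0{bits}b')                  # add 1

print(twos_complement(6))      # 00000110
print(twos_complement(-6))     # 11111010
print(twos_complement(-128))   # 10000000
```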
Enrichment: For a deeper explanation of binary conversions and two’s complement worked examples, see the Number Systems page.
Images (A1.2.2)
Digital images are made up of tiny coloured dots called pixels (picture elements). Understanding how images are stored requires knowing three key concepts: resolution, colour depth, and the RGB model.
Pixels and Resolution
A pixel is the smallest addressable element of a digital image. Each pixel is a single colour.
Resolution is the total number of pixels in an image, expressed as width x height. For example, a Full HD image has a resolution of 1920 x 1080 pixels (approximately 2.07 million pixels, or 2.07 megapixels).
Higher resolution means more detail but also larger file sizes.
Colour Depth
Colour depth (also called bit depth) is the number of bits used to represent the colour of each pixel. More bits per pixel means more possible colours.
| Colour Depth | Number of Colours | Common Use |
|---|---|---|
| 1-bit | 2 (black and white) | Simple icons, fax documents |
| 8-bit | 256 | GIF images, indexed colour |
| 16-bit | 65,536 | “High colour” |
| 24-bit | 16,777,216 | “True colour” — photographs |
| 32-bit | 16,777,216 + transparency | True colour with alpha channel |
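The colour counts in the table follow from the rule that n bits give 2^n combinations. A short Python check, using a hypothetical helper named colours_for_depth:

```python
def colours_for_depth(bits):
    """Number of distinct colours that `bits` bits per pixel can represent."""
    return 2 ** bits

for bits in (1, 8, 16, 24):
    print(f"{bits}-bit: {colours_for_depth(bits):,} colours")
```

The 32-bit row is not included because, as the table notes, its extra 8 bits store transparency rather than additional colours.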
The RGB Colour Model
The RGB model represents colours by combining three channels: Red, Green, and Blue. Each channel typically uses 8 bits (values 0-255), giving a total of 24 bits per pixel.
- (255, 0, 0) = pure red
- (0, 255, 0) = pure green
- (0, 0, 255) = pure blue
- (255, 255, 255) = white
- (0, 0, 0) = black
- (255, 255, 0) = yellow (red + green)
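To show how one pixel becomes 24 bits, the sketch below converts each channel to 8-bit binary and joins the results. The function name rgb_to_bits is an assumption for illustration, and the channel order red, green, blue follows the list above.

```python
def rgb_to_bits(r, g, b):
    """Return the 24-bit pattern for one pixel, as three 8-bit groups."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in the range 0-255")
    return ' '.join(format(channel, '08b') for channel in (r, g, b))

print(rgb_to_bits(255, 255, 0))   # 11111111 11111111 00000000  (yellow)
print(rgb_to_bits(0, 0, 0))       # 00000000 00000000 00000000  (black)
```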
Bitmap Storage
In a bitmap image, every pixel’s colour value is stored individually. The image is essentially a grid of numbers.
Here is a simple example of a 4x4 pixel image with RGB values:
Pixel Grid (4x4):
+----------------+----------------+----------------+----------------+
| (255, 0, 0) | (255, 0, 0) | ( 0, 0,255) | ( 0, 0,255) |
| RED | RED | BLUE | BLUE |
+----------------+----------------+----------------+----------------+
| (255, 0, 0) | (255, 0, 0) | ( 0, 0,255) | ( 0, 0,255) |
| RED | RED | BLUE | BLUE |
+----------------+----------------+----------------+----------------+
| ( 0,255, 0) | ( 0,255, 0) | (255,255, 0) | (255,255, 0) |
| GREEN | GREEN | YELLOW | YELLOW |
+----------------+----------------+----------------+----------------+
| ( 0,255, 0) | ( 0,255, 0) | (255,255, 0) | (255,255, 0) |
| GREEN | GREEN | YELLOW | YELLOW |
+----------------+----------------+----------------+----------------+
This 4x4 image contains 16 pixels, each stored as three bytes (24 bits), for a total of 16 x 3 = 48 bytes.
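The grid above can be modelled in Python as a list of rows, then flattened into raw bytes to confirm the 48-byte total. This is a sketch of the idea only; real bitmap file formats also store a header, which is ignored here, and the variable names are made up.

```python
RED, GREEN, BLUE, YELLOW = (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)

# The 4x4 image, one (R, G, B) tuple per pixel, row by row.
grid = [
    [RED,   RED,   BLUE,   BLUE],
    [RED,   RED,   BLUE,   BLUE],
    [GREEN, GREEN, YELLOW, YELLOW],
    [GREEN, GREEN, YELLOW, YELLOW],
]

# Flatten every channel of every pixel into one sequence of bytes.
raw = bytes(channel for row in grid for pixel in row for channel in pixel)

print(len(raw))        # 48
print(raw[:6].hex())   # ff0000ff0000  (the first two red pixels)
```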
Image File Size Calculation
The formula for calculating uncompressed bitmap image file size:
File size (bits) = width x height x colour depth
To convert to bytes, divide by 8. To convert to kilobytes, divide by 1,024. To convert to megabytes, divide by 1,024 again.
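The formula and the unit conversions can be written as two small Python functions. The names image_size_bytes and bytes_to_mb are hypothetical, and 1 MB is taken as 1,024 x 1,024 bytes, as on this page.

```python
def image_size_bytes(width, height, colour_depth):
    """Uncompressed bitmap size in bytes: width x height x colour depth / 8."""
    return width * height * colour_depth / 8

def bytes_to_mb(size_bytes):
    """Convert bytes to megabytes by dividing by 1,024 twice."""
    return size_bytes / 1024 / 1024

size = image_size_bytes(1920, 1080, 24)
print(size)                          # 6220800.0
print(round(bytes_to_mb(size), 2))   # 5.93
```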
Audio (A1.2.2)
Enrichment — Analog vs Digital: Sound in the real world is a continuous analog signal — a smooth wave of air pressure variations. Computers cannot store continuous signals; they can only store discrete digital values (individual numbers). Converting analog sound to digital data requires sampling — measuring the wave at regular intervals and storing each measurement as a number.
Sampling
Sampling is the process of measuring the amplitude (height) of an analog sound wave at regular time intervals. Each measurement is called a sample.
The more frequently you sample, the more accurately the digital version represents the original analog sound — but the more storage space it requires.
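As a toy illustration of sampling, the sketch below measures a sine wave at regular intervals. A sine wave stands in for the analog signal, and the function name sample_sine and the tiny sample rate of 8 Hz are assumptions chosen to keep the output short.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, duration_s):
    """Measure a sine wave's amplitude sample_rate_hz times per second."""
    count = round(sample_rate_hz * duration_s)   # total number of samples
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]

# One second of a 1 Hz wave, sampled 8 times per second.
samples = sample_sine(1, 8, 1)
print(len(samples))                      # 8
print([round(s, 3) for s in samples])    # amplitudes between -1.0 and 1.0
```

Doubling the sample rate doubles the number of stored values for the same duration.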
Sample Rate
The sample rate (also called sampling frequency) is the number of samples taken per second, measured in Hertz (Hz).
| Sample Rate | Quality Level | Common Use |
|---|---|---|
| 8,000 Hz | Telephone quality | Voice calls |
| 22,050 Hz | FM radio quality | Podcasts, speech |
| 44,100 Hz | CD quality | Music CDs, high-quality audio |
| 48,000 Hz | Professional quality | DVD, professional recording |
| 96,000 Hz | Studio quality | High-resolution audio |
CD quality uses a sample rate of 44,100 Hz, meaning the sound wave is measured 44,100 times every second.
The Nyquist theorem states that to accurately capture a sound, the sample rate must be at least twice the highest frequency in the audio. Human hearing ranges up to about 20,000 Hz, so a sample rate of at least 40,000 Hz is needed; CD quality (44,100 Hz) can capture frequencies up to 22,050 Hz and therefore covers all audible frequencies.
Bit Depth
Bit depth (also called sample size or resolution) is the number of bits used to store each sample. More bits means more possible amplitude levels and greater precision.
| Bit Depth | Amplitude Levels | Common Use |
|---|---|---|
| 8-bit | 256 | Low quality, early games |
| 16-bit | 65,536 | CD quality |
| 24-bit | 16,777,216 | Professional studio |
CD-quality audio uses 16-bit depth, meaning each sample can be one of 65,536 different amplitude levels.
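The effect of bit depth can be sketched by rounding an amplitude to the nearest available level. The mapping below, from the range -1.0 to 1.0 onto levels 0 to 2^n - 1, is one simple assumption made for illustration; real audio formats may number their levels differently (for example, as signed integers). The function name quantise is hypothetical.

```python
def quantise(amplitude, bit_depth):
    """Map an amplitude in the range -1.0..1.0 to an integer level 0..2^n - 1."""
    levels = 2 ** bit_depth
    level = int((amplitude + 1) / 2 * levels)
    return min(levels - 1, level)   # keep +1.0 inside the top level

print(quantise(0.0, 8))     # 128
print(quantise(1.0, 8))     # 255
print(quantise(0.0, 16))    # 32768
```

With 16 bits the same amplitude range is divided into 65,536 steps instead of 256, so each stored value is closer to the original measurement.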
Channels
Audio can be recorded in:
- Mono (1 channel) — single audio stream
- Stereo (2 channels) — separate left and right audio streams
- Surround sound (5.1 = 6 channels, 7.1 = 8 channels)
Stereo requires twice the storage of mono because every sample is recorded independently for both left and right channels.
PCM (Pulse Code Modulation)
PCM is the standard method for encoding uncompressed digital audio. It stores the raw sampled values directly — each sample is a binary number representing the amplitude at that moment in time.
PCM is used in:
- Audio CDs (.cda)
- WAV files (.wav)
- AIFF files (.aiff)
These are lossless but large. Compressed formats like MP3 and AAC reduce the size by applying lossy compression (see the Compression page).
Audio File Size Calculation
The formula for calculating uncompressed audio file size:
File size (bits) = sample rate x bit depth x channels x duration (seconds)
To convert to bytes, divide by 8. To convert to megabytes, divide by 1,024 twice.
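As a quick check of the formula, here is a Python sketch for one minute of CD-quality stereo audio. The function name audio_size_bytes is hypothetical, and 1 MB is taken as 1,024 x 1,024 bytes.

```python
def audio_size_bytes(sample_rate, bit_depth, channels, duration_s):
    """Uncompressed audio size in bytes: rate x depth x channels x seconds / 8."""
    return sample_rate * bit_depth * channels * duration_s / 8

size = audio_size_bytes(44_100, 16, 2, 60)
print(size)                           # 10584000.0
print(round(size / 1024 / 1024, 2))   # 10.09 (MB)
```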
Video (A1.2.2)
Video is a sequence of image frames displayed in rapid succession, creating the illusion of motion. Digital video also includes an audio track synchronised with the images.
Frame Rate
The frame rate measures how many image frames are displayed per second, measured in frames per second (fps).
| Frame Rate | Use |
|---|---|
| 24 fps | Cinema / film |
| 25 fps | PAL television (Europe, Asia) |
| 30 fps | NTSC television (Americas, Japan) |
| 60 fps | Smooth motion, gaming, sports |
| 120+ fps | Slow-motion capture, VR |
Higher frame rates produce smoother motion but require more storage and processing power.
Video File Size Calculation
An uncompressed video file combines image data and audio data:
Video file size = (image size per frame x frame rate x duration) + audio size
Where image size per frame = width x height x colour depth / 8 (in bytes).
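The video formula can be sketched in Python by combining the per-frame image size with an optional audio size. The function name video_size_bytes and its audio_bytes parameter are assumptions made for illustration.

```python
def video_size_bytes(width, height, colour_depth, fps, duration_s, audio_bytes=0):
    """Uncompressed video size in bytes: frame size x frame rate x duration + audio."""
    frame_bytes = width * height * colour_depth / 8
    return frame_bytes * fps * duration_s + audio_bytes

# One minute of 1080p video at 30 fps, image data only.
size = video_size_bytes(1920, 1080, 24, 30, 60)
print(size)                                  # 11197440000.0
print(round(size / 1024 / 1024 / 1024, 2))   # 10.43 (GB)
```

Passing a value for audio_bytes adds the soundtrack; compared with the image data it changes the total very little.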
Why Compression Is Essential for Video
Uncompressed video produces enormous file sizes. Consider a single minute of 1080p video at 30 fps:
- Each frame: 1920 x 1080 x 24 / 8 = 6,220,800 bytes (~5.93 MB)
- 30 frames per second x 60 seconds = 1,800 frames
- Total: 6,220,800 x 1,800 = 11,197,440,000 bytes (approximately 10.43 GB)
That is over 10 GB for just one minute of video — without audio! At this rate, a 2-hour film would require over 1.2 terabytes of storage.
This is why video is almost always compressed in practice. Modern codecs like H.264 and H.265 can compress video by 100x or more while maintaining good visual quality. Video compression techniques are explored on the Compression page.
Summary Table
| Data Type | Key Parameters | File Size Formula |
|---|---|---|
| Image | Width, height, colour depth | width x height x colour depth / 8 |
| Audio | Sample rate, bit depth, channels, duration | sample rate x bit depth x channels x duration / 8 |
| Video | Resolution, colour depth, frame rate, duration | (width x height x colour depth / 8) x frame rate x duration |
Worked Examples
Example 1: Encode “HELLO” in ASCII
Problem: Encode the string “HELLO” in 8-bit ASCII binary.
Step 1: Look up each character’s ASCII decimal value.
| Character | ASCII Decimal |
|---|---|
| H | 72 |
| E | 69 |
| L | 76 |
| L | 76 |
| O | 79 |
Step 2: Convert each decimal value to 8-bit binary.
| Character | Decimal | Binary (8-bit) |
|---|---|---|
| H | 72 | 01001000 |
| E | 69 | 01000101 |
| L | 76 | 01001100 |
| L | 76 | 01001100 |
| O | 79 | 01001111 |
Verification of binary conversions:
- H = 72: 64 + 8 = 01001000
- E = 69: 64 + 4 + 1 = 01000101
- L = 76: 64 + 8 + 4 = 01001100
- O = 79: 64 + 8 + 4 + 2 + 1 = 01001111
The string “HELLO” is stored as: 01001000 01000101 01001100 01001100 01001111 (40 bits = 5 bytes)
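The result can be confirmed with a short Python sketch. The function name encode_ascii is hypothetical; str.encode('ascii') produces one byte per character, and format() writes each byte as 8 binary digits.

```python
def encode_ascii(text):
    """Return the 8-bit binary form of each character, separated by spaces."""
    return ' '.join(format(byte, '08b') for byte in text.encode('ascii'))

encoded = encode_ascii('HELLO')
print(encoded)   # 01001000 01000101 01001100 01001100 01001111
print(len(encoded.replace(' ', '')), 'bits')   # 40 bits
```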
Example 2: Calculate Image File Size
Problem: Calculate the file size of a 1920 x 1080 image at 24-bit colour depth.
Step 1: Calculate the total number of pixels.
- 1920 x 1080 = 2,073,600 pixels
Step 2: Calculate the total number of bits.
- 2,073,600 x 24 = 49,766,400 bits
Step 3: Convert to bytes (divide by 8).
- 49,766,400 / 8 = 6,220,800 bytes
Step 4: Convert to megabytes (divide by 1,024 twice).
- 6,220,800 / 1,024 = 6,075 KB
- 6,075 / 1,024 = 5.93 MB (approximately)
Example 3: Calculate Audio File Size
Problem: Calculate the file size of 3 minutes of CD-quality stereo audio (44,100 Hz, 16-bit, 2 channels).
Step 1: Convert duration to seconds.
- 3 minutes = 180 seconds
Step 2: Calculate the total number of bits.
- 44,100 x 16 x 2 x 180 = 254,016,000 bits
Step 3: Convert to bytes.
- 254,016,000 / 8 = 31,752,000 bytes
Step 4: Convert to megabytes.
- 31,752,000 / 1,024 = 31,007.81 KB
- 31,007.81 / 1,024 = 30.28 MB (approximately)
Example 4: Why Compression Is Essential for Video
Problem: Calculate the uncompressed file size of 1 minute of 1080p video at 30 fps and explain why compression is essential.
Step 1: Calculate the size of one frame.
- 1920 x 1080 x 24 / 8 = 6,220,800 bytes per frame
Step 2: Calculate the total number of frames.
- 30 frames/sec x 60 sec = 1,800 frames
Step 3: Calculate the total video size (image data only).
- 6,220,800 x 1,800 = 11,197,440,000 bytes
Step 4: Convert to gigabytes.
- 11,197,440,000 / 1,024 / 1,024 / 1,024 = 10.43 GB (approximately)
Conclusion: Just 1 minute of uncompressed 1080p video requires over 10 GB. A full-length 2-hour film would need over 1.2 TB. This makes uncompressed video completely impractical for storage and transmission. Real-world video uses compression (e.g., H.264, H.265) to reduce file sizes by 100x or more while maintaining acceptable visual quality.
Quick Code Check
Q1. What is the ASCII decimal value of the character 'A'?
Q2. A 24-bit colour image uses how many bits for each colour channel (red, green, blue)?
Q3. What does the sample rate of an audio file measure?
Q4. Which factor does NOT directly affect digital image file size?
Q5. Why is uncompressed video impractical for storage and transmission?
Trace Exercise
Encode the string “CAB” in 8-bit ASCII binary. For each character, find the ASCII decimal value and then convert it to 8-bit binary.
| Character | ASCII Decimal | Binary (8-bit) |
|---|---|---|
| C | | |
| A | | |
| B | | |
Spot the Error
A student calculates the file size of a 30-second stereo audio clip (44,100 Hz, 16-bit). Find the error in the calculation and give the correct fix.
Fill in the Blanks
Complete the following statements about data representation:
Character Encoding
==================
ASCII uses ____ bits to represent ____ different characters.
Image Representation
====================
In the RGB colour model, each pixel uses ____ bits per colour channel.
Image file size = width x height x ____ / 8
Audio Representation
====================
Audio file size = sample rate x bit depth x ____ x duration / 8
CD-quality audio uses a sample rate of ____ Hz.
Predict the Output
Calculate the file size in megabytes (MB) of an 800 x 600 image at 24-bit colour depth. Round to 2 decimal places.
Width: 800 pixels
Height: 600 pixels
Colour depth: 24 bits per pixel
File size = width x height x colour depth / 8
= ? bytes
= ? MB (round to 2 d.p.)
What is the file size in MB?
Practice Exercises
Core
- Encode “CODE” in ASCII: For each character in “CODE”, give the ASCII decimal value and the 8-bit binary representation. (C=67, O=79, D=68, E=69)
- Image file size: Calculate the file size of a 640 x 480 image at 8-bit colour depth, in kilobytes.
  - Hint: 640 x 480 = 307,200 pixels. Multiply by 8 bits, divide by 8 to get bytes, divide by 1,024 to get KB.
- Define key terms: Write a one-sentence definition for each of the following:
  - Sample rate
  - Bit depth
  - Resolution
  - Colour depth
  - Frame rate
Extension
- Colour depth comparison: Compare the visual quality difference between 8-bit and 24-bit colour images. How many distinct colours can each represent? When would you choose one over the other?
- Sample rate trade-offs: Explain why increasing the sample rate of audio improves quality. What is the trade-off? Reference the Nyquist theorem in your answer.
- Streaming quality: A music streaming service offers “standard quality” (128 kbps) and “high quality” (320 kbps). If a song is 4 minutes long, calculate the file size for each quality setting.
  - Hint: kbps = kilobits per second. Convert minutes to seconds, multiply by bit rate, convert to megabytes.
Challenge
- 4K video storage: A streaming service stores 1 hour of 4K video (3840 x 2160) at 60 fps. Estimate the uncompressed file size in terabytes. Then explain why compression is essential and compare lossy vs lossless compression for video. Which would you recommend and why?
- Unicode and global communication: Unicode supports over 143,000 characters across all world scripts. Explain why ASCII’s 128 characters are insufficient for global communication. Compare UTF-8, UTF-16, and UTF-32 in terms of storage efficiency and compatibility. When would each encoding be most appropriate?
Connections
- Prerequisites: Number Systems — must understand binary and hexadecimal first
- Related: Compression — compression reduces the large file sizes calculated here
- Related: Secondary Storage — where this data is physically stored
- Forward: File Processing — reading and writing data in programs