Alternative Databases

IB Syllabus: A3.4.1: Outline the different types of databases as approaches to storing data (NoSQL, cloud, spatial, in-memory).

HL only. This page covers content assessed at Higher Level only.

Key Concepts

Why Alternatives to the Relational Model Exist

The relational model is the default for a reason: rich querying, mature tooling, strong ACID guarantees, and decades of operational experience. But it does not fit every workload. Four pressures push organisations to consider alternative database approaches:

Scale. Once a database holds tens of billions of rows or must serve hundreds of thousands of writes per second, the joins and strict ACID guarantees that make relational databases so expressive and reliable can become bottlenecks, especially once the data no longer fits on a single server.
Shape of the data. Free-form documents, social graphs, geographic shapes, sensor streams, and machine-learning vectors do not fit neatly into rectangular tables.
Latency. Some applications need sub-millisecond reads (real-time bidding, multiplayer games, fraud-detection scoring) that disk-bound relational databases struggle to deliver.
Operational simplicity. A small team may prefer paying a cloud provider to run the database rather than managing servers themselves.

Four alternative approaches are worth knowing: NoSQL, cloud, spatial, and in-memory databases. None of them replaces relational databases for general business systems; each meets a need that a conventional, self-managed relational database handles poorly.

NoSQL Databases

NoSQL (“Not Only SQL” or “Non-SQL”) is an umbrella term for non-relational databases that emerged in the 2000s to handle web-scale workloads. Instead of one model, NoSQL covers four major families:

Family	Model	Example use
Document (e.g. MongoDB)	Records are self-contained JSON-like documents. Each document can have a different shape.	Product catalogues, content management systems where each item carries varied attributes
Key-value (e.g. Redis used in this mode, DynamoDB)	A simple `key -> value` lookup. Extremely fast, very limited query power.	Session stores, shopping carts, distributed caches
Column-family (e.g. Cassandra, HBase)	Each row holds columns that are grouped into column families; designed to spread across many machines.	Time-series data, very large log stores, IoT telemetry
Graph (e.g. Neo4j)	Nodes and edges form a graph; queries traverse relationships natively.	Social networks, recommendation engines, fraud detection
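
To make the four families concrete, the sketch below models each one with plain Python data structures. It is an illustration only: every name and sample value is invented for this example, and real systems such as MongoDB or Neo4j expose their own APIs rather than Python dictionaries.

```python
# Illustrative only: plain Python structures standing in for the four NoSQL families.
# All names and sample values are hypothetical.

# Document: self-contained records; each one may have a different shape.
products = [
    {"_id": 1, "name": "Kettle", "watts": 2200},
    {"_id": 2, "name": "Novel", "author": "A. Writer", "pages": 310},
]

# Key-value: one opaque value per key, looked up only by that key.
sessions = {"session:42": '{"user": "ana", "cart": [1, 2]}'}

# Column-family: each row key maps to named families, each holding its own columns.
telemetry = {
    "sensor-7": {
        "readings": {"2025-01-01T10:00": 21.5, "2025-01-01T10:05": 21.7},
        "meta": {"site": "roof"},
    }
}

# Graph: nodes joined by edges; queries walk the relationships.
follows = {"ana": {"ben", "cy"}, "ben": {"cy", "dee"}, "cy": set(), "dee": set()}


def friends_of_friends(graph, person):
    """Return people two hops away who are not already followed directly."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}


fields_per_document = [sorted(doc.keys()) for doc in products]
suggestions = friends_of_friends(follows, "ana")
```

The two documents carry different fields without any schema change, while the graph query follows edges directly instead of joining tables.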

Common NoSQL characteristics:

Flexible / schemaless: records in the same collection can have different fields. Useful when the data shape changes often.
Horizontal scalability: designed to spread across many commodity servers (sharding).
High write throughput: often by relaxing some ACID guarantees in favour of speed.
Eventual consistency instead of strict consistency: updates propagate across replicas with some delay (the BASE model: Basically Available, Soft state, Eventually consistent).
No standard query language: each NoSQL database has its own API.
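
Horizontal scalability through sharding can be sketched in a few lines. The example below is a toy, assuming a fixed number of shards and simple hash-modulo placement; the `ShardedStore` class and `shard_for` function are hypothetical names, and each "server" is just a Python dictionary.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """A toy key-value store spread over several in-process 'servers'."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

# How many keys ended up on each shard.
sizes = [len(shard) for shard in store.shards]
```

Each key is read from and written to exactly one shard, so adding machines adds capacity. One known weakness of plain modulo placement is that changing the shard count moves most keys, which is why production systems often use schemes such as consistent hashing instead.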

Trade-offs you accept:

Weaker consistency guarantees can mean stale reads.
Multi-record transactions are limited or absent in many NoSQL systems.
No joins, or only weak joins; denormalisation is the norm.
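
The stale-read trade-off can be seen in a small simulation. This is a sketch under simplifying assumptions: replication is triggered manually by a hypothetical `sync()` method rather than happening in the background, and the class names are invented for the example.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes land on the primary at once and reach replicas only on sync()."""

    def __init__(self, replica_count: int):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet copied to the replicas

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def sync(self):
        """Simulate delayed replication by applying all pending updates in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=2)
store.write("likes:post-1", 10)
store.sync()
store.write("likes:post-1", 11)
stale = store.read_from_replica(0, "likes:post-1")  # replica has not caught up yet
store.sync()
fresh = store.read_from_replica(0, "likes:post-1")  # replicas have converged
```

Between the second write and the second `sync()`, a reader on a replica still sees the old value. For a like counter that is acceptable; for an account balance it usually is not.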

When NoSQL fits: social media feeds, real-time analytics, IoT sensor streams, content catalogues, shopping carts; in short, workloads where scale and flexibility matter more than relational integrity.

When NoSQL does not fit: financial ledgers, inventory, anything requiring multi-row atomic updates across linked entities, or anything where the data model is naturally tabular and well-understood.
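
The kind of multi-row atomic update that a ledger depends on can be illustrated with a relational engine. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a production relational database; the `account` table, the names in it, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two rows; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
```

In the rejected transfer, the credit to the target row has already been applied when the debit violates the `CHECK` constraint, and the rollback undoes it. Many NoSQL systems offer this guarantee only for a single record, if at all.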

Cloud Databases

A cloud database is a database hosted and managed by a cloud provider; examples include Amazon RDS, Google Cloud SQL, Azure Database, MongoDB Atlas, and Snowflake. The underlying database can be relational or NoSQL; the “cloud” classification is about who runs the servers and how you pay for them.

Key features:

Managed service. The provider handles provisioning, patching, backups, replication, monitoring, and failover. The customer focuses on schema and queries.
Elastic scaling. Capacity expands or contracts on demand: pay for what you use this month, scale up next month.
Geographic distribution. Replicas in multiple regions for low-latency reads worldwide and disaster recovery.
Pay-as-you-go pricing. Operational cost, not capital cost; suits startups and variable workloads.
Built-in security. Encryption at rest, encryption in transit, identity-based access, audit logging.

Trade-offs:

Vendor lock-in. Provider-specific APIs and pricing models make moving hard.
Less control. Cannot tune kernel parameters, file systems, or storage layout.
Recurring cost. Cheap to start, can become expensive at scale (especially for egress / network traffic).
Connectivity dependence. A network outage between the application and the cloud database is a hard failure.

When cloud fits: SaaS startups, variable workloads, geographically distributed users, teams without dedicated DBAs. Many new projects in the 2020s start in the cloud by default.

Spatial Databases

A spatial database is optimised for storing and querying geographic and geometric data: points, lines, polygons, distances, areas, and intersections. PostGIS (a PostgreSQL extension), Oracle Spatial, and MongoDB’s geospatial features all fit this category.

What spatial data looks like:

Points: a coffee shop’s coordinates (latitude, longitude).
Lines: a road, a river, a route.
Polygons: the outline of a country, a delivery zone, a flood plain.
Multi-geometries: collections of the above.

Spatial-specific operations:

“Find every charging station within 5 km of this point.”
“Does this delivery address fall inside zone B?”
“What is the total length of road in this district?”
“Which buildings are flooded if the river rises 2 metres?”

These queries are mathematically heavy and very slow on a non-spatial database, which can only filter on latitude and longitude as plain numbers. Spatial databases use specialised R-tree indexes that organise space hierarchically, making “what is near this point” queries fast.
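
The benefit of a spatial index can be shown without a real R-tree. The sketch below swaps in a much simpler technique, a fixed-size grid of latitude/longitude cells, to illustrate the same idea of pruning distant points before measuring distances. It assumes small search radii away from the poles and the 180° meridian; the `GridIndex` class, station names, and coordinates are all hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_KM / 360  # about 111 km per degree of latitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class GridIndex:
    """Buckets points into fixed-size latitude/longitude cells (not an R-tree)."""

    def __init__(self, cell_deg=0.05):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def insert(self, name, lat, lon):
        self.cells[self._cell(lat, lon)].append((name, lat, lon))

    def within(self, lat, lon, radius_km):
        """Return (sorted names within radius_km, number of candidates checked)."""
        # Padded bounding box around the search circle, in degrees.
        dlat = 1.2 * radius_km / KM_PER_DEGREE
        dlon = dlat / math.cos(math.radians(lat))
        row_lo, col_lo = self._cell(lat - dlat, lon - dlon)
        row_hi, col_hi = self._cell(lat + dlat, lon + dlon)
        found, checked = [], 0
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                for name, plat, plon in self.cells.get((row, col), []):
                    checked += 1
                    if haversine_km(lat, lon, plat, plon) <= radius_km:
                        found.append(name)
        return sorted(found), checked


index = GridIndex()
# Hypothetical stations: three close to the user, plus 400 far away.
index.insert("near-a", 51.500, -0.120)
index.insert("near-b", 51.520, -0.100)
index.insert("edge-c", 51.560, -0.120)
for i in range(400):
    index.insert(f"far-{i}", 40.0 + (i % 20) * 0.5, 5.0 + (i // 20) * 0.5)

user = (51.505, -0.125)
nearby, checked = index.within(*user, radius_km=5)

# Brute force for comparison: measure the distance to every stored point.
brute = sorted(
    name
    for cell in index.cells.values()
    for name, plat, plon in cell
    if haversine_km(*user, plat, plon) <= 5
)
```

The indexed search returns the same names as the brute-force scan while measuring distances for only the few points in neighbouring cells. An R-tree achieves the same pruning with nested bounding rectangles, which also handle lines and polygons.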

Common uses:

GIS applications (Geographic Information Systems)
Emergency-response systems mapping calls to nearest units
Delivery and logistics routing
Mapping and navigation (Google Maps, OpenStreetMap)
Urban planning, environmental monitoring, real estate

In-Memory Databases

An in-memory database (IMDB) stores its working data entirely in RAM rather than on disk. Examples include Redis, Memcached (as a cache), SAP HANA (as a full transactional store), and KDB+ (financial time-series).

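Why this is fast: a RAM access takes on the order of a hundred nanoseconds, whereas a random read takes around a hundred microseconds on an SSD and several milliseconds on a spinning disk. The raw access gap is therefore roughly three to five orders of magnitude. Whole queries rarely speed up by that much, because disk-based databases also cache hot data in RAM and other costs such as networking and query parsing remain, but an in-memory DBMS avoids most of the I/O overhead that dominates disk-bound performance.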
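
The size of the raw access gap can be checked with rough arithmetic. The figures below are order-of-magnitude assumptions chosen for illustration, not measurements of any particular hardware.

```python
# Assumed order-of-magnitude access times in seconds (illustrative, not benchmarks).
ACCESS_TIME_S = {
    "RAM": 100e-9,              # about 100 nanoseconds
    "SSD random read": 100e-6,  # about 100 microseconds
    "HDD seek": 10e-3,          # about 10 milliseconds
}


def slowdown_vs_ram(times):
    """How many times slower each medium is than RAM for one random access."""
    ram = times["RAM"]
    return {name: round(t / ram) for name, t in times.items()}


ratios = slowdown_vs_ram(ACCESS_TIME_S)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,}x RAM")
```

Under these assumptions a single random access is about a thousand times slower on an SSD and about a hundred thousand times slower on a spinning disk. These are per-access ratios; the speed-up for a whole query is usually smaller.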

What you give up:

RAM is much more expensive per gigabyte than disk. Working set must fit in memory.
Persistence is harder. Naive RAM-only storage loses data on power loss; production in-memory databases mitigate this with write-ahead logs to disk, snapshots, or distributed replication. Each of these either costs some speed or leaves a window in which recent writes can be lost, so durability is typically weaker than in a traditional disk-backed database.
Capacity is bounded by the largest single machine’s RAM (or by distributed in-memory clusters, which add complexity).
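
The write-ahead-log mitigation mentioned above can be sketched as follows. This is a minimal illustration, assuming a single process and a JSON-lines log file; the `DurableKV` class is hypothetical and is not how any particular product implements persistence.

```python
import json
import os
import tempfile


class DurableKV:
    """In-memory dict whose writes are appended to a log file before being applied."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                self.data[entry["key"]] = entry["value"]

    def set(self, key, value):
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # force the entry to disk before acknowledging
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)  # reads never touch the disk

    def close(self):
        self.log.close()


log_path = os.path.join(tempfile.mkdtemp(), "kv.log")
db = DurableKV(log_path)
db.set("score:ana", 120)
db.set("score:ana", 135)
db.close()  # simulate the process stopping; the in-memory dict is discarded

recovered = DurableKV(log_path)  # a fresh instance replays the log
value_after_restart = recovered.get("score:ana")
recovered.close()
```

Reads stay at memory speed, but every write now waits for the disk. Syncing less often is faster and widens the window in which recent writes can be lost, which is the durability trade-off described in the list above.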

When in-memory fits:

Caching layers in front of slower databases (Redis is the canonical example).
Real-time analytics: dashboards updating live.
Multiplayer game state.
High-frequency trading: microseconds matter.
Session storage: many short-lived reads and writes.

In practice, many systems use an in-memory cache (Redis) in front of a relational database, getting the speed of in-memory for hot data and the durability and richer query power of relational for cold data.

Comparing the Four Approaches

Approach	Best at	Worst at	Typical scenarios
NoSQL	Horizontal scale, flexible schema, high write throughput	Multi-record atomic transactions, ad-hoc analytical joins	Social media, IoT, real-time analytics, content systems
Cloud	Managed operations, elastic scale, geographic distribution	Vendor lock-in, predictable long-term cost, fine-grained tuning	SaaS apps, variable workloads, distributed users
Spatial	Geographic queries, geometric reasoning	Non-spatial transactional workloads	GIS, mapping, logistics, emergency response
In-memory	Microsecond latency, very high read/write rates	Datasets larger than RAM, strict durability	Caching, real-time analytics, gaming, finance

These are not mutually exclusive: a modern stack might use a relational DB for the system of record, Redis (in-memory) as the cache, a NoSQL document store for user-generated content, a spatial extension for location features, and the whole thing hosted on a cloud platform.

Worked Examples

Example 1: Picking the Right Approach

A start-up is building a food-delivery app and needs to choose technologies for four parts of the system. Which database approach fits each?

Part of the system	Best fit	Why
Order ledger: customer orders, payments, refunds	Relational (with cloud hosting)	Strict consistency and multi-row transactions are essential. The relational model + ACID is the canonical fit.
Real-time driver locations: updated every 5 seconds, queried “all drivers within 2 km”	Spatial (often PostgreSQL with the PostGIS extension)	Geographic queries are the dominant workload; an R-tree index makes them fast.
User session state: shopping cart, current preferences	In-memory (Redis)	Short-lived, high-throughput reads and writes; loss of session is annoying but not catastrophic.
User reviews: variable structure, including text, photos, ratings, vendor replies	NoSQL document (MongoDB)	The shape varies per review; a flexible schema avoids constant migrations.

A real production stack would combine these, not pick one. This is the modern norm: choose the right tool for each workload and accept some operational complexity in return for a better fit.

Example 2: A Spatial Query Described in Words

A city wants to find all fire hydrants within 200 metres of any school. On a spatial database, the query is essentially:

-- Assumes Location and Boundary are PostGIS geography columns, so 200 is in metres.
SELECT Hydrant.HydrantID, Hydrant.Location
FROM   Hydrant
WHERE  EXISTS (
         SELECT 1 FROM School
         WHERE  ST_DWithin(Hydrant.Location, School.Boundary, 200)
       );

(ST_DWithin is a PostGIS function meaning “is within distance”.) The spatial index lets the DBMS skip hydrants that are obviously too far away and examine only those in the right neighbourhood. On a non-spatial database, the same query would have to compare every hydrant with every school.

Example 3: Cache Pattern

A news site shows the same homepage to millions of visitors per hour. The article content rarely changes; rendering it from the relational database every time would saturate the database.

The standard fix is an in-memory cache:

Request -> [Cache check]
              |
              if hit  -> return cached version (<1 ms)
              if miss -> query relational DB -> cache result -> return

If cached entries expire after, say, one minute, the relational database renders each article about once per minute instead of millions of times per hour. The cache is held in memory so that a hit adds well under a millisecond to the request.
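
The flow in the diagram can be sketched in code. In this minimal example a Python dictionary stands in for the in-memory cache (for instance Redis) and `sqlite3` stands in for the relational database; the `CacheAside` class, the `article` table, and the 60-second expiry are assumptions made for illustration.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO article VALUES (1, 'Budget announced')")
db.commit()


class CacheAside:
    """Check the cache first; on a miss, query the database and cache the result."""

    def __init__(self, conn, ttl_seconds=60.0, clock=time.monotonic):
        self.conn = conn
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # article id -> (expiry time, title)
        self.db_queries = 0

    def get_title(self, article_id):
        entry = self.cache.get(article_id)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # cache hit: no database work
        self.db_queries += 1  # cache miss or expired entry
        row = self.conn.execute(
            "SELECT title FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        title = row[0] if row else None
        self.cache[article_id] = (self.clock() + self.ttl, title)
        return title


now = [0.0]  # a fake clock so the example does not need to sleep
site = CacheAside(db, ttl_seconds=60.0, clock=lambda: now[0])
for _ in range(1000):
    site.get_title(1)
queries_in_first_minute = site.db_queries

now[0] = 61.0  # the cached entry has expired
site.get_title(1)
queries_after_expiry = site.db_queries
```

A thousand requests inside the expiry window cost one database query; the next query happens only after the entry expires. The expiry time sets how stale the homepage is allowed to become.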

Quick Check

Q1. Which workload is most strongly suited to a NoSQL database?

Q2. Which type of database is best suited for the query "find all electric vehicle charging stations within 5 kilometres of the user's current location"?

Q3. What is the primary advantage of an in-memory database over a disk-based one?

Q4. Which best describes a cloud database?

Q5. Which statement about choosing among NoSQL, cloud, spatial, and in-memory databases is most accurate?

Match the Approach

For each scenario, choose the best-fit database approach.

Type one of: NoSQL, cloud, spatial, in-memory.

Scenario	Best fit
A multiplayer online game needs to update player positions and inventories thousands of times per second with sub-millisecond latency.
A national mapping agency needs to find all rivers that pass through a given polygon.
A SaaS start-up wants to deploy a relational database without hiring a DBA, scaling automatically with their growing user base.
A social-media platform needs to store user posts with varying structure (text, images, polls, embeds), distributed across many servers, with high write throughput.
A web app needs a fast cache to avoid hitting the slow main database for every page view.

Fill in the Blanks

Complete the descriptions.

FOUR ALTERNATIVE DATABASE APPROACHES
====================================
A ________ database trades some relational guarantees (joins, strict
consistency) for horizontal scalability, flexible schemas, and high
write throughput. Examples include MongoDB, Cassandra, Redis (as KV).

A ________ database is hosted and managed by a third-party
provider (AWS, Google Cloud, Azure), offering elastic scaling and
pay-as-you-go pricing.

A ________ database stores geometric data (points, lines, polygons)
and supports queries like "within 5 km of" or "contains this polygon".

An ________ database stores its working data in RAM rather than
on disk, enabling microsecond latency for high-throughput workloads.

Spot the Error

A student wrote revision notes about alternative database approaches. One line is wrong. Identify the line with the error, then give the correct fix.

1. NoSQL: flexible schema, horizontal scale, often weaker ACID guarantees
2. Cloud: managed by a provider, elastic scaling, pay-as-you-go
3. Spatial: stores geometric data (points, lines, polygons) and supports geographic queries
4. In-memory: stores data on a special type of SSD called 'in-memory storage'
5. A modern system often combines several approaches rather than picking one

Identify the Approach

An app needs to find all coffee shops within a 5 km radius of the user's current location. Which database approach is best suited for this?

Type the approach name (e.g. NoSQL).

A multiplayer game maintains a live leaderboard that updates thousands of times per second and must be readable in under a millisecond. Which database approach is best suited?

Type the approach name.

Practice Exercises

Core (HL)

[Core] Alternative DBs (HL) [4 marks] Outline what is meant by each of the four alternative database approaches covered on this page: NoSQL, cloud, spatial, and in-memory.
[Core] Alternative DBs (HL) [2 marks] A start-up is building a navigation app that maps real-time vehicle positions onto a road network. Suggest the most appropriate database approach and justify your choice in one sentence.
[Core] Alternative DBs (HL) [3 marks] Describe what is meant by spatial data, and state two geometric primitives a spatial database can store.

Extension (HL)

[Extension] Alternative DBs (HL) [4 marks] Discuss the main trade-offs of choosing a NoSQL database over a relational database for a system that needs to store millions of user posts per day.
[Extension] Alternative DBs (HL) [4 marks] Explain why in-memory databases are dramatically faster than disk-based ones, and identify two trade-offs that come with using one.
[Extension] Alternative DBs (HL) [4 marks] Explain two benefits and two drawbacks of using a cloud database for a small SaaS start-up’s relational data store.

Challenge (HL)

[Challenge] Alternative DBs (HL) [6 marks] A food-delivery start-up needs to handle: (a) orders and payments, (b) real-time driver locations, (c) shopping-cart state, (d) user reviews with varied attached media. Suggest which database approach fits each part of the system and justify each choice.
[Challenge] Alternative DBs (HL) [6 marks] Evaluate whether a long-established financial institution with strict regulatory requirements should migrate its core transactional database from on-premises servers to a major cloud provider. Refer to at least three of: control, cost, latency, compliance, scalability, vendor lock-in.

Note for IB CS learners: The 2027 syllabus introduced a refreshed list of alternative database models (NoSQL, cloud, spatial, in-memory). Older textbooks and questions that reference object-oriented, network, or multi-dimensional databases are out of scope. Expect entirely new questions from 2027 onwards focused on use-case justification rather than recall of features.

Connections

Previous: Transactions and Views. The ACID guarantees that NoSQL databases often weaken in exchange for scale.
Related: Relational Database Fundamentals. The limitations listed there (rigid schema, “big data” scale, hierarchical data, impedance mismatch) are exactly the pressures that alternative databases address.
Next (HL): Data Warehouses. Another alternative storage approach, specialised for analytical workloads rather than transactional ones.
Next (HL): Distributed Databases. How databases spread across many machines, building on cloud and NoSQL ideas.
Hardware: Primary Memory and Secondary Storage. The cost and speed difference between RAM and disk that defines the in-memory database trade-off.
Hardware: Cloud Computing. The SaaS / PaaS / IaaS framing for what “cloud database” actually means operationally.