One post tagged with "Distributed Database"

Spanner - Google solution for distributed database in easier

December 29, 2025 · 5 min read

Self Learner

Overview: Spanner is Google's scalable, globally-distributed database that combines the benefits of traditional relational databases with massive scalability across datacenters worldwide.

Key Features:

Global Distribution & Replication
- Shards data across multiple datacenters globally using Paxos consensus
- Automatic failover and resharding
- Configurable replication (typically 3-5 replicas)
TrueTime API
- Novel time API that explicitly exposes clock uncertainty
- Uses GPS and atomic clocks for synchronization
- Typical uncertainty (ε) is 1-7ms
- Enables strong consistency guarantees
Transaction Support
- External consistency (linearizability) for distributed transactions
- Lock-free snapshot transactions
- Non-blocking reads in the past
- Two-phase commit across Paxos groups
Data Model
- Semi-relational tables with SQL-like query language
- Hierarchical schema with directories as units of data placement
- Versioned key-value storage with automatic timestamps

Performance Highlights:

Write latency: ~15ms (including commit wait)
Snapshot read latency: ~1.4ms
Successfully handles datacenter failures with minimal disruption
First major customer (F1 - Google's advertising backend) migrated from sharded MySQL

Significance: First system to provide external consistency and globally-consistent reads at global scale, making it suitable for applications requiring strong consistency across continents.

1. The Basic Problem Spanner Solves

Imagine you have a banking app used worldwide. You need:

Data everywhere: Users in Japan and USA both need fast access
Consistency: If someone transfers money, both accounts must update correctly
Availability: System must work even if a datacenter fails

Traditional databases struggle with this. Spanner solves it.

2. Core Concepts

A. Data Distribution (Sharding)

Think of your data like a library:

Instead of one giant building (single server), you have multiple branches (servers) worldwide
Each branch holds certain books (data shards)
Users go to their nearest branch for faster service

User Data Split Across Servers:
Server 1 (USA): Users A-M
Server 2 (Europe): Users N-Z
Server 3 (Asia): Backup copies

B. Replication (Copies)

Each piece of data exists in multiple locations (typically 3-5 copies)
If one datacenter fails, others keep working
Like having backup copies of important documents in different safe deposit boxes

C. Paxos Consensus

This is how servers agree on what data is correct:

Simple Example:
Server A wants to write "Balance = $100"
It asks other servers: "Can I write this?"
Majority (3 out of 5) must agree
Once majority agrees, write is committed
All servers eventually get updated

3. The TrueTime Innovation

This is Spanner's "secret sauce" for handling time across the globe.

The Time Problem:

Server in USA: "Transaction happened at 10:00:00"
Server in Japan: "Transaction happened at 10:00:01"
But which really came first? Clocks aren't perfectly synchronized!

TrueTime Solution:

Instead of saying "it's exactly 10:00:00", TrueTime says:

"It's between 10:00:00.003 and 10:00:00.007"
(uncertainty range)

How it works:

Uses GPS satellites and atomic clocks
Admits uncertainty explicitly
Waits out uncertainty when needed for correctness

Example:
Transaction 1 commits at time [10:00:00.000 - 10:00:00.007]
Spanner waits until 10:00:00.007 passes
Then Transaction 2 can start at [10:00:00.008 - 10:00:00.015]
Now we KNOW Transaction 1 finished before Transaction 2 started!

4. Transaction Types

Read-Write Transactions (Strong Consistency)

Example: Transfer $50 from Account A to Account B

Lock both accounts
Read current balances
Calculate new balances
Get timestamp from TrueTime
Write changes to multiple servers (Paxos)
Wait for commit-wait (ensures ordering)
Release locks

Snapshot Transactions (Read-Only, No Locks)

Example: Generate monthly report

Pick a timestamp in the past (e.g., "end of last month")
Read data as it existed at that exact time
No locks needed - just reading history
Can read from any replica that's up-to-date

5. Data Organization

Directories

Think of directories as folders that keep related data together:

Directory for User "Alice":
├── Profile Info
├── Purchase History
└── Preferences

This whole directory:
- Stays together on same servers
- Moves as a unit if needed
- Can be replicated to different locations

Schema Example

-- Users table
CREATE TABLE Users {
  user_id INT64,
  email STRING,
  name STRING
} PRIMARY KEY (user_id);

-- Orders table (nested under Users)
CREATE TABLE Orders {
  user_id INT64,
  order_id INT64,
  amount FLOAT
} PRIMARY KEY (user_id, order_id),
  INTERLEAVE IN PARENT Users;

The INTERLEAVE means orders are stored physically close to their user - faster queries!

6. How a Write Works (Step-by-Step)

User wants to update their email:

REQUEST arrives at nearest Spanner server
   
SERVER identifies which Paxos group owns this data
   
LEADER of that group receives the request
   
ACQUIRE LOCKS on the data
   
PROPOSE change to other replicas via Paxos
   "I want to write: user_123.email = '[email protected]'"
   
REPLICAS vote (need majority to agree)
   
TIMESTAMP assigned using TrueTime
   timestamp = TT.now().latest
   
COMMIT WAIT - wait until TT.after(timestamp) is true
   (ensures external consistency)
   
APPLY the change to all replicas
   
RELEASE LOCKS
    
RETURN success to user

7. Key Benefits

Feature	What It Means
External Consistency	Transactions appear in the order they actually happened in real time
Global Availability	Works even if entire datacenters fail
SQL Support	Use familiar database queries
Automatic Scaling	Add servers without downtime
Strong Consistency	No "eventual consistency" surprises

8. Real-World Example: Google's F1

Google's advertising system uses Spanner:

Before: Manually sharded MySQL (nightmare to manage)
After: Spanner handles sharding automatically
Scale: Tens of terabytes, millions of operations/second
Replication: 5 replicas across US (2 west coast, 3 east coast)

1. The Basic Problem Spanner Solves​

2. Core Concepts​

A. Data Distribution (Sharding)​

B. Replication (Copies)​

C. Paxos Consensus​

3. The TrueTime Innovation​

The Time Problem:​

TrueTime Solution:​

4. Transaction Types​

Read-Write Transactions (Strong Consistency)​

Snapshot Transactions (Read-Only, No Locks)​

5. Data Organization​

Directories​

Schema Example​

6. How a Write Works (Step-by-Step)​

7. Key Benefits​

8. Real-World Example: Google's F1​