Slide 1: Scaling with MongoDB
Eliot Horowitz @eliothorowitz MongoUK March 21, 2011
Slide 2: Scaling
• Storage needs only go up • Operations/sec only go up • Complexity only goes up
Slide 3: Horizontal Scaling
• Vertical scaling is limited • Hard to scale vertically in the cloud • Can scale wider than higher
Slide 4: Read Scaling
• One master at any time • Programmer determines if read hits master
or a slave well
• Pro: easy to setup, can scale reads very • Con: reads are inconsistent on a slave • Writes don’t scale
Slide 5: One Master, Many Slaves
• Custom Master/Slave setup • Have as many slaves as you want • Can put them local to application servers • Good for 90+% read heavy applications
(Wikipedia)
Slide 6: Replica Sets
• High Availability Cluster • One master at any time, up to 6 slaves • A slave automatically promoted to master
if failure
• Drivers support auto routing of reads to
slaves if programmer allows
• Good for applications that need high write
availability but mostly reads (Commenting System)
Slide 7: Sharding
• Many masters, even more slaves • Can scale in two dimensions • Add Shards for write and data size scaling • Add slaves for inconsistent read scaling and
redundancy
Slide 8: Sharding Basics
• Data is split up into chunks • Shard: Replica sets that hold a portion of
the data system
• Config Servers: Store meta data about • Mongos: Routers, direct direct and merge
requests
Slide 9: Architecture
Shards
mongodd dd mongod
Config Servers mongod mongod mongod client client client client mongos mongos ...
mongodd dd mongod mongod
mongodd dd mongod mongod
...
mongod
Slide 10: Common Setup
• A common setup is 3 shards with 3 servers
per shard: 3 masters, 6 slaves
• Can add sharding later to an existing
replica set with no down time collections
• Can have sharded and non-sharded
Slide 11: Range Based
MIN A F M R MAX F M R Z LOCATION shard1 shard1 shard2 shard3
• collection is broken into chunks by range • chunks default to 64mb or 100,000 objects
Slide 12: Config Servers
• 3 of them • changes are made with 2 phase commit • if any are down, meta data goes read only • system is online as long as 1/3 is up
Slide 13: mongos
• Sharding Router • Acts just like a mongod to clients • Can have 1 or as many as you want • Can run on appserver so no extra network
traffic
• Cache meta data from config servers
Slide 14: Writes
• Inserts : require shard key, routed • Removes: routed and/or scattered • Updates: routed or scattered
Slide 15: Queries
• By shard key: routed • sorted by shard key: routed in order • by non shard key: scatter gather • sorted by non shard key: distributed merge
sort
Slide 16: Splitting
• Take a chunk and split it in 2 • Splits on the median value • Splits only change meta data, no data
change
Slide 17: Splitting
T1 T2
MIN A MAX Z LOCATION shard1 MIN A G MAX G Z LOCATION shard1 shard1
T3
MIN A D G S
MAX D G S Z
LOCATION shard1 shard1 shard1 shard1
Slide 18: Balancing
• Moves chunks from one shard to another • Done online while system is running • Balancing runs in the background
Slide 19: Migrating
T3
MIN A D G S MAX D G S Z MAX D G S Z MAX D G S Z LOCATION shard1 shard1 shard1 shard1 LOCATION shard1 shard1 shard1 shard2 LOCATION shard1 shard1 shard2 shard2
T4
MIN A D G
T5
S MIN A D G S
Slide 20: Choosing a Shard Key
• Shard key determines how data is
partitioned
• Hard to change • Most important performance decision
Slide 21: Use Case: User Profiles
{ email : “eliot@10gen.com” , addresses : [ { state : “NY” } ] }
•Shard by email •Lookup by email hits 1 node •Index on { “addresses.state” : 1 }
Slide 22: Use Case: Activity Stream
{ user_id : XXX, event_id : YYY , data : ZZZ }
•Shard by user_id •Looking up an activity stream hits 1 node •Writing even is distributed •Index on { “event_id” : 1 } for deletes
Slide 23: Use Case: Photos
{ photo_id : ???? , data : <binary> } What’s the right key?
•auto increment •MD5( data ) •now() + MD5(data) •month() + MD5(data)
Slide 24: Use Case: Logging
{ machine : “app.foo.com” , app : “apache” , when : “2010-12-02:11:33:14” , data : XXX }
Possible Shard keys
•{ machine : 1 } •{ when : 1 } •{ machine : 1 , app : 1 } •{ app : 1 }
Slide 25: Download MongoDB
http://www.mongodb.org
and let us know what you think @eliothorowitz @mongodb
10gen is hiring! http://www.10gen.com/jobs