From: gavi

Scalable Web Architectures 



Tags:  Scalable  Web  Architectures 
Published:  July 02, 2007
 
Notes:
 
Slide 1: Scalable Web Architectures Common Patterns & Approaches Cal Henderson
Slide 2: Hello SAM-SIG, 23rd August 2006
Slide 3: Scalable Web Architectures? What does scalable mean? What’s an architecture?
Slide 4: Scalability – myths and lies • What is scalability?
Slide 5: Scalability – myths and lies • What is scalability not?
Slide 6: Scalability – myths and lies • What is scalability not? – Raw Speed / Performance – HA / BCP – Technology X – Protocol Y
Slide 7: Scalability – myths and lies • So what is scalability?
Slide 8: Scalability – myths and lies • So what is scalability? – Traffic growth – Dataset growth – Maintainability
Slide 9: Scalability • Two kinds: – Vertical (get bigger) – Horizontal (get more)
Slide 10: Big Irons • Sunfire E20k: 36x 1.8GHz processors, $450,000 - $2,500,000 • PowerEdge SC1425: 2.8 GHz processor, under $1,500
Slide 11: Cost vs Cost
Slide 12: Cost vs Cost • But sometimes vertical scaling is right • Buying a bigger box is quick (ish) • Redesigning software is not • Running out of MySQL performance? – Spend months on data federation – Or just buy a ton more RAM
Slide 13: Cost vs Cost • But let’s talk horizontal – Else this is going to be boring
Slide 14: Architectures then? • The way the bits fit together • What grows where • The trade-offs between good/fast/cheap
Slide 15: LAMP • We’re talking about LAMP – Linux – Apache (or lighttpd) – MySQL (or Postgres) – PHP (or Perl, Python, Ruby) • All open source • All well supported • All used in large operations
Slide 16: Simple web apps • A Web Application – Or “Web Site” in Web 1.0 terminology (diagram labels: Interwebnet, App server, Database)
Slide 17: App servers • App servers scale in two ways:
Slide 18: App servers • App servers scale in two ways: – Really well
Slide 19: App servers • App servers scale in two ways: – Really well – Quite badly
Slide 20: App servers • Sessions! – (State) – Local sessions == bad • When they move == quite bad – Central sessions == good – No sessions at all == awesome!
Slide 21: Local sessions • Stored on disk – PHP sessions • Stored in memory – Shared memory block • Bad! – Can’t move users – Can’t avoid hotspots
Slide 22: Mobile local sessions • Custom built – Store last session location in cookie – If we hit a different server, pull our session information across • If your load balancer has sticky sessions, you can still get hotspots – Depends on volume – fewer heavier users hurt more
Slide 23: Remote centralized sessions • Store in a central database – Or an in-memory cache • No porting around of session data • No need for sticky sessions • No hot spots • Need to be able to scale the data store – But we’ve pushed the issue down the stack
Slide 24: No sessions • Stash it all in a cookie! • Sign it for safety – $data = $user_id . '-' . $user_name; – $time = time(); – $sig = sha1($secret . $time . $data); – $cookie = base64_encode("$sig-$time-$data"); – Timestamp means it’s simple to expire it
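A minimal Python sketch of the signed-cookie idea on Slide 24, covering both creating the cookie and checking it on a later request. It swaps the slide's sha1($secret . $time . $data) for an HMAC-SHA256, which is a substitution on my part and not what the slide shows. The names SECRET, MAX_AGE, make_cookie and read_cookie are hypothetical, and the sketch assumes user IDs contain no '-' characters.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret
MAX_AGE = 3600         # assumed cookie lifetime in seconds


def _sign(ts, data):
    """HMAC over the timestamp and payload (a swap for the slide's plain sha1)."""
    return hmac.new(SECRET, f"{ts}-{data}".encode(), hashlib.sha256).hexdigest()


def make_cookie(user_id, user_name, now=None):
    """Pack signature, timestamp and user data into one cookie value."""
    ts = str(int(time.time() if now is None else now))
    data = f"{user_id}-{user_name}"
    return base64.urlsafe_b64encode(f"{_sign(ts, data)}-{ts}-{data}".encode()).decode()


def read_cookie(cookie, now=None):
    """Return (user_id, user_name), or None if malformed, tampered with, or expired."""
    try:
        raw = base64.urlsafe_b64decode(cookie.encode()).decode()
        sig, ts, user_id, user_name = raw.split("-", 3)
        ts_int = int(ts)
    except ValueError:  # also covers bad base64 and bad UTF-8
        return None
    expected = _sign(ts, f"{user_id}-{user_name}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current - ts_int > MAX_AGE:  # the timestamp makes expiry a simple comparison
        return None
    return user_id, user_name
```

Because the cookie carries everything needed to validate it, any app server can check it without a session store.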
Slide 25: Super slim sessions • If you need more than the cookie (login status, user id, username), then pull their account row from the DB – Or from the account cache • None of the drawbacks of sessions • Avoids the overhead of a query per page – Great for high-volume pages which need little personalization – Turns out you can stick quite a lot in a cookie too – Pack with base64 and it’s easy to delimit fields
Slide 26: App servers • The Rasmus way – App server has ‘shared nothing’ – Responsibility pushed down the stack • Ooh, the stack
Slide 27: Trifle
Slide 28: Trifle • Fruit / Presentation • Cream / Markup • Custard / Page Logic • Jelly / Business Logic • Sponge / Database
Slide 29: Trifle • Fruit / Presentation • Cream / Markup • Custard / Page Logic • Jelly / Business Logic • Sponge / Database
Slide 30: App servers
Slide 31: App servers
Slide 32: App servers
Slide 33: Well, that was easy • Scaling the web app server part is easy • The rest is the trickier part – Database – Serving static content – Storing static content
Slide 34: The others • Other services scale similarly to web apps – That is, horizontally • The canonical examples: – Image conversion – Audio transcoding – Video transcoding – Web crawling
Slide 35: Parallelizable == easy! • If we can transcode/crawl in parallel, it’s easy – But think about queuing – And asynchronous systems – The web ain’t built for slow things – But still, a simple problem
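The queuing point on Slide 35 could be sketched roughly as follows in Python. The web request only enqueues a job, and a pool of workers drains the queue in parallel. Threads stand in here for what would more likely be separate worker machines, and transcode and run_workers are hypothetical names used only for illustration.

```python
import queue
import threading


def transcode(job):
    """Stand-in for slow work such as image or video conversion (hypothetical)."""
    return job.upper()


def run_workers(jobs, worker_count=4):
    """Drain a job queue with a pool of worker threads and collect the results."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:          # sentinel: no more work for this worker
                pending.task_done()
                return
            out = transcode(job)
            with lock:
                results[job] = out
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:                 # a web request would only do this enqueue step
        pending.put(job)
    for _ in threads:                # one sentinel per worker
        pending.put(None)
    pending.join()
    for t in threads:
        t.join()
    return results
```

Adding capacity then means adding workers, and a burst of uploads lengthens the queue instead of slowing page responses.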
Slide 36: Asynchronous systems
Slide 37: Asynchronous systems
Slide 38: Helps with peak periods
Slide 39: Asynchronous systems
Slide 40: Asynchronous systems
Slide 41: Asynchronous systems
Slide 42: The big three • Let’s talk about the big three then… – Databases – Serving lots of static content – Storing lots of static content
Slide 43: Databases • Unless we’re doing a lot of file serving, the database is the toughest part to scale • If we can, best to avoid the issue altogether and just buy bigger hardware • Dual Opteron/Intel64 systems with 16GB of RAM can get you a long way
Slide 44: More read power • Web apps typically have a read/write ratio of somewhere between 80/20 and 90/10 • If we can scale read capacity, we can solve a lot of situations • MySQL replication!
Slide 45: Master-Slave Replication
Slide 46: Master-Slave Replication (diagram labels: Reads and Writes, Reads)
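One way an application might use the master-slave layout from Slides 44 to 46 is to route statements by type: writes to the master, reads spread across the slaves. The Python sketch below is illustrative only. ReplicatedPool and the server names are hypothetical, and it returns a server name where real code would return a connection.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and spread reads over the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads if there are no slaves.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql):
        """Return the name of the server that should run this statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)  # simple round-robin over the read servers
```

Since replication is asynchronous, a read sent to a slave immediately after a write may not see that write yet, so reads that must be fresh would need to go to the master.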
Slide 47: Master-Slave Replication
Slide 48: Master-Slave Replication
Slide 49: Master-Slave Replication
Slide 50: Master-Slave Replication
Slide 51: Master-Slave Replication
Slide 52: Master-Slave Replication
Slide 53: Master-Slave Replication
Slide 54: Master-Slave Replication
Slide 55: Caching • Caching avoids needing to scale! – Or makes it cheaper • Simple stuff – mod_perl / shared memory – dumb – MySQL query cache - dumbish
Slide 56: Caching • Getting more complicated… – Write-through cache – Write-back cache – Sideline cache
Slide 57: Write-through cache
Slide 58: Write-back cache
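Slides 57 and 58 are diagrams in the original deck, so the following Python sketch is only my reading of the two terms as they are commonly used. A write-through cache updates the backing store on every write, while a write-back cache defers the store update until a flush. Plain dicts stand in for both the cache and the database, and the class names are hypothetical.

```python
class WriteThroughCache:
    """Every write goes to the cache and to the backing store immediately."""

    def __init__(self, store):
        self.store = store   # stand-in for the database
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]


class WriteBackCache(WriteThroughCache):
    """Writes land in the cache first and only reach the store on flush()."""

    def __init__(self, store):
        super().__init__(store)
        self.dirty = set()   # keys written since the last flush

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The trade-off is that write-back absorbs bursts of writes but can lose unflushed data if the cache node dies.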
Slide 59: Sideline cache
Slide 60: Sideline cache • Easy to implement – Just add app logic • Need to manually invalidate cache – Well designed code makes it easy • Memcached – From Danga (LiveJournal) – http://www.danga.com/memcached/
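The "just add app logic" and "manually invalidate" points on Slide 60 might look something like this in Python. A dict stands in for a memcached client so the sketch stays self-contained. SidelineCache, get_user and update_user are hypothetical names, not part of any memcached API.

```python
class SidelineCache:
    """App logic checks the cache first, falls back to the DB, invalidates on write."""

    def __init__(self, db):
        self.db = db          # stand-in for the database (a dict here)
        self.cache = {}       # stand-in for a memcached client
        self.db_reads = 0     # counts how often we had to hit the database

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db.get(user_id)
        if row is not None:
            self.cache[key] = row
        return row

    def update_user(self, user_id, row):
        self.db[user_id] = row
        # Manual invalidation: drop the stale entry so the next read refills it.
        self.cache.pop(f"user:{user_id}", None)
```

Keeping all reads and writes for an object behind two functions like these is what makes the invalidation easy to get right.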
Slide 61: But what about HA?
Slide 62: But what about HA?
Slide 63: SPOF! • The key to HA is avoiding SPOFs – Identify – Eliminate • Some stuff is hard to solve – Fix it further up the tree • Dual DCs solves Router/Switch SPOF
Slide 64: Master-Master
Slide 65: Master-Master • Either hot/warm or hot/hot • Writes can go to either – But avoid collisions – No auto-inc columns for hot/hot • Bad for hot/warm too – Design schema/access to avoid collisions • Hashing users to servers
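A rough Python sketch of "hashing users to servers" from Slide 65. Each user's writes always go to the same member of a hot/hot pair, so the two masters never modify the same user's rows at the same time, and the other master takes over if the preferred one is down. MASTERS and master_for_user are hypothetical names, and CRC32 is just one convenient stable hash.

```python
import zlib

MASTERS = ["db-a", "db-b"]  # hypothetical hot/hot master pair


def master_for_user(user_id, alive=None):
    """Pick the master that takes a user's writes; fail over if it is down."""
    alive = MASTERS if alive is None else alive
    # A stable hash keeps each user pinned to the same master across requests.
    preferred = MASTERS[zlib.crc32(str(user_id).encode()) % len(MASTERS)]
    if preferred in alive:
        return preferred
    if not alive:
        raise RuntimeError("no master available")
    return alive[0]
```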
Slide 66: Rings • Master-master is just a small ring – With 2 members • Bigger rings are possible – But not a mesh! – Each slave may only have a single master – Unless you build some kind of manual replication
Slide 67: Rings
Slide 68: Rings
Slide 69: Dual trees • Master-master is good for HA – But we can’t scale out the reads • We often need to combine the read scaling with HA • We can combine the two
Slide 70: Dual trees
Slide 71: Data federation • At some point, you need more writes – This is tough – Each cluster of servers has limited write capacity • Just add more clusters!
Slide 72: Data federation • Split up large tables, organized by some primary object – Usually users • Put all of a user’s data on one ‘cluster’ – Or shard, or cell • Have one central cluster for lookups
Slide 73: Data federation
Slide 74: Data federation • Need more capacity? – Just add shards! – Don’t assign to shards based on user_id! • For resource leveling as time goes on, we want to be able to move objects between shards – ‘Lockable’ objects
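Slides 72 and 74 describe a central lookup that maps each user to a shard, with objects that can be locked and moved. A minimal in-memory Python sketch of that idea follows. ShardDirectory and its methods are hypothetical, dicts stand in for the shard databases, and "put new users on the emptiest shard" is just one possible assignment policy.

```python
class ShardDirectory:
    """Central lookup of which shard holds each user, with a lock for moves."""

    def __init__(self, shards):
        self.data = {name: {} for name in shards}   # shard -> {user_id: rows}
        self.location = {}                          # user_id -> shard
        self.locked = set()

    def add_user(self, user_id, rows):
        # Assign to the emptiest shard instead of deriving the shard from user_id,
        # so the mapping can change later without rehashing everyone.
        shard = min(self.data, key=lambda s: len(self.data[s]))
        self.data[shard][user_id] = rows
        self.location[user_id] = shard
        return shard

    def read(self, user_id):
        if user_id in self.locked:
            raise RuntimeError("user is being moved")
        return self.data[self.location[user_id]][user_id]

    def move(self, user_id, target):
        self.locked.add(user_id)                    # block access during the copy
        try:
            source = self.location[user_id]
            self.data[target][user_id] = self.data[source][user_id]
            self.location[user_id] = target         # repoint the central lookup
            if source != target:
                del self.data[source][user_id]
        finally:
            self.locked.discard(user_id)
```

Because every access goes through the lookup, adding a shard or rebalancing only changes entries in the directory.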
Slide 75: Data federation • Heterogeneous hardware is fine – Just give a larger/smaller proportion of objects depending on hardware • Bigger/faster hardware for paying users – A common approach
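Giving shards "a larger/smaller proportion of objects depending on hardware" (Slide 75) can be approximated with weighted random assignment. This Python sketch assumes made-up weights in SHARD_WEIGHTS, and pick_shard is a hypothetical helper.

```python
import random

# Hypothetical relative capacities: shard-3 runs on hardware twice as large.
SHARD_WEIGHTS = {"shard-1": 1, "shard-2": 1, "shard-3": 2}


def pick_shard(rng=random):
    """Choose a shard for a new object in proportion to its weight."""
    names = list(SHARD_WEIGHTS)
    weights = [SHARD_WEIGHTS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Over many new objects, shard-3 should end up holding roughly twice as many as either of the others.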
Slide 76: Downsides • Need to keep stuff in the right place • App logic gets more complicated • More clusters to manage – Backups, etc • More database connections needed per page • The dual table issue – Avoid walking the shards!
Slide 77: Bottom line Data federation is how large applications are scaled
Slide 78: Bottom line • It’s hard, but not impossible • Good software design makes it easier – Abstraction! • Master-master pairs for shards give us HA • Master-master trees work for central cluster (many reads, few writes)
Slide 79: Multiple Datacenters • Having multiple datacenters is hard – Not just with MySQL • Hot/warm with MySQL slaved setup – But manual • Hot/hot with master-master – But dangerous • Hot/hot with sync/async manual replication – But tough
Slide 80: Multiple Datacenters
Slide 81: Serving lots of files • Serving lots of files is not too tough – Just buy lots of machines and load balance! • We’re IO bound – need more spindles! – But keeping many copies of data in sync is hard – And sometimes we have other per-request overhead (like auth)
Slide 82: Reverse proxy
Slide 83: Reverse proxy • Serving out of memory is fast! – And our caching proxies can have disks too – Fast or otherwise • More spindles is better • We stay in sync automatically • We can parallelize it! – 50 cache servers gives us 50 times the serving rate of the origin server – Assuming the working set is small enough to fit in memory in the cache cluster
Slide 84: Invalidation • Dealing with invalidation is tricky • We can prod the cache servers directly to clear stuff out – Scales badly – need to clear asset from every server – doesn’t work well for 100 caches
Slide 85: Invalidation • We can change the URLs of modified resources – And let the old ones drop out of cache naturally – Or prod them out, for sensitive data • Good approach! – Avoids browser cache staleness – Hello Akamai (and other CDNs) – Read more: • http://www.thinkvitamin.com/features/webapps/serving-javascript-fast
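Changing the URL of a modified resource (Slide 85) is often done by embedding a version or content hash in the file name. The Python sketch below uses a content hash, which is one possible scheme and an assumption on my part. versioned_url is a hypothetical helper.

```python
import hashlib


def versioned_url(path, content):
    """Embed a short content hash in the URL so an edit produces a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = path.rpartition(".")
    if not dot:                      # no file extension: just append the hash
        return f"{path}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

Unchanged files keep their URL and stay cached, while a changed file gets a URL that no proxy or browser has seen before.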
Slide 86: Reverse proxy • Choices – L7 load balancer & Squid • http://www.squid-cache.org/ – mod_proxy & mod_cache • http://www.apache.org/ – Perlbal and Memcached? • http://www.danga.com/
Slide 87: High overhead serving • What if you need to authenticate your asset serving – Private photos – Private data – Subscriber-only files • Two main approaches
Slide 88: Perlbal backhanding • Perlbal can do redirection magic – Backend server sends header to Perlbal – Perlbal goes to pick up the file from elsewhere – Transparent to user
Slide 89: Perlbal backhanding
Slide 90: Perlbal backhanding • Doesn’t keep database around while serving • Doesn’t keep app server around while serving • User doesn’t find out how to access asset directly
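As a hedged illustration of the backend's side of this exchange, the Python WSGI sketch below authorizes the request and then answers with a header that points the proxy at the real file. The slides do not name the header. X-REPROXY-URL is the name I understand Perlbal's reproxy feature to use, so treat it as an assumption to check against the Perlbal documentation. ALLOWED, can_view and the storage.internal host are hypothetical.

```python
ALLOWED = {("cal", "7")}  # hypothetical (user, photo_id) permission table


def can_view(user, photo_id):
    """Stand-in for the real permission check against the database."""
    return (user, photo_id) in ALLOWED


def app(environ, start_response):
    """Authorize the request, then hand the file transfer back to the proxy."""
    photo_id = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    user = environ.get("REMOTE_USER")
    if not can_view(user, photo_id):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    # Assumed header name; the internal URL is never shown to the user.
    internal = f"http://storage.internal/photos/{photo_id}.jpg"
    start_response("200 OK", [("X-REPROXY-URL", internal)])
    return [b""]
```

The app server finishes as soon as it has sent the header, which is why neither it nor the database connection is tied up while the file is served.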
Slide 91: Permission URLs • But why bother!? • If we bake the auth into the URL then it saves the auth step • We can do the auth on the web app servers when creating HTML • Just need some magic to translate to paths • We don’t want paths to be guessable
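One way to "bake the auth into the URL" is to sign the asset ID and an expiry time with a secret shared between the app servers and the file servers. The Python sketch below is an assumption about how that could be done, not the scheme the talk used. URL_SECRET, permission_url and check_url are hypothetical names, and the URL layout is made up.

```python
import hashlib
import hmac

URL_SECRET = b"change-me"  # hypothetical secret shared by app and file servers


def _token(photo_id, expires):
    """Unguessable token derived from the asset ID and its expiry time."""
    msg = f"{photo_id}:{expires}".encode()
    return hmac.new(URL_SECRET, msg, hashlib.sha256).hexdigest()[:32]


def permission_url(photo_id, expires):
    """Built by the web app when rendering HTML for an authorized user."""
    return f"/photos/{photo_id}_{expires}_{_token(photo_id, expires)}.jpg"


def check_url(url, now):
    """Run on the file server: validate without touching the database."""
    try:
        name = url.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        photo_id, expires, token = name.split("_")
        expires_int = int(expires)
    except ValueError:
        return None
    expected = _token(photo_id, expires)
    if now > expires_int or not hmac.compare_digest(token.encode(), expected.encode()):
        return None
    return photo_id  # the caller translates this ID to a path on disk
```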
Slide 92: Permission URLs
Slide 93: Storing lots of files • Storing files is easy! – Get a big disk – Get a bigger disk – Uh oh! • Horizontal scaling is the key – Again
Slide 94: Connecting to storage • NFS – Stateful == Sucks – Hard mounts vs Soft mounts • SMB / CIFS / Samba – Turn off MSRPC & WINS (NetBIOS NS) – Stateful but degrades gracefully • HTTP – Stateless == yay! – Just use Apache
Slide 95: Multiple volumes • Volumes are limited in total size – Except under ZFS & others • Sometimes we need multiple volumes for performance reasons – When using RAID with single/dual parity • At some point, we need multiple volumes
Slide 96: Multiple volumes
Slide 97: Multiple hosts • Further down the road, a single host will be too small • Total throughput of machine becomes an issue • Even physical space can start to matter • So we need to be able to use multiple hosts
Slide 98: Multiple hosts
Slide 99: HA Storage • HA is important for assets too – We can back stuff up – But we want it hot redundant • RAID is good – RAID 5 is cheap, RAID 10 is fast
Slide 100: HA Storage • But whole machines can fail • So we stick assets on multiple machines • In this case, we can ignore RAID – In failure case, we serve from alternative source – But need to weigh up the rebuild time and effort against the risk – Store more than 2 copies?
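The "stick assets on multiple machines" and "serve from alternative source" points on Slide 100 could be sketched like this in Python. Placement by hashing the key is a simplification chosen for brevity and is not how any particular storage system mentioned in the talk works. HOSTS, replica_hosts and fetch are hypothetical, and read_from_host is whatever function actually reads a file from a host.

```python
import zlib

HOSTS = ["store-1", "store-2", "store-3", "store-4"]  # hypothetical storage hosts


def replica_hosts(key, copies=2):
    """Pick `copies` distinct hosts for an asset, starting from a hashed slot."""
    start = zlib.crc32(key.encode()) % len(HOSTS)
    return [HOSTS[(start + i) % len(HOSTS)] for i in range(copies)]


def fetch(key, read_from_host, copies=2):
    """Try each replica in turn and serve from the first one that answers."""
    for host in replica_hosts(key, copies):
        try:
            return read_from_host(host, key)
        except OSError:
            continue  # host down: fall through to the alternative copy
    raise OSError(f"all {copies} replicas failed for {key}")
```

Raising copies from 2 to 3 trades disk space for a wider margin while a failed host is being rebuilt.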
Slide 101: HA Storage
Slide 102: Self repairing systems • When something fails, repairing can be a pain – RAID rebuilds by itself, but machine replication doesn’t • The big appliances self heal – NetApp, StorEdge, etc • So does MogileFS
Slide 103: Real world examples • Flickr – Because I know it • LiveJournal – Because everyone copies it
Slide 104: Flickr Architecture
Slide 105: LiveJournal Architecture
Slide 106: Buy my book!
Slide 107: The end!
Slide 108: Awesome! These slides are available online: iamcal.com/talks/

   