
10 Rules for Scalable Datastore Performance

Tags:  NoSQL  sql  scalability  j2ee  nosql  memcached 
Published:  November 15, 2011
 
Slide 1: 10 Rules for Scalable Data Store Performance Rick Cattell, PhD, Cattell.Net Consulting September 29, 2010
Slide 2: What You’ll Learn Today
• Why data store applications are traditionally hard to scale
• Pros & cons of scalable data store alternatives
  - NoSQL, Distributed Caches, Sharding, Scalable SQL,…
• 10 rules for building scalable data store apps
• Q&A
Slide 3: 10 Rules for Scalable Datastore Performance Rick Cattell, PhD Cattell.Net Consulting Webinar: September 29, 2010
Slide 4: Focus for this talk
• Horizontal scalability distributes the total load of data storage and retrieval over many servers
  - Not vertical scalability on a single multi-core server
  - Not single-node performance (but this matters too)
• Transactional applications: reading and writing one record or a few related records
  - Not complex queries/transactions involving lots of data
  - Not data warehouses (scalable products exist for read-mostly)
• Traditional RDBMSs proved inadequate for web companies: a single server could not service thousands or millions of simultaneous users
• New scalable systems can scale linearly to tens or hundreds of servers, for transactional (OLTP) applications
© Rick Cattell, Sept 2010
Slide 5: Typical scenario
• Web start-up (or web project in a larger company) stores application data in an RDBMS
• The RDBMS becomes saturated as the application scales, even after moving to the biggest servers
• The company then tries to fix the scalability problems with various techniques
  - Distributed caches
  - Replication to split load and provide failover
  - Sharding: application splits data/load into multiple databases according to some application key
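The sharding technique named on this slide can be sketched in a few lines. This is a minimal illustration, not something from the talk: the host names in `SHARDS` and the helpers `shard_for` and `database_for` are hypothetical, and hashing the key modulo the shard count is only one of several ways an application might pick a shard.

```python
import hashlib

# Hypothetical shard list; a real deployment would load this from configuration.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]


def shard_for(key: str, num_shards: int) -> int:
    """Map an application key to a shard index in [0, num_shards)."""
    # Use a stable hash. Python's built-in hash() is salted per process,
    # so it would route the same key differently after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def database_for(user_id: str) -> str:
    """Return the (hypothetical) database host that owns this user's rows."""
    return SHARDS[shard_for(user_id, len(SHARDS))]


print(database_for("user-42"))
```

With this modulo scheme, changing the number of shards remaps most keys, which is one reason re-sharding is painful when the logic lives in the application layer.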
Slide 6: Problems with each strategy
• Distributed caches
  - Cache is out-of-date and has limited functionality
• Database replication
  - Read load is distributed, but writes must now go to multiple servers
• Sharding data over many servers
  - Application layer must handle sharding, re-sharding, server failures, SQL operations and transactions that span servers, etc.
• It’s much better if the datastore handles all this!
  - Application programmers not writing database code
  - Knowledge of sharding key/etc not in application layer
Slide 7: New systems to solve the problem
• “NoSQL” datastores
  - Abandon SQL for a much simpler data model, with operations at the level of keys and records
  - Abandon ACID for a simpler concurrency model based on “eventual consistency”, multiple versions, “quorum” reads/writes, or “locally” ACID single-record operations
  - Inspired by Amazon’s Dynamo and Google’s BigTable
• Next-generation scalable RDBMSs
  - Maintain traditional SQL and ACID of RDBMS, provide linear scalability when operations do not span servers
  - Optimized for OLTP, greatly improve on existing shared-nothing and shared-disk RDBMSs
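The slide mentions "quorum" reads/writes without defining them. A common formulation in Dynamo-style systems, assumed here rather than taken from the talk, is that with N replicas, a write acknowledged by W replicas and a read that consults R replicas will overlap in at least one replica whenever R + W > N. The function names and the `(version, value)` reply shape below are hypothetical.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """True if every read quorum of size r intersects every write quorum of size w."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def read_latest(replies):
    """Given (version, value) replies from r replicas, return the newest value."""
    return max(replies, key=lambda reply: reply[0])[1]


print(quorum_overlaps(n=3, r=2, w=2))  # True: 2 + 2 > 3
print(quorum_overlaps(n=3, r=1, w=1))  # False: a read may miss the latest write
print(read_latest([(1, "old"), (3, "new"), (2, "older")]))  # new
```

Smaller R and W lower latency but allow stale reads, which is the trade-off behind "eventual consistency".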
Slide 8: Examples of new scalable datastores
• Key-value stores: index “blobs” by a key
  - Dynamo, Voldemort, Riak, Scalaris, memcached variants
• Document stores: index a collection of objects on multiple attributes, with simple search predicates
  - MongoDB, CouchDB, SimpleDB
• Extensible record stores: variable-width sparse tables partitioned horizontally and vertically
  - BigTable, PNUTS, Cassandra, HBase, HyperTable
• Scalable relational DBMS: linear scale for OLTP
  - VoltDB, MySQL Cluster, Clustrix
Slide 9: 10 Rules: What you should look for
• Result of input from many sources
• Major trade-offs that affect you
• Your choice depends on your needs
Slide 10: Rule 1: Look for shared-nothing scalability for transactional workloads
• Shared-disk does not scale well beyond 10 nodes
  - Oracle RAC
• Data warehouses are shared-nothing, but only scale for read-only or read-mostly
  - Greenplum, Vertica, Asterdata, Paraccel, Netezza, Teradata
Slide 11: Rule 2: High-level languages are good and need not hurt performance
• Good SQL compilers create code equivalent to that written by competent access-method programmers
• Joins can be expensive, but can also be avoided
• Complex queries are not needed in transactional applications
• Stored procedures avoid the overhead of “conversational” SQL over ODBC/JDBC
• SQL provides important value: ease of use, logical and physical data independence, etc.
Slide 12: Rule 3: Leverage fast memory
• RAM is getting cheaper, and RAM distributed over many servers can store a lot of data
• Flash memory also changes the equation: near RAM speeds for reads at much lower cost
• A transactional update log can go to disk, to flash, or to a mirror node
• But: traditional RDBMSs were designed for data on disk; a completely different RDBMS implementation is needed!
Slide 13: Enlightening example: take traditional RDBMS, replace the disk with RAM
[Pie chart of CPU time. Slice labels: Useful work; Locks, deadlocks; Logging; Buffer pool; MT latching. Slice values: 11%, 13%, 20%, 33%, 23%.]
(get an 8X speedup, with the right design!)
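The "8X" figure can be checked with simple arithmetic. The sketch below assumes that the 13% slice is the one labeled "Useful work" (the extracted slide text places them side by side, but the mapping is not certain) and that a redesigned system could eliminate all the other overhead. The function name `max_speedup` is hypothetical.

```python
def max_speedup(useful_fraction: float) -> float:
    """Upper bound on speedup if only the useful fraction of CPU time remains."""
    if not 0 < useful_fraction <= 1:
        raise ValueError("useful_fraction must be in (0, 1]")
    return 1.0 / useful_fraction


# Assumption: 13% of CPU time is useful work, the rest is overhead.
print(round(max_speedup(0.13), 1))  # 7.7, i.e. roughly 8X
```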
Slide 14: Rule 4: High availability and automatic recovery become essential with scalability
• Consider your MTBF with a single server
• Now think: what happens if I split my database over 10 servers? Or 100 servers?
• Then think: what happens if a failure requires manual intervention and database recovery?
• Scalable datastores must unambiguously and automatically detect node failures, communicate the failure to all nodes, divert operations to replica node(s), restart or replace the failed node, replay lost updates, and re-synchronize
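The MTBF question on this slide has a simple back-of-the-envelope answer. Assuming nodes fail independently and at a constant rate (an idealization, not a claim from the talk), the expected time until some node in the cluster fails is the single-node MTBF divided by the number of nodes. The three-year node MTBF below is a made-up figure for illustration.

```python
def cluster_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """Expected hours until the first node failure, assuming independent,
    constant-rate failures on every node."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return node_mtbf_hours / num_nodes


# Hypothetical node that fails once every three years on average.
node_mtbf = 3 * 365 * 24  # 26280 hours

for nodes in (1, 10, 100):
    days = cluster_mtbf_hours(node_mtbf, nodes) / 24
    print(f"{nodes:>3} nodes: a failure roughly every {days:.0f} days")
```

Under these assumptions, a failure that was a once-in-years event on one server becomes routine on 100 servers, which is why manual recovery stops being workable.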
Slide 15: Rule 5: On-line everything
• Can I bring down 10 servers to change the data schema or indexes, add or remove server nodes, or upgrade database software? What about 100 or 1000 servers?
• Even if you ran on one server, your business may require continuous operation
• You will likely want your scalable datastore to handle 95% of these scenarios
Slide 16: Rule 6: Avoid multi-node operations
• This is a rule for your application as well as the datastore
• Transaction coordination, joins, and complex queries over many nodes affect scalability
  - No one has figured out how to avoid this
  - NoSQL systems essentially disallow these operations
• Your choice of partitioning key is essential to avoiding cross-node operations
• Think twice about whether you require ACID transactions spanning nodes
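To illustrate why the partitioning key matters, the sketch below counts how many nodes a single customer's orders land on under two choices of key. The table layout, column names, and hash-modulo placement are all hypothetical; real datastores use their own placement schemes.

```python
import hashlib


def node_for(partition_value: str, num_nodes: int) -> int:
    """Place a row on a node by hashing its partition-key value."""
    digest = hashlib.sha256(partition_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_touched(rows, partition_column, num_nodes):
    """Return the set of nodes an operation over these rows must visit."""
    return {node_for(str(row[partition_column]), num_nodes) for row in rows}


# Hypothetical data: 100 orders that all belong to customer 7.
orders = [{"order_id": i, "customer_id": 7} for i in range(100)]

by_customer = nodes_touched(orders, "customer_id", num_nodes=10)
by_order = nodes_touched(orders, "order_id", num_nodes=10)

print(len(by_customer))  # 1: every order for the customer lives on one node
print(len(by_order))     # more than one: the same query now spans nodes
```

If the common transaction is "read or update one customer's orders", partitioning on `customer_id` keeps it on a single node, while partitioning on `order_id` turns it into a multi-node operation.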
Slide 17: Rule 7: Don’t try to build ACID yourself
• It is a nightmare building atomicity, consistency, isolation, and durability on top of a system that lacks these: if you need ACID, use a DBMS that provides it
• NoSQL systems provide different, more limited mechanisms that might or might not meet your needs: multi-versioning, single-object ACID, “quorum” reads and writes, and/or an “update if current” operation
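The "update if current" operation is a form of optimistic concurrency: a write succeeds only if the record has not changed since the caller read it. The sketch below is a single-process, in-memory illustration with a hypothetical `VersionedStore` class; it is not the API of any particular datastore and does no locking or replication.

```python
class VersionedStore:
    """Toy key-value store in which each record carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0, value None."""
        return self._data.get(key, (0, None))

    def update_if_current(self, key, expected_version, new_value):
        """Write new_value only if the stored version still equals expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # someone else updated the record first
        self._data[key] = (current_version + 1, new_value)
        return True


store = VersionedStore()
version, _ = store.get("cart:7")
print(store.update_if_current("cart:7", version, ["book"]))  # True
print(store.update_if_current("cart:7", version, ["pen"]))   # False: stale version
```

A caller whose update is rejected re-reads the record and retries. This protects a single record, but it does not give atomicity across several records, which is the gap the slide warns about.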
Slide 18: Rule 8: Administrative simplicity
• Consider the complexity of administering single-server DBMSs: installation, tuning, schema construction/change, monitoring, backups, etc.
• Now consider 10 or 100 servers that must be administered, coordinated, and kept online together
• Try it out first!
Slide 19: Rule 9: Per-node performance matters
• After all this discussion of horizontal scaling, don’t forget that 10X single-node performance makes the difference between needing 10 servers or 100 servers
  - Even more, with sub-linear scaling
• Node performance on current systems easily shows a 10X range
  - There may be even greater differences on your application
• Consider the total cost: hardware, administration, floor space, power consumption
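The 10-versus-100-servers claim follows from dividing the target load by per-node throughput. The sketch below uses made-up throughput numbers and models sub-linear scaling crudely as a flat `efficiency` factor, which is an assumption for illustration only.

```python
import math


def servers_needed(total_ops_per_sec, ops_per_sec_per_node, efficiency=1.0):
    """Servers required for a target load.

    efficiency is the fraction of single-node throughput each node still
    delivers inside the cluster (1.0 means perfectly linear scaling).
    """
    effective_per_node = ops_per_sec_per_node * efficiency
    return math.ceil(total_ops_per_sec / effective_per_node)


target = 1_000_000  # hypothetical load in operations per second

print(servers_needed(target, 10_000))        # 100 servers with the slower node
print(servers_needed(target, 100_000))       # 10 servers with a 10X faster node
print(servers_needed(target, 10_000, 0.5))   # 200 servers if scaling is sub-linear
```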
Slide 20: Rule 10: Consider benefits of open source
[This is not a technical point, but it is important]
• You are not “trapped” by a vendor that can raise prices
• You are not trapped by a vendor that may go belly-up
• Your upfront costs are minimal, and you can seek other sources for support
• You benefit from other contributors
Slide 21: References
All of these available on http://cattell.net/datastores :
• References to all the NoSQL and SQL systems
• Other web sites and blogs on database scaling
• Cattell, “High Performance Scalable Datastores”, April 2010
• Stonebraker and Cattell, “Ten Rules for Scalable Performance in Simple Operation Datastores”, to appear, CACM
Slide 22: VoltDB: Scalable Open Source SQL DBMS with ACID
• Eliminates traditional OLTP overhead
• In-memory performance levels - millions of TPS
• Linear scalability on scale-out clusters of commodity servers
• SQL
• ACID transactions: immediate data consistency and integrity
• High availability 24x7x365
Slide 23: Q&A
• Visit http://voltdb.com to…
• Download VoltDB
• Get sample app code
• Access webinar recording, slides & Q&A transcript + we’ll notify you when they’re posted
• Join the VoltDB community
• VoltDB user groups: www.meetup.com/voltdb
• Follow VoltDB on Twitter @voltdb

   