From ljin

Real-World-Web-Performance-Scalability 



 

 
Published:  August 24, 2009
 
Slide 1: Real World Web: Performance & Scalability
Ask Bjørn Hansen, Develooper LLC
http://develooper.com/talks/
April 14, 2008 (r17)
Mailing list: http://groups.google.com/group/scalable
(If this text is too small to read, move closer!)
Slide 2: Hello.
• I’m Ask Bjørn Hansen: perl.org, ~10 years of mod_perl app development, MySQL and scalability consulting, YellowBot
• I hate tutorials!
• Let’s do 3 hours of 5-minute° lightning talks!
° Actual number of minutes may vary
Slide 3: Construction Ahead!
• Conflicting advice ahead
• Not everything here is applicable to everything
• Ways to “think scalable” rather than be-all-end-all solutions
• Don’t prematurely optimize! (just don’t be too stupid with the “we’ll fix it later” stuff)
Slide 4: Questions ...
How many ...
• ... are using PHP? Python? Java? Ruby? C?
• ... are on MySQL 3.23? 4.0? 4.1? 5.0? 5.1? 6.x?
• ... use MyISAM? InnoDB? Other?
• ... are primarily “programmers” vs “DBAs”?
• ... use Replication? Cluster? Partitioning?
• ... run Enterprise? Community?
• ... use PostgreSQL? Oracle? SQL Server? Other?
Slide 5: Seen this talk before?
• No, you haven’t. :-)
• ~266 people * 3 hours = half a work year!
[Chart: slide count per version of the talk, 2001 to 2008]
Slide 6: Question Policy!
• Do we have time for questions? Yes! (probably)
• Quick questions anytime
• Long questions after, or on the list: http://groups.google.com/group/scalable
• (The answer to anything is likely “it depends” or “let’s talk about it after / send me an email”)
[Chart: slides per minute per version of the talk, 2001 to 2008]
Slide 7: The first, last and only lesson:
• Think Horizontal!
• Everything in your architecture, not just the front end web servers
• Micro optimizations and other implementation details: Bzzzzt! Boring!
(blah blah blah, we’ll get to the cool stuff in a moment!)
Slide 8: Benchmarking techniques
• Scalability isn’t the same as processing time
• Not “how fast” but “how many”
• Test “force”, not speed. Think amps, not voltage
• Test scalability, not just “performance”
• Test with “slow clients”
• Use a realistic load
• Testing “how fast” is ok when optimizing implementation details (code snippets, SQL queries, server settings)
Slide 9: Vertical scaling
• “Get a bigger server”
• “Use faster CPUs”
• Can only help so much (with bad scale/$ value)
• A server twice as fast is more than twice as expensive
• Super computers are horizontally scaled!
Slide 10: Horizontal scaling
• “Just add another box” (or another thousand or ...)
• Good to great ...
  • Implementation: scale your system a few times
  • Architecture: scale dozens or hundreds of times
• Get the big picture right first, do micro optimizations later
Slide 11: Scalable Application Servers
Don’t paint yourself into a corner from the start
Slide 12: Run Many of Them
• Avoid having The Server for anything
• Everything should (be able to) run on any number of boxes
• Don’t replace a server, add a server
• Support boxes with different capacities
Slide 13: Stateless vs Stateful
• “Shared Nothing”
• Don’t keep state within the application server (or at least be Really Careful)
• Do you use PHP, mod_perl, mod_...?
  • Anything that’s more than one process
  • You get that for free! (usually)
Slide 14: Sessions
“The key to being stateless” or “What goes where”
Slide 15: No Local Storage
• Ever! Not even as a quick hack.
• Storing session (or other state information) “on the server” doesn’t work.
• “But my load balancer can do ‘sticky sessions’”
  • Uneven scaling: a waste of resources (and unreliable, too!)
  • The web isn’t “session based”, it’s one short request after another: deal with it
Slide 16: Evil Session
What’s wrong with this?
Cookie: session_id=12345
Web/application server with local session store:
  12345 => {
      user => {
          username => 'joe',
          email    => 'joe@example.com',
          id       => 987,
      },
      shopping_cart     => { ... },
      last_viewed_items => { ... },
      background_color  => 'blue',
  },
  12346 => { ... },
  ....
Slide 17: Evil Session (annotated)
The same cookie and local session store as on Slide 16, with the problems marked:
• Cookie: session_id=12345: easy to guess cookie id
• Web/application server with local session store: saving state on one server!
• user => { username, email, id }: duplicate data from a DB table
• The whole entry: a big blob of junk!
Slide 18: Good Session!
Cookie: sid=seh568fzkj5k09z; user=987-65abc; bg_color=blue; cart=...;
• Stateless web/application server!
• Important data in the database(s):
    Users
      987 => { username => 'joe', email => 'joe@example.com' },
      ...
    Shopping Carts
      ...
• Individual expiration on session objects, kept in the memcached cache:
    seh568fzkj5k09z => { last_viewed_items => {...}, ... other "junk" ... },
    ....
• Small data items in cookies
Slide 19: Safe cookies
• Worried about manipulated cookies?
• Use checksums and timestamps to validate
  • cookie=1/value/1123157440/ABCD1234
  • cookie=$cookie_format_version/$value/$timestamp/$checksum
• function cookie_checksum { md5_hex( $secret . $time . $value ); }
Slide 20: Safe cookies
• Want fewer cookies? Combine them:
  • cookie=1/user::987/cart::943/ts::1123.../EFGH9876
  • cookie=$cookie_format_version/$key::$value[/$key::$value]/ts::$timestamp/$md5
• Encrypt cookies if you must (rarely worth the trouble and CPU cycles)
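A minimal Python sketch of the combined, checksummed cookie format from Slides 19 and 20. It swaps the slide’s plain MD5 of secret, time and value for HMAC-SHA256 from Python’s standard hmac module; SECRET, MAX_AGE, make_cookie and read_cookie are hypothetical names, and keys and values are assumed not to contain “/” (keys also not “::”).

```python
from __future__ import annotations

import hashlib
import hmac
import time

SECRET = b"change-me"        # hypothetical; load the real secret from config
COOKIE_FORMAT_VERSION = "1"
MAX_AGE = 7 * 24 * 3600      # hypothetical lifetime: one week


def _sign(payload: str) -> str:
    # HMAC-SHA256 over everything before the last "/", so neither the
    # fields nor the version or timestamp can change without detection.
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_cookie(fields: dict[str, str], now: int | None = None) -> str:
    ts = int(time.time()) if now is None else now
    pairs = [f"{key}::{value}" for key, value in fields.items()]
    payload = "/".join([COOKIE_FORMAT_VERSION, *pairs, f"ts::{ts}"])
    return f"{payload}/{_sign(payload)}"


def read_cookie(cookie: str, now: int | None = None) -> dict[str, str] | None:
    """Return the fields if the cookie is intact and not expired, else None."""
    payload, sep, checksum = cookie.rpartition("/")
    # Compare as bytes in constant time to avoid leaking timing information.
    if not sep or not hmac.compare_digest(_sign(payload).encode(), checksum.encode()):
        return None
    parts = payload.split("/")
    if parts[0] != COOKIE_FORMAT_VERSION or not parts[-1].startswith("ts::"):
        return None
    ts_text = parts[-1][len("ts::"):]
    current = int(time.time()) if now is None else now
    if not ts_text.isdecimal() or current - int(ts_text) > MAX_AGE:
        return None
    fields = {}
    for pair in parts[1:-1]:
        key, sep, value = pair.partition("::")
        if not sep:
            return None
        fields[key] = value
    return fields
```

The cookie is signed, not encrypted: the user can still read user::987, but cannot change it without the checksum failing.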
Slide 21: I did everything – it’s still slow!
• Optimizations and good micro-practices are necessary, of course
• But don’t confuse what is what!
• Know when you are optimizing
• Know when you need to step back and rethink “the big picture”
Slide 22: Caching
How to not do all that work again and again and again...
Slide 23: Cache hit-ratios
• Start with things you hit all the time
• Look at web server and database logs
• Don’t cache if you’ll need more effort writing to the cache than you save
• Do cache if it’ll help you when that one single page gets a million hits in a few hours (one out of two hundred thousand pages on the digg frontpage)
• Measure! Don’t assume – check!
Slide 24: Generate Static Pages
• Ultimate Performance: Make all pages static
• Generate them from templates nightly or when updated
• Doesn’t work well if you have millions of pages or page variations
• Temporarily make a page static if the servers are crumbling from one particular page being busy
• Generate your front page as a static file every N minutes
Slide 25: Cache full pages (or responses if it’s an API)
• Cache full output in the application
• Include cookies etc. in the “cache key”
• Fine-tuned application level control
• The most flexible
  • “use cache when this, not when that” (anonymous users get cached page, registered users get a generated page)
  • Use regular expressions to insert customized content into the cached page
Slide 26: Cache full pages 2
• Front end cache (Squid, Varnish, mod_cache) stores generated content
  • Set Expires/Cache-Control header to control cache times
• or Rewrite rule to generate page if the cached file doesn’t exist (this is what Rails does or did...); only scales to one server
    RewriteCond %{REQUEST_FILENAME} !-s
    RewriteCond %{REQUEST_FILENAME}/index.html !-s
    RewriteRule (^/.*) /dynamic_handler/$1 [PT]
• Still doesn’t work for dynamic content per user (“6 items in your cart”)
• Works for caching “dynamic” images ... on one server
Slide 27: Cache partial pages
• Pre-generate static page “snippets” (this is what my.yahoo.com does or used to do...)
  • Have the handler just assemble pieces ready to go
• Cache little page snippets (say the sidebar)
• Be careful, easy to spend more time managing the cache snippets than you save!
• “Regexp” dynamic content into an otherwise cached page
Slide 28: Cache data
• Cache data that’s slow to query, fetch or calculate
• Generate page from the cached data
• Use the same data to generate API responses!
• Moves load to cache servers (for better or worse)
• Good for slow data used across many pages (“today’s bestsellers in $category”)
Slide 29: Caching Tools
Where to put the cache data ...
Slide 30: A couple of bad ideas. Don’t do this!
• Process memory ($cache{foo})
  • Not shared!
• Shared memory? Local file system?
  • Limited to one machine (likewise for a file system cache)
  • Some implementations are really fast
• MySQL query cache
  • Flushed on each update
  • Nice if it helps; don’t depend on it
Slide 31: MySQL cache table
• Write into one or more cache tables
• id is the “cache key”
• type is the “namespace”
• metadata for things like headers for cached http responses
• purge_key to make it easier to delete data from the cache
    CREATE TABLE `combust_cache` (
      `id` varchar(64) NOT NULL,
      `type` varchar(20) NOT NULL default '',
      `created` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
      `purge_key` varchar(16) default NULL,
      `data` mediumblob NOT NULL,
      `metadata` mediumblob,
      `serialized` tinyint(1) NOT NULL default '0',
      `expire` datetime NOT NULL default '0000-00-00 00:00:00',
      PRIMARY KEY (`id`,`type`),
      KEY `expire_idx` (`expire`),
      KEY `purge_idx` (`purge_key`)
    ) ENGINE=InnoDB
Slide 32: MySQL Cache Fails
• Scaling and availability issues
  • How do you load balance?
  • How do you deal with a cache box going away?
• Partition the cache to spread the write load
• Use Spread to write to the cache and distribute configuration
• General theme: Don’t write directly to the DB
Slide 33: MySQL Cache Scales
• Persistence
• Most of the usual “scale the database” tricks apply
• Partitioning
• Master-Master replication for availability
• .... more on those things in a moment
• Put metadata in memcached for partitioning and failover information
Slide 34: memcached
• LiveJournal’s distributed caching system (used practically everywhere!)
• Memory based: memory is cheap!
• Linux 2.6 (epoll) or FreeBSD (kqueue)
  • Low overhead for many, many connections
• Run it on boxes with free memory
• ... or a dedicated cluster: Facebook has more than five hundred dedicated memcached servers (a lot of memory!)
Slide 35: more memcached
• No “master”: fully distributed
• Simple lightweight protocol (binary protocol coming)
• Scaling and high-availability are “built-in”
• Servers are dumb: clients calculate which server to use based on the cache key
• Clients in perl, java, php, python, ruby, ...
• New C client library, libmemcached: http://tangent.org/552/libmemcached.html
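One common way for clients to “calculate which server to use” is consistent hashing, sketched below in Python under the assumption of MD5-based ring points and 100 virtual points per server. Real client libraries differ in hash function and layout (some use plain hash modulo the server count), and the server addresses are hypothetical.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(text: str) -> int:
    # First 8 bytes of MD5 as an unsigned integer: stable across processes,
    # unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


class HashRing:
    """Map cache keys to servers so every client agrees without coordination."""

    def __init__(self, servers: list[str], replicas: int = 100) -> None:
        # Place each server at several points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        # Walk "clockwise" to the first server point at or after the key's
        # hash, wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

The advantage over hash(key) % N: adding a fourth server moves only roughly a quarter of the keys, all of them onto the new server, instead of remapping most of the cache.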
Slide 36: How to use memcached
• It’s a cache, not a database
• Store data safely somewhere else
• Pass-through cache (id = session_id or whatever):
  Read
    $data = memcached_fetch( $id );
    return $data if $data;
    $data = db_fetch( $id );
    memcached_store( $id, $data );
    return $data;
  Write
    db_store( $id, $data );
    memcached_store( $id, $data );
Slide 37: Client Side Replication
• memcached is a cache: the data might “get lost”
• What if a cache miss is Really Expensive?
• Store all writes to several memcached servers
• Client libraries are starting to support this natively
Slide 38: Store complex data
• Most (all?) client libraries support complex data structures
• A bit flag in memcached marks the data as “serialized” (another bit for “gzip”)
• All this happens on the client side: memcached just stores a bunch of bytes
• Future: Store data in JSON? Interoperability between languages!
Slide 39: Store complex data 2
• Primary key lookups are probably not worth caching
• Store things that are expensive to figure out!
    function get_slow_summary_data($id) {
        $data = memcached_fetch( $id );
        return $data if $data;
        $data = do_complicated_query( $id );
        memcached_store( $id, $data );
        return $data;
    }
Slide 40: Cache invalidation
• Writing to the cache on updates is hard!
• Caching is a trade-off
• You trade “fresh” for “fast”
• Decide how “fresh” is required and deal with it!
• Explicit deletes if you can figure out what to delete
• Add a “generation” / timestamp / whatever to the cache key
    select id, unix_timestamp(modified_on) as ts from users where username = 'ask';
    memcached_fetch( "user_friend_updates;$id;$ts" )
Slide 41: Caching is a trade-off
• Can’t live with it?
• Make the primary data-source faster or data-store scale!
Slide 42: Database scaling
How to avoid buying that gazillion dollar Sun box
• Vertical: ~$4,000,000
• ~$3,200 per box (= 1230 boxes for $4.0M!)
Slide 43: Be Simple
• Use MySQL!
  • It’s fast and it’s easy to manage and tune
  • Easy to set up development environments
  • Other DBs can be faster at certain complex queries but are harder to tune, and MySQL is catching up!
• Avoid making your schema too complicated
• Ignore some of the upcoming advice until you REALLY need it!
  • (even the part about not scaling your DB “up”)
• PostgreSQL is fast too :-)
Slide 44: Replication
More data more places!
Share the love load
Slide 45: Basic Replication
• Good, great for read intensive applications
• Write to one master
• Read from many slaves
[Diagram: webservers send writes to the master and reads, through a load balancer, to three slaves; the master replicates its writes to every slave]
• Lots more details in “High Performance MySQL” (old, but until MySQL 6 the replication concepts are the same)
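A minimal Python sketch of the read/write split, assuming host names stand in for real connections and that any statement starting with SELECT is safe to send to a slave.

```python
from __future__ import annotations

import random


class ReplicatedDB:
    """Route writes to the master and reads to a randomly chosen slave.

    Hosts are plain strings here; real code would hold connections.
    """

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        self.slaves = slaves

    def host_for(self, sql: str) -> str:
        # Plain SELECTs may go to a slave; INSERT, UPDATE, DELETE and DDL
        # must go to the master so there is a single source of writes.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.slaves:
            return random.choice(self.slaves)
        return self.master
```

Replication is asynchronous, so a slave can lag behind the master; reads that must see the user’s own just-committed write (and SELECT ... FOR UPDATE) are usually sent to the master as well.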
Slide 46: Relay slave replication
• Running out of bandwidth on the master?
• Replicating to multiple data centers?
• A “replication slave” can be master to other slaves
• Almost any possible replication scenario can be set up (circular, star replication, ...)
[Diagram: webservers and a data loading script write to the master; the master replicates to relay slave A and relay slave B, each feeding three slaves; webservers read from the slaves through a load balancer]
Slide 47: Replication Scaling – Reads
• Reading scales well with replication
• Great for (mostly) read-only applications
[Chart: capacity of one server vs two servers; each server still does all the writes, but the reads are split between them]
(thanks to Brad Fitzpatrick!)
Slide 48: Replication Scaling – Writes (aka when replication sucks)
• Writing doesn’t scale with replication
• All servers need to do the same writes
[Chart: as servers are added, every server spends the same share of its capacity on writes, leaving less room for reads]
Slide 49: Partition the data
Divide and Conquer!
or
Web 2.0 Buzzword Compliant!
Now free with purchase of milk!!
Slide 50: Partition your data
• 96% read application? Skip this step...
• Solution to the too many writes problem: Don’t have all data on all servers
• Use a separate cluster for different data sets
[Diagram: a Cat cluster and a Dog cluster, each a master with its own slaves]
Slide 51: The Write Web!
• Replication too slow? Don’t have replication slaves!
• Use a (fake) master-master setup and partition / shard the data!
• Simple redundancy!
• No latency from commit to data being available
• Don’t bother with fancy 2 or 3 phase commits
• (Make each “main object” (user, product, ...) always use the same master, as long as it’s available)
[Diagram: separate master-master pairs for cats, dogs and fish]
Slide 52: Partition with a global master server
• Can’t divide data up in “dogs” and “cats”?
• Flexible partitioning!
• The “global” server keeps track of which cluster has the data for user “623”
• Get all PKs from the global master
• Only auto_increment columns in the “global master”
• Aggressively cache the “global master” data (memcached)
• and/or use MySQL Cluster (ndb)
[Diagram: webservers ask the global master (with a backup slave) “Where is user 623?”, get “user 623 is in cluster 3”, then run select * from some_data where user_id = 623 on cluster 3, one of three master-master data clusters]
Slide 53: Master – Master setup
• Set up two replicas of your database copying changes to each other
• Keep it simple! (all writes to one master)
• Instant fail-over host: no slave changes needed
• Configuration is easy!
    set-variable = auto_increment_increment=2
    set-variable = auto_increment_offset=1
  (offset = 2 on second master)
• Set up both systems as a slave of the other
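Why the offset setting keeps the two masters’ ids apart, as a small Python model. It assumes each server hands out offset + n × increment for an empty table; MySQL actually picks the next value in that series above the table’s current maximum, so gaps can appear, but each master still stays in its own residue class modulo the increment.

```python
from __future__ import annotations


def auto_increment_ids(offset: int, increment: int, count: int) -> list[int]:
    """Simplified model: the first `count` ids a server hands out for an
    empty table, i.e. offset + n * increment for n = 0, 1, 2, ..."""
    return [offset + n * increment for n in range(count)]


# Two masters as configured on the slide: same increment, different offsets.
master_a = auto_increment_ids(offset=1, increment=2, count=5)  # odd ids
master_b = auto_increment_ids(offset=2, increment=2, count=5)  # even ids
```

The same scheme extends to N masters with auto_increment_increment=N and offsets 1 through N.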
Slide 54: Online Schema Changes
The reasons we love master-master!
• Do big schema changes with no downtime!
  • Stop A to B replication
  • Move traffic to B
  • Do changes on A
  • Wait for A to catch up on replication
  • Move traffic to A
  • Re-start A to B replication
Slide 55: Hacks!
Don’t be afraid of the data-duplication monster
http://flickr.com/photos/firevixen/75861588/
Slide 56: Summary tables!
• Find queries that do things with COUNT(*) and GROUP BY and create tables with the results!
  • Data loading process updates both tables
  • or hourly/daily/... updates
• Variation: Duplicate data in a different “partition”
  • Data affecting both a “user” and a “group” goes in both the “user” and the “group” partition (Flickr does this)
Slide 57: Summary databases!
• Don’t just create summary tables
• Use summary databases!
• Copy the data into special databases optimized for special queries
  • full text searches
  • index with both cats and dogs
  • anything spanning all clusters
• Different databases for different latency requirements (RSS feeds from replicated slave DB)
Slide 58: Make everything repeatable
• Script failed in the middle of the nightly processing job? (they will sooner or later, no matter what)
• How do you restart it?
• Build your “summary” and “load” scripts so they can always be run again! (and again and again)
• One “authoritative” copy of a data piece: summaries and copies are (re)created from there
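A Python/sqlite3 sketch combining a summary table (Slide 56) with a repeatable rebuild (Slide 58). The pets and pet_counts tables are hypothetical, and sqlite3 stands in for MySQL so the example runs anywhere; the same DELETE plus INSERT ... SELECT ... GROUP BY inside one transaction carries over to InnoDB.

```python
import sqlite3


def rebuild_summary(conn: sqlite3.Connection) -> None:
    """Recreate the summary from the authoritative table.

    Delete-then-insert inside one transaction makes the job safe to rerun:
    a crash rolls back to the old summary, and a second run gives the same
    result as the first.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute("DELETE FROM pet_counts")
        conn.execute(
            "INSERT INTO pet_counts (species, n) "
            "SELECT species, COUNT(*) FROM pets GROUP BY species"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, species TEXT NOT NULL)")
conn.execute("CREATE TABLE pet_counts (species TEXT PRIMARY KEY, n INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO pets (species) VALUES (?)",
    [("cat",), ("dog",), ("cat",), ("fish",), ("cat",)],
)
conn.commit()
rebuild_summary(conn)
rebuild_summary(conn)  # running it again is harmless
```

Because the summary is always recomputed from the authoritative pets table, rerunning the job after a crash halfway through needs no special recovery logic.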
Slide 59: Asynchronous data loading
• Updating counts? Loading logs?
• Don’t talk directly to the database, send updates through Spread (or whatever) to a daemon loading data
• Don’t update for each request
    update counts set count=count+1 where id=37
• Aggregate 1000 records or 2 minutes of data and do fewer database changes
    update counts set count=count+42 where id=37
• Being disconnected from the DB will let the frontend keep running if the DB is down!
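A Python sketch of the aggregating daemon, assuming increments arrive one at a time (over Spread or whatever transport) and that `execute` is a hypothetical callback running SQL with DB-API “?” placeholders (MySQL drivers typically use %s instead). The clock is injectable so the time-based flush can be tested.

```python
from __future__ import annotations

import time
from collections import Counter
from collections.abc import Callable


class CountAggregator:
    """Collect counter increments in memory and write them in batches.

    Flushes after `max_records` increments or `max_age` seconds, whichever
    comes first (1000 records / 2 minutes, as on the slide). `execute` is a
    hypothetical callback that runs one SQL statement with parameters.
    """

    def __init__(
        self,
        execute: Callable[[str, tuple], None],
        max_records: int = 1000,
        max_age: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.execute = execute
        self.max_records = max_records
        self.max_age = max_age
        self.clock = clock
        self.pending: Counter[int] = Counter()
        self.records = 0
        self.started = clock()

    def increment(self, counter_id: int, amount: int = 1) -> None:
        self.pending[counter_id] += amount
        self.records += 1
        too_many = self.records >= self.max_records
        too_old = self.clock() - self.started >= self.max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # One UPDATE per counter per batch instead of one per request.
        for counter_id, total in sorted(self.pending.items()):
            self.execute(
                "UPDATE counts SET count = count + ? WHERE id = ?",
                (total, counter_id),
            )
        self.pending.clear()
        self.records = 0
        self.started = self.clock()
```

Here the age check only runs when a new increment arrives; a real daemon would also flush from a timer and on shutdown.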
Slide 60: “Manual” replication
• Save data to multiple “partitions”
• Application writes two places or
• last_updated/modified_on and deleted columns or
• Use triggers to add to “replication_queue” table
• Background program to copy data based on the queue table or the last_updated column
• Build summary tables or databases in this process
• Build star/spoke replication system
Slide 61: Preload, -dump and -process
• Let the servers do as much as possible without touching the database directly
  • Data structures in memory: ultimate cache!
  • Dump never-changing data structures to JS files for the client to cache
• Dump smaller read-only, often accessed data sets to SQLite or BerkeleyDB and rsync to each webserver (or use NFS, but...)
  • Or a MySQL replica on each webserver
Slide 62: Stored Procedures Dangerous
• Not horizontal
• Bad: Work done in the database server (unless it’s read-only and replicated)
• Good: Work done on one of the scalable web fronts
• Only do stored procedures if they save the database work (network-io work > SP work)
Slide 63: a brief diversion ... Running Oracle now?
• Move read operations to MySQL!
• Replicate from Oracle to a MySQL cluster with “manual replication”
• Use triggers to keep track of changed rows in Oracle
• Copy them to the MySQL master server with a replication program
• Good way to “sneak” MySQL in ...
[Diagram: webservers write to Oracle; a replication program copies changes to the MySQL master, which replicates to three slaves; webservers read from the slaves through a load balancer]
Slide 64: Optimize the database
Faster, faster, faster ....
Slide 65: ... very briefly
• The whole conference here is about this
• ... so I’ll just touch on a few ideas
Slide 66: Memory for MySQL = good
• Put as much memory as you can afford in the server (currently 2GB sticks are the best value)
• InnoDB: Let MySQL use ~all memory (don’t use more than is available, of course!)
• MyISAM: Leave more memory for OS page caches
• Can you afford to lose data on a crash? Optimize accordingly
• Disk setup: We’ll talk about RAID later
Slide 67: What’s your app doing?
• Enable query logging in your development DB!
  • Are all those queries really necessary? Cache candidates?
  • (you do have a devel db, right?)
  • Just add “log=/var/lib/mysql/sql.log” to .cnf
• Slow query logging:
    log-slow-queries
    log-queries-not-using-indexes
    long_query_time=1
• mysqldumpslow parses the slow log
• 5.1+ does not require a server restart and can log directly into a CSV table...
Slide 68: Table Choice
• Short version: Use InnoDB, it’s harder to make them fall over
• Long version: Use InnoDB except for
  • Big read-only tables (smaller, less IO)
  • High volume streaming tables (think logging)
    • Locked tables / INSERT DELAYED
    • ARCHIVE table engine
  • Specialized engines for special needs
  • More engines in the future
• For now: InnoDB
Slide 69: Multiple MySQL instances
• Run different MySQL instances for different workloads
  • Even when they share the same server anyway!
  • InnoDB vs MyISAM instance
• Move to separate hardware and replication easier
• Optimize MySQL for the particular workload
• Very easy to set up with the instance manager or mysqld_multi
• mysql.com init.d script supports the instance manager (don’t use the redhat/fedora script!)
[Diagram: prod cluster (InnoDB, normalized columns) feeds a search_load process that fills a search cluster (MyISAM, fulltext columns)]
Slide 70: Config tuning helps, Query tuning works
• Configuration tuning helps a little
• The big performance improvements come from schema and query optimizations: focus on that!
• Design schema based on queries
• Think about what kind of operations will be common on the data; don’t go for “perfect schema beauty”
• What results do you need? (now and in the future)
Slide 71: EXPLAIN
• Use the “EXPLAIN SELECT ...” command to check the query
• Baron Schwartz talks about this 2pm on Tuesday!
• Be sure to read:
  http://dev.mysql.com/doc/mysql/en/mysql-indexes.html
  http://dev.mysql.com/doc/mysql/en/explain.html
Slide 72: Use smaller data
• Use Integers
  • Always use integers for join keys
  • And when possible for sorts, group bys, comparisons
• Don’t use bigint when int will do
• Don’t use varchar(255) when varchar(20) will do
Slide 73: Store Large Binary Objects (aka how to store images)
• Meta-data table (name, size, ...)
• Store images either in the file system
  • meta data says “server ‘123’, filename ‘abc’”
  • (If you want this, use mogilefs or Amazon S3 for storage!)
• OR store images in other tables
  • Split data up so each table doesn’t get bigger than ~4GB
• Include “last modified date” in meta data
  • Include it in your URLs if possible to optimize caching (/images/$timestamp/$id.jpg)
Slide 74: Reconsider Persistent DB Connections
• DB connection = thread = memory
• With partitioning all httpd processes talk to all DBs
• With lots of caching you might not need the main database that often
• MySQL connections are fast
• Always use persistent connections with Oracle!
  • Commercial connection pooling products
• pgsql, sybase, oracle? Need thousands of persistent connections?
  • In Perl the new DBD::Gofer can help with pooling!
Slide 75: InnoDB configuration
• innodb_file_per_table
  • Splits your innodb data into a file per table instead of one big annoying file
  • Makes optimize table `table` clear unused space
• innodb_buffer_pool_size=($MEM*0.80)
• innodb_flush_log_at_trx_commit setting
• innodb_log_file_size
• transaction-isolation = READ-COMMITTED
Slide 76: My favorite MySQL feature
• insert into t (somedate) values ('blah');
• insert into t (someenum) values ('bad value');
• Make MySQL picky about bad input!
  • SET sql_mode = 'STRICT_TRANS_TABLES';
  • Make your application do this on connect
Slide 77: Don’t overwork the DB
• Databases don’t easily scale
• Don’t make the database do a ton of work
• Referential integrity is good
  • Tons of stored procedures to validate and process data, not so much
• Don’t be too afraid of de-normalized data; sometimes it’s worth the tradeoffs (call them summary tables and the DBAs won’t notice)
Slide 78: Use your resources wisely
don’t implode when things run warm
Slide 79: Work in parallel
• Split the work into smaller (but reasonable) pieces and run them on different boxes
• Send the sub-requests off as soon as possible, do something else and then retrieve the results
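A Python sketch of “send the sub-requests off, do something else, then collect”, using threads from concurrent.futures; fetch_part is a hypothetical sub-request that only sleeps to simulate latency.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(name: str) -> str:
    """Hypothetical sub-request (another service, a data shard, ...)."""
    time.sleep(0.2)  # simulate network / IO latency
    return f"<div>{name}</div>"


def render_page(parts: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        # 1) Send every sub-request off right away ...
        futures = [pool.submit(fetch_part, name) for name in parts]
        # 2) ... do other work while they are in flight ...
        header = "<h1>Home</h1>"
        # 3) ... then collect the results, in the original order.
        return header + "".join(f.result() for f in futures)
```

Three 0.2-second sub-requests finish in roughly 0.2 seconds instead of 0.6; across boxes the shape is the same, with the sub-requests going over the network.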
Slide 80: Job queues
• Processing time too long for the user to wait?
• Can only process N requests / jobs in parallel?
• Use queues (and external worker processes)
• IFRAMEs and AJAX can make this really spiffy (tell the user “the wait time is 20 seconds”)
Slide 81: Job queue tools
• Database “queue”
  • Dedicated queue table or just processed_on and grabbed_on columns
  • Webserver submits job
  • First available “worker” picks it up and returns the result to the queue
  • Webserver polls for status
[Diagram: webservers submit jobs to the queue DB; several workers pick them up]
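The database queue from the slide, sketched in Python with an in-memory sqlite3 table standing in for MySQL and hypothetical function names. The worker claims a job with an UPDATE that only succeeds while grabbed_on is still NULL, so two workers cannot grab the same row.

```python
from __future__ import annotations

import sqlite3
import time

# In-memory sqlite3 stands in for the MySQL queue table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
           id INTEGER PRIMARY KEY,
           payload TEXT NOT NULL,
           result TEXT,
           grabbed_on REAL,     -- set when a worker claims the job
           processed_on REAL    -- set when the result is stored
       )"""
)


def submit(payload: str) -> int:
    """Webserver: enqueue a job and return its id for later polling."""
    with conn:
        return conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def grab() -> tuple[int, str] | None:
    """Worker: claim the oldest unclaimed job, or return None if there is none."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE grabbed_on IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            # Succeeds only if no other worker claimed this row in the meantime.
            claimed = conn.execute(
                "UPDATE jobs SET grabbed_on = ? WHERE id = ? AND grabbed_on IS NULL",
                (time.time(), row[0]),
            ).rowcount
        if claimed == 1:
            return row


def finish(job_id: int, result: str) -> None:
    """Worker: store the result and mark the job as processed."""
    with conn:
        conn.execute(
            "UPDATE jobs SET result = ?, processed_on = ? WHERE id = ?",
            (result, time.time(), job_id),
        )


def poll(job_id: int) -> str | None:
    """Webserver: return the result once a worker has stored it, else None."""
    row = conn.execute(
        "SELECT result FROM jobs WHERE id = ? AND processed_on IS NOT NULL", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

A production version would use one connection per process and periodically reset grabbed_on for jobs that were grabbed but never processed, in case a worker died mid-job.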
Slide 82: More Job Queue tools
• beanstalkd: great protocol, fast, no persistence (yet) http://xph.us/software/beanstalkd/
• gearman: for one-off out-of-band jobs http://www.danga.com/gearman/
• starling: from twitter, memcached protocol, disk based persistence http://rubyforge.org/projects/starling/
• TheSchwartz from SixApart, used in Movable Type
• Spread
• MQ / Java Messaging Service(?) / ...
Slide 83: Log http requests
• Log slow http transactions to a database:
  time, response_time, uri, remote_ip, user_agent, request_args, user, svn_branch_revision, log_reason (a “SET” column), ...
• Log to ARCHIVE tables, rotate hourly / weekly / ...
• Log 2% of all requests!
• Log all 4xx and 5xx requests
• Great for statistical analysis!
  • Which requests are slower
  • Is the site getting faster or slower?
• Time::HiRes in Perl, microseconds from gettimeofday system call
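A Python sketch of the logging decision, assuming a hypothetical 1-second “slow” threshold; the 2% sample rate and the 4xx/5xx rule come from the slide. The returned reasons map onto a log_reason SET column, and the random source is injectable for testing.

```python
from __future__ import annotations

import random
from collections.abc import Callable

SAMPLE_RATE = 0.02   # log 2% of ordinary requests
SLOW_SECONDS = 1.0   # hypothetical "slow" threshold


def log_reasons(
    status: int, response_time: float, rng: Callable[[], float] = random.random
) -> set[str]:
    """Return why a request should be logged (empty set = don't log it)."""
    reasons: set[str] = set()
    if status >= 400:                  # all 4xx and 5xx responses
        reasons.add("error")
    if response_time >= SLOW_SECONDS:  # slow transactions
        reasons.add("slow")
    if rng() < SAMPLE_RATE:            # random 2% sample of everything
        reasons.add("sample")
    return reasons
```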
Slide 84: Intermission?
Slide 85:
• Use light processes for light tasks
• Thin proxy servers or threads for “network buffers”
• Goes between the user and your heavier backend application
• Built-in load-balancing! (for Varnish, perlbal, ...)
• httpd with mod_proxy / mod_backhand
• perlbal: more on that in a bit
• Varnish, squid, pound, ...
Slide 86: Proxy illustration
Users → perlbal or mod_proxy (low memory/resource usage) → backends (lots of memory, db connections etc)
Slide 87: Light processes
• Save memory and database connections
• This works spectacularly well. Really!
• Can also serve static files
• Avoid starting your main application as root
• Load balancing
• Particularly important if your backend processes are “heavy”
Slide 88: Light processes
• Apache 2 makes it Really Easy
    ProxyPreserveHost On
    <VirtualHost *>
      ServerName combust.c2.askask.com
      ServerAlias *.c2.askask.com
      RewriteEngine on
      RewriteRule (.*) http://localhost:8230$1 [P]
    </VirtualHost>
• Easy to have different “backend environments” on one IP
• Backend setup (Apache 1.x)
    Listen 127.0.0.1:8230
    Port 80
Slide 89: perlbal configuration
    CREATE POOL my_apaches
    POOL my_apaches ADD 10.0.0.10:8080
    POOL my_apaches ADD 10.0.0.11:8080
    POOL my_apaches ADD 10.0.0.12
    POOL my_apaches ADD 10.0.0.13:8081

    CREATE SERVICE balancer
      SET listen          = 0.0.0.0:80
      SET role            = reverse_proxy
      SET pool            = my_apaches
      SET persist_client  = on
      SET persist_backend = on
      SET verify_backend  = on
    ENABLE balancer
Slide 90: A few thoughts on development ...
Slide 91: All Unicode All The Time
• The web is international and multilingual, deal with it.
• All Unicode all the time! (except when you don’t need it: urls, email addresses, ...)
• Perl: DBD::mysql was fixed last year! PHP 6 will have improved Unicode support. Ruby 2 will someday, too...
• It will never be easier to convert than now!
Slide 92: Use UTC (Coordinated Universal Time)
• It might not seem important now, but some day ...
• It will never be easier to convert than now!
• Store all dates and times as UTC, convert to “local time” on display
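Storing UTC and converting on display, sketched with Python’s datetime. The fixed-offset PDT and CEST zones are assumptions that keep the example self-contained; real code should use DST-aware zones (for example zoneinfo).

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_storage(local_dt: datetime) -> datetime:
    """Convert an aware datetime to UTC before it goes into the database."""
    if local_dt.tzinfo is None:
        raise ValueError("refusing to store a naive datetime; attach a timezone first")
    return local_dt.astimezone(timezone.utc)


def for_display(stored_utc: datetime, viewer_tz: tzinfo) -> str:
    """Convert a stored UTC timestamp to the viewer's local time for display."""
    return stored_utc.astimezone(viewer_tz).strftime("%Y-%m-%d %H:%M %Z")


# Fixed offsets keep the example free of time zone database dependencies.
PDT = timezone(timedelta(hours=-7), "PDT")
CEST = timezone(timedelta(hours=2), "CEST")
```

Rejecting naive datetimes at the storage boundary catches the most common bug: a local time silently stored as if it were UTC.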
Slide 93: Build on APIs
• All APIs All The Time!
• Use “clean APIs” internally in your application architecture
• Loosely coupled APIs are easier to scale
  • Add versioning to APIs (“&api_version=123”)
• Easier to scale development
• Easier to scale deployment
• Easier to open up to partners and users!
Slide 94: Why APIs?
• Natural place for “business logic”
  • Controller = “Speak HTTP”
  • Model = “Speak SQL”
  • View = “Format HTML / ...”
  • API = “Do Stuff”
• Aggregate just the right amount of data
• Awesome place for optimizations that matter!
• The data layer knows too little
Slide 95: More development philosophy
• Do the Simplest Thing That Can Possibly Work
• ... but do it really well!
• Balance the complexity, err on the side of simple
• This is hard!
Slide 96: Pay your technical debt
• Don’t incur technical debt
  • “We can’t change that - last we tried the site went down”
  • “Just add a comment with ‘TODO’”
  • “Oops. Where are the backups? What do you mean ‘no’?”
  • “Who has the email with that bug?”
• Interest on technical debt will kill you
• Pay it back as soon as you can!
Slide 97: Coding guidelines
• Keep your formatting consistent!
  • perl: perltidy, perl best practices, Perl::Critic
• Keep your APIs and module conventions consistent
• Refactor APIs mercilessly (in particular while they are not public)
Slide 98: qmail lessons
• Lessons from 10 years of qmail
• Research paper from Dan Bernstein: http://cr.yp.to/qmail/qmailsec-20071101.pdf
• Eliminate bugs
  • Test coverage
  • Keep data flow explicit
(continued)
Slide 99: qmail lessons (2)
• Eliminate code: less code = fewer bugs!
  • Refactor common code
  • Reuse code (Unix tools / libs, CPAN, PEAR, Ruby Gems, ...)
  • Reuse access control
• Eliminate trusted code: what needs access?
  • Treat transformation code as completely untrusted
Slide 100: Joint Strike Fighter
• ~Superset of the “Motor Industry Software Reliability Association Guidelines For The Use Of The C Language In Vehicle Based Software”
• Really Very Detailed!
• No recursion! (Ok, ignore this one :-) )
• Do make guidelines: know when to break them
• Have code reviews: make sure every commit email gets read (and have automatic commit emails in the first place!)
Slide 101: High Availability and Load Balancing and Disaster Recovery
Slide 102: High Availability
• Automatically handle failures! (bad disks, failing fans, “oops, unplugged the wrong box”, ...)
• For your app servers the load balancing system should take out “bad servers” (most do)
  • perlbal or Varnish can do this for http servers
• Easy-ish for things that can just “run on lots of boxes”
Slide 103: Make that service always work!
• Sometimes you need a service to always run, but on specific IP addresses
  • Load balancers (level 3 or level 7: perlbal/varnish/squid)
  • Routers
  • DNS servers
  • NFS servers
  • Anything that has failover or an alternate server: the IP needs to move (much faster than changing DNS)
Slide 104: Load balancing
• Key to horizontal scaling (duh)
• 1) All requests go to the load balancer
• 2) Load balancer picks a “real server”
• Hardware (lots of vendors)
  • Coyote Point has relatively cheaper ones
  • Look for older models for cheap on eBay!
• Linux Virtual Server
• Open/FreeBSD firewall rules (pf firewall pools) (no automatic failover, have to do that on the “real servers”)
Slide 105: Load balancing 2
• Use a “level 3” (tcp connections only) tool to send traffic to your proxies
• Through the proxies do “level 7” (http) load balancing
• perlbal has some really good features for this!
Slide 106: perlbal
• Event based for HTTP load balancing, web serving, and a mix of the two (see below).
• Practical fancy features like “multiplexing” keep-alive connections to both users and back-ends
• Everything can be configured or reconfigured on the fly
• If you configure your backends to only allow as many connections as they can handle (you should anyway!), perlbal will automatically balance the load “perfectly”
• Can actually give Perlbal a list of URLs to try. Perlbal will find one that’s alive. Instant failover!
• http://www.danga.com/perlbal/
Slide 107: Varnish
• Modern high performance http accelerator
• Optimized as a “reverse cache”
• Whenever you would have used squid, give this a look
• Recently got “Vary” support
• Super efficient (except it really wants to “take over” a box)
• Written by Poul-Henning Kamp, famed FreeBSD contributor
• BSD licensed, work is being paid for by a Norwegian newspaper
• http://www.varnish-cache.org/
Slide 108: Fail-over tools “move that IP”
Slide 111: Buy a “hardware load balancer”
• Generally Quite Expensive
  • (Except on eBay - used network equipment is often great)
• Not appropriate (cost-wise) until you have MANY servers
• If the feature list fits it “Just Works”
• ... but when we are starting out, what do we use?
Slide 112: wackamole
• Simple, just moves the IP(s)
• Can embed Perl so you can run Perl functions when IPs come and go
• Easy configuration format
• Set up “groups of IPs”
• Supports Linux, FreeBSD and Solaris
• Spread toolkit for communication
• Easy to troubleshoot (after you get Spread working...)
• http://www.backhand.org/wackamole/
Slide 113: Heartbeat
• Monitors and moves services (an IP address is “just a service”)
• v1 has a simple but goofy configuration format
• v2 supports all sorts of groupings, larger clusters (up to 16 servers)
• Uses /etc/init.d type scripts for running services
• Maybe more complicated than you want your HA tools to be
• http://www.linux-ha.org/
Slide 114: Carp + pfsync
• Patent-free alternative to “VRRP” (Virtual Router Redundancy Protocol, covered by Cisco patents)
• FreeBSD and OpenBSD only
• Carp (moves IPs) and pfsync (synchronizes firewall state)
• (awesome for routers and NAT boxes)
• Doesn’t do any service checks, just moves IPs around
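Carp, wackamole and Heartbeat all solve the same core problem: among the nodes that are still alive, exactly one should hold the shared IP. A minimal Python sketch of that decision under assumed rules (lower priority number wins; a node counts as dead after `timeout` seconds without a heartbeat). It is not the actual CARP protocol and ignores split-brain handling; node names are hypothetical.

```python
import time


class VipElection:
    """Decide which live node should hold a shared "always up" IP."""

    def __init__(self, priorities, timeout=3.0):
        self.priorities = dict(priorities)  # node -> priority, lower wins (assumed)
        self.timeout = timeout              # seconds without a heartbeat = dead
        self.last_seen = {}

    def heartbeat(self, node, now=None):
        self.last_seen[node] = time.monotonic() if now is None else now

    def alive(self, now):
        return [n for n, t in self.last_seen.items() if now - t <= self.timeout]

    def owner(self, now=None):
        now = time.monotonic() if now is None else now
        candidates = self.alive(now)
        if not candidates:
            return None
        # Sort on (priority, name) so ties resolve the same way on every node.
        return min(candidates, key=lambda n: (self.priorities[n], n))
```

In this sketch the preferred node takes the IP back as soon as its heartbeats return (preemption); real tools usually let you choose whether that happens.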
Slide 115: mysql master-master replication manager
• mysql-master-master tool can do automatic failover!
• No shared disk
• Define potential “readers” and “writers”
• List of “application access” IPs
• Reconfigures replication
• Moves IPs
• http://code.google.com/p/mysql-master-master/
• http://groups.google.com/group/mmm-devel/
Slide 116: Suggested Configuration
• Open/FreeBSD routers with Carp+pfsync for firewalls
• A set of boxes with perlbal + wackamole on static “always up” HTTP enabled IPs
  • Trick on Linux: Allow the perlbal processes to bind to all IPs (no port number tricks or service reconfiguration or restarts!)
      echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind
    or
      sysctl -w net.ipv4.ip_nonlocal_bind=1
    and, to keep the setting across reboots,
      echo net.ipv4.ip_nonlocal_bind = 1 >> /etc/sysctl.conf
• Dumb regular http servers “behind” the perlbal ones
• wackamole for other services like DNS
• mmm for mysql fail-over
Slide 117: Redundancy fallacy!
• Don’t confuse load-balancing with redundancy
• What happens when one of these two fails?
[Figure: load balanced servers, load / capacity: Load (55%), Load (60%)]
Slide 118: Oops – no redundancy!
• Always have “n+1” capacity
• Consider having a “passive spare” (active/passive with two servers)
• Careful load monitoring!
  • Munin http://munin.projects.linpro.no/
  • MySQL Network
  • (ganglia, cacti, ...)
[Figure: one server is gone, so Load (50%) + Load (60%) lands on the survivor: more than 100% load on 1 server!]
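The arithmetic behind "n+1": when identical servers share the load evenly, losing one pushes its share onto the survivors. A small Python sketch, assuming equal-capacity servers, loads expressed as fractions of one server (0.55 = 55%) and perfect rebalancing after a failure; `headroom` is a hypothetical ceiling you pick to stay below 100%.

```python
import math


def survives_one_failure(loads, headroom=1.0):
    """True if any single server can fail without the rest exceeding `headroom`.

    Assumes identical servers and perfect rebalancing after the failure.
    """
    if len(loads) < 2:
        return False  # one server: nothing left to take over
    # With identical servers it does not matter which one fails:
    # the total load is spread over the n - 1 survivors.
    return sum(loads) / (len(loads) - 1) <= headroom


def servers_needed(total_load, headroom=1.0):
    """Servers required to carry `total_load` with one spare ("n+1")."""
    return math.ceil(total_load / headroom) + 1
```

For the two servers at 55% and 60%, the survivor would carry 115%, so `survives_one_failure([0.55, 0.60])` is False and `servers_needed(1.15)` asks for a third box.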
Slide 119: High availability: Shared storage
• NFS servers (for diskless servers, ...)
• Failover for database servers
• Traditionally either via fiber or SCSI connected to both servers
• Or NetApp filer boxes
• All expensive and smells like “the one big server”
Slide 120: Cheap high availability storage with DRBD
• Synchronizes a block device between two servers!
• “Network RAID1”
• Typically used in an Active/Primary – Standby/Secondary setup
• If the active server goes down the secondary server will switch to primary, run fsck, mount the device and start the service (MySQL / NFS server / ...)
• DRBD 8 can do writes on both servers at once – “shared disk semantics” (you need a filesystem on top that supports that, OCFS, GFS, ... – probably not worth it, but neat)
Slide 121: Disaster Recovery
• Separate from “fail-over” (no disaster if we failed-over...)
  • “The rescue truck fell in the water”
  • “All the ‘redundant’ network cables melted”
  • “The datacenter got flooded”
  • “The grumpy sysadmin sabotaged everything before he left”
Slide 122: Disaster Recovery Planning
• You won’t be back up in 2 hours, but plan so you will quickly have an idea how long it will take
• Have a status update site / weblog
• Plans for getting hardware replacements
• Plans for getting running temporarily on rented “dedicated servers” (ev1servers, rackspace, ...)
• And ....
Slide 123: Back Up Your Database!
• Binary logs!
  • Keep track of “changes since the last snapshot”
• Use replication to Another Site (doesn’t help against “for $table = @tables { truncate $table }”)
• On small databases use mysqldump
• Zmanda MySQL Backup (or whatever similar tool your database comes with) packages the different tools and options
Slide 124: Back Up Big Databases
• Use mylvmbackup to snapshot and archive
  • Requires data on an LVM device (just do it)
  • InnoDB: Automatic recovery! (ooh, magic)
  • MyISAM: Read Lock your database for a few seconds before making the snapshot (on MySQL do a “FLUSH TABLES” first (which might be slow) and then a “FLUSH TABLES WITH READ LOCK” right after)
  • Sync the LVM snapshot elsewhere
  • And then remove the snapshot!
• Bonus Optimization: Run the backup from a replication slave!
Slide 125: Backup on replication slave
• Or just run the backup from a replication slave ...
• Keep an extra replica of your master
  • shut down mysqld and archive the data
• Small-ish databases: mysqldump --single-transaction (consistent for InnoDB tables)
Slide 126: System Management – All Automation All The Time, or: How to manage 200 servers in your spare time
Slide 127: Keep software deployments easy
• Make upgrading the software a simple process
• Script database schema changes
• Keep configuration minimal
  • Servername (“www.example.com”)
  • Database names (“userdb = host=db1;db=users;...”)
  • If there’s a reasonable default, put the default in the code (for example )
  • “deployment_mode = devel / test / prod” lets you put reasonable defaults in code
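One way to read "put the default in the code": ship one table of defaults per deployment_mode and let a small per-server file override only what differs. A Python sketch; the keys, hosts and DSN strings are hypothetical, modeled on the slide's `userdb = host=db1;db=users` example.

```python
# Hypothetical keys, hosts and database names, one table per deployment_mode.
DEFAULTS = {
    "devel": {"server_name": "dev.example.com",
              "userdb": "host=localhost;db=users_devel",
              "debug": True},
    "test": {"server_name": "test.example.com",
             "userdb": "host=testdb;db=users_test",
             "debug": True},
    "prod": {"server_name": "www.example.com",
             "userdb": "host=db1;db=users",
             "debug": False},
}


def load_config(mode, overrides=None):
    """Return the defaults for `mode`, with per-server overrides applied."""
    if mode not in DEFAULTS:
        raise ValueError(f"unknown deployment_mode {mode!r}")
    overrides = overrides or {}
    config = dict(DEFAULTS[mode])  # copy so callers can't mutate the defaults
    unknown = set(overrides) - set(config)
    if unknown:
        # Catch typos in the small config file early instead of ignoring them.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    config["deployment_mode"] = mode
    return config


def parse_dsn(dsn):
    """Split "host=db1;db=users" into {"host": "db1", "db": "users"}."""
    return dict(part.split("=", 1) for part in dsn.split(";") if part)
```

A production box then needs nothing more than `deployment_mode = prod` plus the handful of values that really differ.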
Slide 128: Easy software deployment 2
• How do you distribute your code to all the app servers?
• Use your source code repository (Subversion etc)! (tell your script to svn up to http://svn/branches/prod revision 123 and restart)
• .tar.gz to be unpacked on each server
• .rpm or .deb package
• NFS mount and symlinks
• No matter what: Make your test environment use the same mechanism as production and: Have it scripted!
Slide 129: actually, have everything scripted! (photo: http://flickr.com/photos/karlequin/84829873/)
Slide 130: Configuration management: Rule Number One
• Configuration in SVN (or similar)
  • “infrastructure/” repository
  • SVN rather than rcs to automatically have a backup in the Subversion server – which you are carefully backing up anyway
• Keep notes!
  • Accessible when the wiki is down; easy to grep
  • Don’t worry about perfect layout; just keep it updated
Slide 131: Configuration management: Rule Two
• Repeatable configuration!
• Can you reinstall any server Right Now?
• Use tools to keep system configuration in sync
• Upcoming configuration management (and more) tools!
  • csync2 (librsync and sqlite based sync tool)
  • puppet (central server, rule system, ruby!)
Slide 132: puppet
• Automating sysadmin tasks!
• 1) Client provides “facts” (gathered by facter) to the server
  2) Server makes configuration
  3) Client implements configuration

    service { "sshd": enable => true, ensure => running }
    package { "vim-enhanced": ensure => installed }
    package { "emacs": ensure => installed }
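Steps 2 and 3 boil down to comparing a declared desired state with what is actually on the box and changing only the differences, so running it twice is harmless. A toy Python model of that idea, not how puppet is implemented; resources are keyed by (type, title) like the service and package declarations on this slide, and `actual` stands in for what the client discovers.

```python
def plan(desired, actual):
    """List the (type, title, attribute, value) changes needed to reach `desired`.

    Toy model of declarative configuration, not puppet's implementation.
    Both arguments map (type, title) -> {attribute: value}, e.g.
    ("package", "vim-enhanced") -> {"ensure": "installed"}.
    """
    changes = []
    for key, wanted in sorted(desired.items()):
        current = actual.get(key, {})
        for attr, value in sorted(wanted.items()):
            if current.get(attr) != value:
                changes.append((key[0], key[1], attr, value))
    return changes


def apply_changes(changes, actual):
    """Pretend to act on each change by recording the new state."""
    for rtype, title, attr, value in changes:
        actual.setdefault((rtype, title), {})[attr] = value
```

After `apply_changes`, calling `plan` again returns an empty list: the same manifest can run from cron every 30 minutes and only does work when something drifted.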
Slide 133: puppet example

    node db-server inherits standard {
        include mysql_server
        include solfo_hw
    }

    node db2, db3, db4 inherits db-server { }

    node trillian inherits db-server {
        include ypbot_devel_dependencies
    }

    # -----------------------------

    class mysql_client {
        package { "MySQL-client-standard": ensure => installed }
        package { "MySQL-shared-compat": ensure => installed }
    }

    class mysql_server {
        file { "/mysql": ensure => directory, }
        package { "MySQL-server-standard": ensure => installed }
    }

    include mysql_client
Slide 134: puppet mount example
• Ensure an NFS mount exists, except on the NFS servers

    class nfs_client_pkg {
        file { "/pkg": ensure => directory, }

        $mount = $hostname ? {
            "nfs-a" => absent,
            "nfs-b" => absent,
            default => mounted
        }

        mount { "/pkg":
            atboot  => true,
            device  => 'nfs.la.sol:/pkg',
            ensure  => $mount,
            fstype  => 'nfs4',
            options => 'ro,intr,noatime',
            require => File["/pkg"],
        }
    }
Slide 135: More puppet features
• In addition to services, packages and mounts...
  • Manage users
  • Manage crontabs
  • Copy configuration files (with templates)
  • … and much more
• Recipes, reference documentation and more at http://reductivelabs.com/
Slide 136: Backups!
• Back up everything you can
• Check/test the backups routinely
• Super easy deployment: rsnapshot
  • Uses rsync and hardlinks to efficiently store many backup generations
  • Server initiated – just needs ssh and rsync on client
  • Simple restore – files
• Other tools
  • Amanda (Zmanda)
  • Bacula
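The rsync-plus-hardlinks trick is what makes many backup generations cheap: each snapshot looks like a full copy, but files that did not change are hard links to the previous generation's copy and take no extra space. A Python sketch of that idea (not rsnapshot's code, which drives rsync); the directory names in the demo are hypothetical.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    Illustrates the rsync + hardlink idea, not rsnapshot's implementation.
    Returns (linked, copied) counts.
    """
    linked = copied = 0
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)  # unchanged: share the previous generation's inode
                linked += 1
            else:
                shutil.copy2(src, dst)  # new or changed: store a real copy
                copied += 1
    return linked, copied


if __name__ == "__main__":
    # Hypothetical layout: data/ is backed up into snap1/ and then snap2/.
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "data")
        os.makedirs(src)
        for name, text in [("a.txt", "same"), ("b.txt", "v1")]:
            with open(os.path.join(src, name), "w") as f:
                f.write(text)
        print(snapshot(src, None, os.path.join(root, "snap1")))  # (0, 2)
        with open(os.path.join(src, "b.txt"), "w") as f:
            f.write("v2")
        print(snapshot(src, os.path.join(root, "snap1"),
                       os.path.join(root, "snap2")))  # (1, 1)
```

Restoring is just copying files out of whichever snapshot directory you want; deleting an old generation only frees the blocks no newer snapshot still links to.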
Slide 137: Backup is cheap!
• Extra disk in a box somewhere? That can do!
• Disks are cheap – get more!
• Disk backup server in your office:
    Enclosure + PSU: $275
    CPU + Board + RAM: $400
    3ware raid (optional): $575
    6x1TB disks: $1700 (~4TB in raid 6)
    = ~$3000 ($2950) for 4TB backup space, easily expandable (or less than $5000 for 9TB space with raid 6 and hot standby)
• Ability to get back your data = Priceless!
Slide 138: RAID Levels (somewhat tangentially ...)
[Photo caption: RAID-I (1989) consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software. http://www.cs.berkeley.edu/~pattrsn/Arch/prototypes2.html]
Slide 139: Basic RAID levels (N = number of disks, S = size of one disk; “Fail” = failed disks that lose the array)
• RAID 0: Stripe all disks (capacity = N*S). Fail: any disk
• RAID 1: Mirror all disks (capacity = S). Fail: all disks
• RAID 10: Combine RAID 1 and 0 (capacity = N*S / 2). Fail: both disks of one mirror
• RAID 5: RAID 0 with parity (capacity = N*S - S). Fail: 2 disks
• RAID 6: Two parity disks (capacity = N*S - S*2). Fail: 3 disks!
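The capacity formulas, with N disks of size S, fit in a few lines of Python. A sketch (the function name is hypothetical); the "always survived" count is the worst case, so RAID 10 gets 1 because a second failure may hit the other half of the same mirror. It reproduces the backup server's "6x1TB disks, ~4TB in raid 6".

```python
def raid_summary(level, n, size):
    """Return (usable capacity, disk failures the array always survives).

    Hypothetical helper; `size` is the capacity of one disk in any unit.
    """
    minimum = {0: 1, 1: 2, 10: 4, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return n * size, 0        # striping only: any failure loses the array
    if level == 1:
        return size, n - 1        # every disk holds a full copy
    if level == 10:
        if n % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        return n // 2 * size, 1   # worst case: both disks of one mirror fail
    if level == 5:
        return (n - 1) * size, 1  # one disk's worth of parity
    return (n - 2) * size, 2      # RAID 6: two disks' worth of parity
```

For example, `raid_summary(6, 6, 1)` gives `(4, 2)`: 4TB usable, and any two disks can die.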
Slide 140: RAID 1
• Mirror all disks to all disks
• Simple - easiest to recover!
• Use for system disks and small backup devices
Slide 141: RAID 0
• Use for redundant database mirrors or scratch data that you can quickly rebuild
• Absolutely never for anything you care about
• Failure = system failure
• Great performance; no safety
• Capacity = 100%
• Disk IO = every IO available is “useful”
Slide 142: RAID 10
• Stripe of mirrored devices
• IO performance and capacity of half your disks - not bad!
• Relatively good redundancy: can lose one disk from each of the “sub-mirrors”
• Quick rebuild: Just rebuild one mirror
• More disks = more failures! If you have more than X disks, keep a hot spare.
Slide 143: RAID 5
• Terrible database performance
• A partial block write = read all disks!
• When degraded a RAID 5 is a RAID 0 in redundancy!
• Rebuilding a RAID 5 is a great way to find more latent errors
• Don’t use RAID 5 – just not worth it
Slide 144: RAID 6
• Like RAID 5 but doesn’t fail as easily
• Can survive two disks failing
• Don’t make your arrays too big
  • 12 disks = 12x failure rate of one disk!
• Always keep a hot-spare if you can
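"12 disks = 12x failure rate" is the linear approximation. With independent disks, the chance that at least one of N fails is 1 - (1 - p)^N, which stays a little below N·p. A Python sketch assuming independent failures at a constant rate; the 3% annual failure rate used in the usage note is a hypothetical figure. Disks from the same batch often fail together, so real arrays can fare worse than this.

```python
HOURS_PER_YEAR = 24 * 365


def p_any_failure(p_single, disks):
    """Chance that at least one of `disks` fails, each failing with p_single.

    Assumes independent failures.
    """
    return 1 - (1 - p_single) ** disks


def p_failure_during_rebuild(annual_rate, remaining_disks, rebuild_hours):
    """Chance another disk fails while the array rebuilds.

    Assumes independent failures at a constant rate, so the rebuild
    window is treated as a fraction of a year.
    """
    p_window = 1 - (1 - annual_rate) ** (rebuild_hours / HOURS_PER_YEAR)
    return p_any_failure(p_window, remaining_disks)
```

With the assumed 3% rate, `p_any_failure(0.03, 12)` is about 0.31 per year (versus 0.36 from the 12x shortcut), and a 24 hour rebuild of the 11 survivors carries roughly a 0.1% chance of a further failure, which is why RAID 6's second parity disk and a hot spare matter.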
Slide 145: Hardware or software RAID?
• Hardware RAID: Worth it for the Battery Backup Unit!
  • Battery allows the controller to – safely – fake “Sure mister, it’s safely on disk” responses
• No Battery? Use Software RAID
  • Low or no CPU use
  • Easier and faster to recover from failures!
  • Write-intent bitmap
  • More flexible layout options
  • RAID 1 partition for system + RAID 10 for data on each disk
Slide 147: nagios
• Monitoring “is the website up” is easy
• Monitoring dozens or hundreds of sub-systems is hard
• Monitor everything!
• Disk usage, system daemons, application daemons, databases, data states, ...
Slide 148: nagios configuration tricks
• nagios configuration is famously painful
• Somewhat undeserved!
[Slide shows examples of simple configuration: templates, groups]
Slide 149: nagios best practices
• All alerts must be “important” – if some alerts are ignored, all other alerts easily are, too.
• Don’t get 1000 alerts if a DB server is down
• Don’t get paged if 1 of 50 webservers crashed
• Why do you as a non-sysadmin care?
  • Use nagios to help the sysadmins fix the application
  • Get information to improve reliability
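Both "don't" rules come down to paging on root causes rather than symptoms. nagios handles this with parent hosts and dependency definitions; the Python sketch below only models the logic, not nagios configuration. Check names are hypothetical, the dependency graph is assumed to have no cycles, and the 20% pool threshold is an assumed value.

```python
def root_causes(failures, parents):
    """Failed checks that have no failed ancestor in the dependency tree.

    `parents` maps a check to the check it depends on (assumed acyclic).
    """
    roots = set()
    for check in failures:
        parent = parents.get(check)
        # Walk up until we hit a failed ancestor or run out of parents.
        while parent is not None and parent not in failures:
            parent = parents.get(parent)
        if parent is None:
            roots.add(check)
    return roots


def pages(failures, parents, pools, pool_threshold=0.2):
    """Decide what actually pages someone.

    Members of a pool (e.g. 50 webservers) only page as a group, once at
    least `pool_threshold` of the pool is down (threshold is an assumption).
    """
    roots = root_causes(failures, parents)
    out = set()
    pooled = set()
    for pool, members in pools.items():
        pooled |= members
        down = len(members & roots)
        if members and down / len(members) >= pool_threshold:
            out.add(f"pool:{pool}")
    out |= roots - pooled  # non-pooled root causes always page
    return sorted(out)
```

With db1 down and web07 crashed, only "db1" pages: the application checks that depend on db1 are suppressed, and 1 of 50 webservers is below the pool threshold.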
Slide 150: Resource management
• If possible, only run one service per server (makes monitoring/managing your capacity much easier)
• Balance how you use the hardware
  • Use memory to save CPU or IO
  • Balance your resource use (CPU vs RAM vs IO)
  • Extra memory on the app server? Run memcached!
  • Extra CPU + memory? Run an application server in a Xen box!
• Don’t swap memory to disk. Ever.

   