Caching and tuning fun for high scalability Wim Godden Cu.be Solutions
Notes about this presentation
This presentation was part of the FrOSCon 2011 program. It was designed to be presented live, and as a result many of the slides may seem odd without the spoken explanation. The live benchmarks shown at the conference are of course also not part of these slides.
Who am I ? Wim Godden (@wimgtr)
Owner of Cu.be Solutions (http://cu.be)
PHP developer since 1997
Developer of OpenX
Zend Certified Engineer
Zend Framework Certified Engineer
MySQL Certified Developer
Who are you ? Developers ?
System/network engineers ?
Managers ?
Caching experience ?
Goals of this tutorial
Everything about caching and tuning
A few techniques : How-to, How-NOT-to
-> Increase reliability, performance and scalability
5 visitors/day -> 5 million visitors/day
(Don't expect a miracle cure !)
LAMP
Architecture
Our test site
Our base benchmark
Apachebench = useful enough
Result ?
Caching
What is caching ?
select * from article join user on article.user_id = user.id order by created desc limit 10
Caching goals
Source of information (db, file, webservice, …) : Reduce # of requests
Reduce the load
Latency : Reduce for visitor
Reduce for webserver load
Network : Send less data to visitor
Hey, that's frontend !
Theory of caching DB
Theory of caching : if ($data === false) -> get it from the DB
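The cache-aside flow from the "Theory of caching" slides can be sketched in a few lines. This is a minimal sketch, assuming a plain PHP array as a stand-in for the cache and a hypothetical loader closure in place of the DB. It also shows why the miss check should be strict: with a loose `== false`, a cached 0, '' or empty array would count as a miss on every request.

```php
<?php
// Cache-aside sketch. $cache is a plain array standing in for a real cache
// backend; $loader is a hypothetical callback that would query the DB.
function getCached(array &$cache, string $key, callable $loader, int &$dbCalls)
{
    // Emulate a cache API that returns false on a miss (as Memcache::get does)
    $data = array_key_exists($key, $cache) ? $cache[$key] : false;

    // Strict comparison: a cached 0, '' or [] is a hit, not a miss
    if ($data === false) {
        $dbCalls++;
        $data = $loader();     // miss: go to the source of information
        $cache[$key] = $data;  // store it for the next request
    }
    return $data;
}

$cache = [];
$dbCalls = 0;
$loadFromDb = function () { return 0; }; // hypothetical: "0 unread messages"

getCached($cache, 'unread_count_user_42', $loadFromDb, $dbCalls);
getCached($cache, 'unread_count_user_42', $loadFromDb, $dbCalls);
echo $dbCalls, PHP_EOL; // 1: the second call was served from the cache
```

The value `false` itself still cannot be cached this way, since it is indistinguishable from a miss.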
Caching techniques #1 : Store entire pages
Company websites
Blogs
Full pages that don't change
Render -> Store in cache -> retrieve from cache
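The "Render -> Store in cache -> retrieve from cache" cycle for whole pages maps naturally onto PHP output buffering. A minimal sketch, assuming a file cache in the system temp directory and a 300-second lifetime (both illustrative choices, not from the slides):

```php
<?php
// Full-page caching sketch: render once, store the HTML, then serve the stored copy.
// The temp-directory location and the 300-second TTL are assumptions.
function cachedPage(string $key, int $ttl, callable $render): string
{
    $file = sys_get_temp_dir() . '/pagecache_' . md5($key) . '.html';

    // Retrieve from cache while the stored page is still fresh
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }

    // Render: capture everything the page code echoes
    ob_start();
    $render();
    $html = ob_get_clean();

    // Store in cache: write a temp file, then rename it into place,
    // so concurrent readers never see a half-written page
    $tmp = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $file);

    return $html;
}

$renders = 0;
$aboutPage = function () use (&$renders) {
    $renders++;
    echo '<h1>About our company</h1>';
};

$url = '/about?demo=' . uniqid(); // unique key so the demo starts with an empty cache
$first  = cachedPage($url, 300, $aboutPage);
$second = cachedPage($url, 300, $aboutPage);
echo $renders, PHP_EOL; // 1: the second request never ran the page code
```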
Caching techniques #2 : Store parts of a page
Most common technique
Usually a small block in a page
Best effect : reused on lots of pages
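Fragment caching pays off most when one block appears on many pages, because it is built once and reused everywhere. A minimal sketch, assuming a hypothetical in-memory array in place of Memcache and an illustrative key name:

```php
<?php
// Fragment ("block") caching sketch: the same block is reused by many pages,
// so it is rendered once and every page picks up the cached HTML.
// $blockCache is a hypothetical in-memory stand-in for Memcache or a file cache.
function renderBlock(array &$blockCache, string $key, callable $render): string
{
    if (!isset($blockCache[$key])) {
        ob_start();
        $render();
        $blockCache[$key] = ob_get_clean();
    }
    return $blockCache[$key];
}

$blockCache = [];
$renders = 0;
$popularProducts = function () use (&$renders) {
    $renders++; // the expensive part: queries + templating
    echo '<ul><li>Product A</li><li>Product B</li></ul>';
};

// Three different pages share the same sidebar block
foreach (['/home', '/category/tv', '/product/123'] as $url) {
    $html = '<main>' . $url . '</main>'
          . renderBlock($blockCache, 'Block_Popular_Products', $popularProducts);
}
echo $renders, PHP_EOL; // 1
```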
Caching techniques #3 : Store SQL queries
↔ SQL query cache :
Limited in size
Resets on every insert/update/delete
Server and connection overhead
Goal : not to get rid of DB
free up DB resources for more hits !
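Caching query results in the application, rather than relying on the MySQL query cache, means the key has to identify both the SQL and its parameters. A minimal sketch, assuming a hypothetical `$runQuery` callback (in real code it would wrap PDO or mysqli) and an array in place of Memcache. Unlike the query cache, nothing here is reset by writes, so invalidation becomes the application's job.

```php
<?php
// Sketch: cache the result of a SQL query in the application, keyed on the
// query text plus its parameters. $runQuery is a hypothetical callback that
// would normally hit MySQL; here it just counts calls.
function cachedQuery(array &$cache, string $sql, array $params, callable $runQuery): array
{
    // Same SQL with different parameters must produce a different key
    $key = 'sql_' . md5($sql . '|' . serialize($params));
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $runQuery($sql, $params);
    }
    return $cache[$key];
}

$cache = [];
$dbHits = 0;
$runQuery = function (string $sql, array $params) use (&$dbHits): array {
    $dbHits++;
    return [['id' => 1, 'title' => 'Hello world']]; // fake result set
};

$sql = 'SELECT * FROM article JOIN user ON article.user_id = user.id '
     . 'WHERE user.id = ? ORDER BY created DESC LIMIT 10';
cachedQuery($cache, $sql, [42], $runQuery);
cachedQuery($cache, $sql, [42], $runQuery); // served from cache
cachedQuery($cache, $sql, [7], $runQuery);  // different parameter -> new query
echo $dbHits, PHP_EOL; // 2
```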
Caching techniques #4 : Store complex processing results
Not just calculations
CPU-intensive tasks : config file parsing
XML file parsing
Loading CSV in an array
Save resources -> more resources available
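Config file parsing is a good example of a result worth caching: the input rarely changes, so the parse can be skipped until it does. A minimal sketch, assuming the file's modification time is folded into the key (an illustrative invalidation choice) and a per-process array stands in for Memcache or APC:

```php
<?php
// Sketch: cache the result of parsing a config file, and re-parse only when
// the file changes (its modification time is part of the cache key).
// In production the array would live in Memcache/APC rather than in $cache.
function loadConfig(array &$cache, string $file, int &$parses): array
{
    clearstatcache(true, $file); // make filemtime() see fresh data
    $key = 'config_' . md5($file) . '_' . filemtime($file);
    if (!isset($cache[$key])) {
        $parses++;
        $cache[$key] = parse_ini_file($file, true); // the expensive part
    }
    return $cache[$key];
}

$file = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($file, "[db]\nhost = 127.0.0.1\nport = 3306\n");

$cache = [];
$parses = 0;
$config = loadConfig($cache, $file, $parses);
$config = loadConfig($cache, $file, $parses);
echo $config['db']['host'], ' parsed ', $parses, ' time(s)', PHP_EOL;
unlink($file);
```

The same pattern applies to XML or CSV files: key on the file and its mtime, store the parsed structure.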
Caching techniques #xx : Your call
Only limited by your imagination !
When you have data, think :
Creation time ?
Modification frequency ?
Retrieval frequency ?
How to find cacheable data
New projects : start from 'cache everything'
Existing projects : Look at MySQL slow query log
Make a complete query log (don't forget to turn it off !)
Check page loading times
Caching storage - MySQL query cache
Use it
Don't rely on it
Good if you have : lots of reads
few different queries
Bad if you have : lots of insert/update/delete
lots of different queries
Caching storage - Disk
Data with few updates : good
Caching SQL queries : preferably not
DON'T use NFS or other network file systems, especially for sessions
high latency
locking issues !
Caching storage - Disk / ramdisk
Overhead : filesystem access
Limited number of files per directory -> subdirectories
Local
5 webservers -> 5 local caches
-> Hard to scale
How will you keep them synchronized ? -> Don't say NFS or rsync !
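The "limited number of files per directory -> subdirectories" point can be handled by deriving the directory from a hash of the key. A minimal sketch, assuming two levels of two hex characters each (256 x 256 directories) and a demo base directory under the system temp dir; it addresses the per-directory limit only, not the problem of keeping several local caches in sync.

```php
<?php
// Sketch: spread cache files over subdirectories so no single directory
// holds too many files. Two levels of 2 hex characters = 256 * 256 dirs.
// The base directory is an assumption for illustration.
function cachePath(string $baseDir, string $key): string
{
    $hash = md5($key);
    return $baseDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . $hash;
}

function diskCacheSet(string $baseDir, string $key, $value): void
{
    $path = cachePath($baseDir, $key);
    $dir = dirname($path);
    // Create both directory levels at once; tolerate another process winning the race
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cache directory $dir");
    }
    file_put_contents($path, serialize($value), LOCK_EX);
}

function diskCacheGet(string $baseDir, string $key)
{
    $path = cachePath($baseDir, $key);
    return is_file($path) ? unserialize(file_get_contents($path)) : false;
}

$base = sys_get_temp_dir() . '/diskcache_demo_' . getmypid();
diskCacheSet($base, 'Homepage_Popular_Product_List', ['Product A', 'Product B']);
print_r(diskCacheGet($base, 'Homepage_Popular_Product_List'));
echo cachePath($base, 'Homepage_Popular_Product_List'), PHP_EOL;
```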
Caching storage - Memcache
Facebook, Twitter, Slashdot, … -> need we say more ?
Distributed memory caching system
Multiple machines ↔ 1 big memory-based hash-table
Key-value storage system
Keys - max. 250 bytes
Values - max. 1 MB
Extremely fast... non-blocking, UDP (!)
Memcache - where to install
Memcache - installation & running it
Installation : distribution package
PECL
Windows : binaries
Running : no config files
memcached -d -m <mem> -l <ip> -p <port>
(-d = run as daemon, -m = memory in MB, -l = listen address, -p = TCP port)
ex. : memcached -d -m 2048 -l 127.0.0.1 -p 11211
Caching storage - Memcache - some notes
Not fault-tolerant : it's a cache !
Lose session data
Lose shopping cart data
…
Different libraries
Original : libmemcache
New : libmemcached (consistent hashing, UDP, binary protocol, …)
Firewall your Memcache port ! (no authentication by default)
Memcache in code
<?php
$memcache = new Memcache();
$memcache->addServer('172.16.0.1', 11211);
$memcache->addServer('172.16.0.2', 11211);
$myData = $memcache->get('myKey');
if ($myData === false) {
    $myData = GetMyDataFromDB();
    // Put it in Memcache as 'myKey', without compression, with no expiration
    $memcache->set('myKey', $myData, false, 0);
}
echo $myData;
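Let's give that a go !
/**
 * Retrieves the 10 highest rated articles
 * @return array List of highest rated articles
 */
static public function getTopRatedArticleList()
{
    // Assumption: the Zend_Cache instance was stored in the registry under 'cache'
    $cache = Zend_Registry::get('cache');
    // Parentheses needed: without them, $articleList receives the result of the === comparison
    if (($articleList = $cache->load('topRatedArticleList')) === false) {
        $articleList = self::getTopRatedArticleListUncached();
        $cache->save($articleList, 'topRatedArticleList');
    }
    return $articleList;
}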
Where's the data ?
Memcache client decides (!)
2 hashing algorithms :
Traditional : server failure -> all data must be rehashed
Consistent : server failure -> 1/x of data must be rehashed (x = # of servers)
No replication !
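The difference between the two hashing schemes shows up clearly when one server disappears. A minimal sketch, assuming four hypothetical server IPs, `crc32()` as the hash and 100 ring points per server; real clients such as libmemcached use their own hash functions and point counts. With modulo hashing most keys change server, while on the ring only the failed server's keys move.

```php
<?php
// Sketch comparing the two distribution schemes when one of 4 servers fails.
// Server IPs, crc32() and 100 ring points per server are assumptions;
// real clients (e.g. libmemcached) use their own hash functions and point counts.

// Traditional: hash(key) modulo the number of servers
function moduloServer(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Consistent: put many points per server on a ring, sorted by hash
function buildRing(array $servers, int $pointsPerServer = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $pointsPerServer; $i++) {
            $ring[crc32($server . '#' . $i)] = $server;
        }
    }
    ksort($ring);
    return $ring;
}

// A key belongs to the first point at or after its hash (linear scan for clarity)
function ringServer(string $key, array $ring): string
{
    $hash = crc32($key);
    foreach ($ring as $point => $server) {
        if ($point >= $hash) {
            return $server;
        }
    }
    return reset($ring); // wrap around to the start of the ring
}

$servers   = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4'];
$survivors = ['10.0.0.1', '10.0.0.2', '10.0.0.3']; // 10.0.0.4 failed
$ringBefore = buildRing($servers);
$ringAfter  = buildRing($survivors);

$n = 10000;
$movedModulo = $movedRing = $wrongMoves = 0;
for ($k = 0; $k < $n; $k++) {
    $key = 'key_' . $k;
    $movedModulo += (int) (moduloServer($key, $servers) !== moduloServer($key, $survivors));

    $before = ringServer($key, $ringBefore);
    if ($before !== ringServer($key, $ringAfter)) {
        $movedRing++;
        $wrongMoves += (int) ($before !== '10.0.0.4'); // stays 0: only the failed server's keys move
    }
}
printf("modulo: %.1f%% of keys moved, consistent: %.1f%%\n",
    100 * $movedModulo / $n, 100 * $movedRing / $n);
```

Either way the moved keys are simply cache misses on their new server; there is no replication to fall back on.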
Memcache slabs (or why Memcache says it's full when it's not)
Multiple slabs of different sizes :
Slab 1 : 400 bytes
Slab 2 : 480 bytes (400 * 1.2)
Slab 3 : 576 bytes (480 * 1.2)
(and so on...)
Multiplier (1.2 here) can be configured (memcached -f option)
Each larger slab has room for fewer items (chunks)
-> Store a lot of very large objects
-> Large slabs might be full
-> Rest of slabs might be free
-> Try to store more -> eviction of data !
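The slab arithmetic above can be reproduced to see where an item lands. A simplified model, assuming the slide's 400-byte first class and 1.2 growth factor (expressed as 120% to keep the math in integers); real memcached also aligns chunk sizes and counts per-item header overhead, which this sketch ignores. Since each item goes to the smallest class that fits, many items of one size fill one class and trigger evictions there while other classes stay empty.

```php
<?php
// Simplified model of the slab classes on the slide: each class's chunk size
// is the previous one times a growth factor (1.2 here, expressed as 120%).
// Real memcached also aligns chunk sizes and counts item header overhead;
// this sketch ignores both.
function slabClasses(int $smallest, int $factorPercent, int $largest): array
{
    $sizes = [];
    for ($size = $smallest; $size <= $largest; $size = intdiv($size * $factorPercent + 99, 100)) {
        $sizes[] = $size; // intdiv(... + 99, 100) rounds the next size up
    }
    return $sizes;
}

// An item is stored in the smallest class whose chunk is big enough,
// so a 500-byte item occupies a 576-byte chunk.
function slabFor(int $itemSize, array $sizes): ?int
{
    foreach ($sizes as $class => $chunkSize) {
        if ($itemSize <= $chunkSize) {
            return $class + 1; // slab classes are numbered from 1 on the slide
        }
    }
    return null; // too big for any class
}

$sizes = slabClasses(400, 120, 1024 * 1024);
echo implode(', ', array_slice($sizes, 0, 4)), PHP_EOL; // 400, 480, 576, 692
echo slabFor(500, $sizes), PHP_EOL;                       // 3
```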
Memcache - Is it working ?
Connect to it using telnet
Use Cacti or other monitoring tools
"stats" command ->
STAT pid 2941
STAT uptime 10878
STAT time 1296074240
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 20.089945
STAT rusage_system 58.499106
STAT curr_connections 16
STAT total_connections 276950
STAT connection_structures 96
STAT cmd_get 276931
STAT cmd_set 584148
STAT cmd_flush 0
STAT get_hits 211106
STAT get_misses 65825
STAT delete_misses 101
STAT delete_hits 276829
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 613193860
STAT bytes_written 553991373
STAT limit_maxbytes 268435456
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 20418140
STAT curr_items 65826
STAT total_items 553856
STAT evictions 0
STAT reclaimed 0
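The raw stats output becomes more useful once a few ratios are derived from it, which is essentially what a monitoring tool graphs. A minimal sketch, assuming the reply text has already been read from the server; the sample lines are copied from the output above. For these numbers, get_hits + get_misses equals cmd_get, roughly three out of four gets are hits, and evictions is 0.

```php
<?php
// Sketch: turn the text of a "stats" reply into an array and derive the numbers
// worth graphing. The sample lines are taken from the slide's output.
function parseStats(string $reply): array
{
    $stats = [];
    foreach (preg_split('/\r?\n/', trim($reply)) as $line) {
        $parts = explode(' ', trim($line), 3);
        if (count($parts) === 3 && $parts[0] === 'STAT') {
            $stats[$parts[1]] = $parts[2];
        }
    }
    return $stats; // the final "END" line is skipped
}

$reply = "STAT cmd_get 276931\r\nSTAT get_hits 211106\r\nSTAT get_misses 65825\r\n"
       . "STAT bytes 20418140\r\nSTAT limit_maxbytes 268435456\r\nSTAT evictions 0\r\nEND\r\n";
$s = parseStats($reply);

$hitRatio = $s['get_hits'] / $s['cmd_get'];     // share of gets answered from cache
$memUsed  = $s['bytes'] / $s['limit_maxbytes']; // share of the memory limit in use
printf("hit ratio %.2f%%, memory used %.1f%%, evictions %d\n",
    100 * $hitRatio, 100 * $memUsed, $s['evictions']);
// hit ratio 76.23%, memory used 7.6%, evictions 0
```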
Memcache - backing up
Memcache - deleting
<?php
$memcache = new Memcache();
$memcache->addServer('172.16.0.1', 11211);
$memcache->delete('myKey');
$myData = $memcache->get('myKey'); // $myData === false
Memcache - tip
Page with multiple blocks ? -> use Memcached::getMulti() (with the Memcache extension : pass an array of keys to get())
Warning : what if you get some hits and some misses ?
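A multi-get only returns the keys it found, so the hits-and-misses case has to be handled explicitly: detect the missing keys, rebuild only those and store them. A minimal sketch, assuming hypothetical `$multiGet`, `$load` and `$set` callbacks (with the Memcached extension they would wrap getMulti(), your own DB code and set(); getMulti() can also return false on failure, so a real wrapper should fall back to an empty array):

```php
<?php
// Sketch for the getMulti() warning: a multi-get only returns the keys it found,
// so the missing ones must be detected, loaded and stored individually.
// $multiGet, $load and $set are hypothetical callbacks.
function getBlocks(array $keys, callable $multiGet, callable $load, callable $set): array
{
    $found = $multiGet($keys);                       // only the hits come back
    $missing = array_diff($keys, array_keys($found));

    foreach ($missing as $key) {
        $found[$key] = $load($key);                  // rebuild only what was missing
        $set($key, $found[$key]);
    }

    // Return the blocks in the order they were requested
    $ordered = [];
    foreach ($keys as $key) {
        $ordered[$key] = $found[$key];
    }
    return $ordered;
}

$store = ['Block_Header' => '<header/>', 'Block_Footer' => '<footer/>'];
$loaded = [];

$blocks = getBlocks(
    ['Block_Header', 'Block_Popular_Products', 'Block_Footer'],
    function (array $keys) use (&$store) { return array_intersect_key($store, array_flip($keys)); },
    function (string $key) use (&$loaded) { $loaded[] = $key; return "<div>$key</div>"; },
    function (string $key, string $html) use (&$store) { $store[$key] = $html; }
);
print_r($loaded); // only Block_Popular_Products was rebuilt
```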
Naming your keys
Key names must be unique
Prefix / namespace your keys !
Only letters, numbers and underscore
md5() is useful (fixed-length keys, always under the 250-byte limit) -> BUT : harder to debug
Use clear names
Document your key names !
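These naming rules can be enforced in one helper so every key in the code base is built the same way. A minimal sketch, assuming a hypothetical "shop" namespace and a double underscore between key parts (echoing the key names used on the "Adding/updating data" slide); the md5() fallback keeps long keys under memcached's 250-byte limit at the cost of readability.

```php
<?php
// Sketch of a key builder following the slide's rules: a namespace prefix,
// only letters/numbers/underscore, and md5() as a fallback for keys that would
// exceed memcached's 250-byte key limit. The "shop" prefix is an assumption.
function cacheKey(string $namespace, string ...$parts): string
{
    $key = $namespace . '_' . implode('__', $parts);
    $key = preg_replace('/[^A-Za-z0-9_]/', '_', $key); // letters, numbers, underscore only

    if (strlen($key) > 250) {
        // Fixed-length key, but harder to debug: keep the readable namespace
        $key = $namespace . '_md5_' . md5($key);
    }
    return $key;
}

echo cacheKey('shop', 'ArticleDetails', 'Toshiba 32C100U 32 Inch'), PHP_EOL;
// shop_ArticleDetails__Toshiba_32C100U_32_Inch
echo strlen(cacheKey('shop', 'Search', str_repeat('x', 300))), PHP_EOL; // 41
```

Replacing characters can make two different inputs collide ("a b" and "a_b" give the same key), which is one more reason to document key names.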
Updating data
Adding/updating data
$memcache->delete('ArticleDetails__Toshiba_32C100U_32_Inch');
$memcache->delete('Homepage_Popular_Product_List');
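The two delete() calls generalize to "write, then invalidate everything that contains the changed data". A minimal sketch, assuming an array in place of Memcache and a hypothetical `$saveToDb` callback for the model code. Deleting after the database write, not before, narrows the window in which a concurrent request could put the old value back into the cache.

```php
<?php
// Sketch: after writing to the database, delete every cache entry that contains
// the changed data so the next request rebuilds it. $cache stands in for
// Memcache; $saveToDb is a hypothetical callback for your model code.
function updateProduct(array &$cache, callable $saveToDb, string $productId, array $changes): void
{
    // 1. Write to the source of information first
    $saveToDb($productId, $changes);

    // 2. Then invalidate every key that depends on this product
    $dependentKeys = [
        'ArticleDetails__' . $productId,
        'Homepage_Popular_Product_List',
    ];
    foreach ($dependentKeys as $key) {
        unset($cache[$key]); // with Memcache: $memcache->delete($key)
    }
}

$cache = [
    'ArticleDetails__Toshiba_32C100U_32_Inch' => ['price' => 399],
    'Homepage_Popular_Product_List'           => ['Toshiba_32C100U_32_Inch'],
    'Homepage_News'                           => ['headline'],
];
$db = [];
$saveToDb = function (string $id, array $changes) use (&$db) { $db[$id] = $changes; };

updateProduct($cache, $saveToDb, 'Toshiba_32C100U_32_Inch', ['price' => 349]);
print_r(array_keys($cache)); // only Homepage_News is left
```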

Caching and tuning fun for high scalability @ FrOSCon 2011