2. Who am I?
● Author of
– http://livehelperchat.com/
– http://redmine.remdex.info (my projects) :)
● Currently working at
http://www.coralsolutions.com/
– Freelancing and building open-source
software in my free time
3. Purpose of the presentation 1
● Present some architecture decisions which
were applied while building the image gallery
4. What's new since last presentation
● Mobile device support
● Image gallery can be used as a shopping CMS
– Credit-based buying
– Checkout using the PayPal service
● Uncached pages are faster after a bug in the
paginator was found and fixed.
● Official nginx support
5. What's new since last presentation 2
● Extensions
● Kernel modules override
● Kernel classes override
● CSS compile
● Most popular images in 24 hours
● Photo approval functionality
● Image filtering by resolution
6. What's new since last presentation 3
● Thumbnail recreation script
● 100% accurate duplicate management
● More configurable system settings, such as:
– Max upload photo size
– Max archive size
– Max file queue size
● Animated gif support
7. What's new since last presentation 4
● AJAX navigation usability completely fixed: no
more confusion about which images are available
to the left or to the right.
● Front-end redesign, thanks to
http://pauliusc.lt
● HTML output compression
● HTML5 front-end changes that save bandwidth
8. What's new since last presentation 5
● Some performance improvements for user
permission settings
● More data moved to the Memcached service
9. What's new since last presentation 5
V4
● Sort by relevance was introduced
● AddQuery is now used in search
● Refactored search page: one query fewer now.
● Paginator updates
● Sphinx wildcard support
● Script for deleting images that have no original
● SEO enhancement related to resolution and the
user's current page
10. What's new since last presentation 5
V5
● Refactored captcha: it is now AJAX/JavaScript
based, performs well, and saves one request on the
image preview window
● Full-window cache for image preview! A cached
window is as fast as cached pagination, around
5 ms
● Image counter is built from the log file, which avoids
an insert on each image preview
11. What's new since last presentation 5
V5
● MySQL query hint for album pagination, because the
MySQL planner chose the wrong indexes
● Smart selects in image preview window
● Full multilanguage support, including translatable
module URLs! No gallery/CMS that I know of
has this feature. E.g. gallery/search (English) or
gallerie/recherche (French)
● Full InnoDB support. Performs as well as MyISAM.
The top process is PHP, not MySQL :)
12. Future works
● Pagination sharding with an index filter shard table.
It should speed up pagination over large sets by
100% or more and keep speed constant with millions
of photos.
● http://remdex.info/Optimising-mysql-limit-performan
● Backend redesign
13. Issues I had with previous image galleries
● A lot of users = a lot of problems
– No caching support
– Unoptimized SQL queries
– Resource hungry
– No framework used (well, perhaps this is not a problem, but most of the time
they just duplicate framework functionality, reinventing the wheel...)
– No ETag-based caching, which saves bandwidth
15. Adopted software
● APC – opcode cache for PHP
● Sphinx – free open-source SQL full-text search engine (http://sphinxsearch.com/)
● Memcached – free & open source, high-performance, distributed memory object caching
system
(http://memcached.org/)
● eZ Components – an enterprise-ready, general-purpose PHP library of components used
independently or together for PHP application development.
(http://ez.no/ezcomponents)
● jQuery – a fast and concise JavaScript library that simplifies HTML document
traversing, event handling, animating, and Ajax interactions for rapid web development.
(http://jquery.com/)
● Lighttpd – lightweight open-source web server.
(http://www.lighttpd.net/)
● MySQL – database engine
(http://www.mysql.com)
16. Adopted software
● Nginx – an HTTP and mail proxy server licensed
under a 2-clause BSD-like license.
(http://nginx.org/)
● Fully working nginx configs provided, for e-shop
requirements and for the standard setup
17. Building process – core
● Gallery core is based on eZ Components. Used
components:
– Authentication
– Configuration
– Database
– Feed
– ImageAnalysis
– ImageConversion
– PersistentObject
– Translation
– Cache
– Url
– UserInput
18. Fulltext search implementation
● Why Sphinx?
– Very, very fast :)
● Features used from version 0.9.9
– SetSelect – this feature was introduced in version
0.9.9 and allows fancy filtering.
– Example on the next slide
19. Image full mode problem with
previous and next image
● The search condition, literally: I need to find the 2 previous
images based on the current image position, taking into account
the search keyword and sorting mode.
● URL consists of
– Current image ID (16679)
– Keyword (haposai)
– Sort mode (popular)
● How do I find out what to display in the first two thumbnails (the middle image is the current image)?
20. Solution
● Use a SetSelect query:
$cl->SetSelect('*, (hits > '.$Image->hits.' OR (hits = '.$Image->hits.' AND pid > '.$Image->pid.')) AS myfilter');
$cl->SetFilter('myfilter', array(1));
● Something I did not know how to do until now: if sorting is based on relevance,
how to find the previous two images.
● I know now. But:
– SetSelect does not work with @weight attributes in it.
– I had to use two queries. SetFilter() works with @weight.
– AddQuery helps with performance here. Much more
relevant images now.
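The filter condition above can be illustrated outside Sphinx. The following is only a sketch in plain PHP over an in-memory array, assuming the "popular" sort means hits DESC, then pid DESC; the function name `previousImages` and the sample data are hypothetical and not part of the gallery code.

```php
<?php
// Hypothetical in-memory illustration; in the gallery this filtering is done by Sphinx.

/**
 * Return up to $count images that come directly before $current
 * in "popular" order (hits DESC, pid DESC).
 */
function previousImages(array $images, array $current, int $count = 2): array
{
    // Same condition as the SetSelect expression: the image ranks before the current one.
    $before = array_filter($images, function (array $img) use ($current): bool {
        return $img['hits'] > $current['hits']
            || ($img['hits'] === $current['hits'] && $img['pid'] > $current['pid']);
    });

    // Sort in reverse listing order so the closest neighbours come first.
    usort($before, function (array $a, array $b): int {
        return [$a['hits'], $a['pid']] <=> [$b['hits'], $b['pid']];
    });

    // Take the nearest $count and put them back into listing order.
    return array_reverse(array_slice($before, 0, $count));
}

// Sample data (made up for the example).
$images = [
    ['pid' => 1, 'hits' => 50],
    ['pid' => 2, 'hits' => 40],
    ['pid' => 3, 'hits' => 40],
    ['pid' => 4, 'hits' => 30],
    ['pid' => 5, 'hits' => 10],
];

$previous = previousImages($images, ['pid' => 4, 'hits' => 30]);
echo implode(',', array_column($previous, 'pid')), "\n"; // prints "3,2"
```

The tie-break on pid matters: without it, images with equal hits would have no stable position and the "previous" thumbnails could change between requests.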
21. Some search statistic
● Around 190 K queries each day. It would be more if
search result pages were not cached :)
22. Mysql performance tweaking
● Just optimise queries (EXPLAIN is your friend)
● Not a single slow query
● Some tips:
– With large data sets use:
SELECT * FROM `lh_gallery_images`
INNER JOIN ( SELECT pid FROM lh_gallery_images ORDER BY comtime DESC, pid DESC LIMIT 20 OFFSET 20 ) AS items
ON lh_gallery_images.pid = items.pid
– This query is at least 5 times faster than a normal SELECT
(tested with 150 K records).
– See http://www.mysqlperformanceblog.com
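As a sketch of how the query above might be generated per page: the helper name `buildPageQuery` is hypothetical, the table and column names are taken from the slide, and the outer ORDER BY is an addition made here on the assumption that the final rows should come back in listing order.

```php
<?php
// Hypothetical helper: builds the deferred-join pagination query for a given page.
function buildPageQuery(int $page, int $perPage = 20): string
{
    $page = max(1, $page);
    $perPage = max(1, $perPage);
    $offset = ($page - 1) * $perPage;

    // The inner query selects only the primary keys of the requested page
    // (cheap if an index on comtime, pid exists, which is assumed here);
    // the outer query then fetches full rows for just those keys.
    return 'SELECT lh_gallery_images.* FROM lh_gallery_images'
        . ' INNER JOIN (SELECT pid FROM lh_gallery_images'
        . ' ORDER BY comtime DESC, pid DESC'
        . sprintf(' LIMIT %d OFFSET %d', $perPage, $offset)
        . ') AS items ON lh_gallery_images.pid = items.pid'
        . ' ORDER BY lh_gallery_images.comtime DESC, lh_gallery_images.pid DESC';
}

// Page 2 with 20 images per page gives LIMIT 20 OFFSET 20, as on the slide.
echo buildPageQuery(2), "\n";
```

Because the limit and offset are formatted as integers, the string can be passed to the database layer without further escaping of those two values.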
23. Supported HTTP servers
● Lighttpd
● Apache
● Nginx
– With nginx I managed to get around 1200 Q/S
on a cached page. That is 30% more than with
Lighttpd.
24. Caching objects
● Version caching
– http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached
– http://www.infoq.com/presentations/lutke-rockstar-memcaching
– Version caching is used in
● Album pages
● Last uploaded
● Last hits
● Popular images and so on.
● The most popular images in 24 hours
– When is the cache cleared?
● It is not. Only the version number is increased, and the old entries expire on their own, because the new
cache key does not exist yet.
25. Some code with version cache
● Cache key calculation in Album
$cache = CSCacheAPC::getMem();
$cacheKey = md5('version_'.$cache->getCacheVersion('album_'.(int)$Params['user_parameters']['album_id']).
$mode.'album_view_url'.(int)$Params['user_parameters']['album_id'].'_page_'.$Params['user_parameters_unordered']['page']);
– Includes:
● Album version
● $mode – sorting mode (e.g. popular)
● Page
This combination gives a unique cache key for each page.
● The same logic applies to all listing pages
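The version-cache idea can be tried without Memcached or APC. This is a minimal sketch with an in-memory array as the store; the class `VersionedCache`, its method `increaseCacheVersion`, and the function `albumPageKey` are hypothetical stand-ins, and only `getCacheVersion` and the key layout come from the slide.

```php
<?php
// Hypothetical in-memory stand-in for Memcached/APC; not the gallery's CSCacheAPC class.
class VersionedCache
{
    private $store = [];

    public function getCacheVersion(string $name): int
    {
        return $this->store['version_' . $name] ?? 1;
    }

    public function increaseCacheVersion(string $name): void
    {
        $this->store['version_' . $name] = $this->getCacheVersion($name) + 1;
    }

    public function get(string $key)
    {
        return $this->store[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->store[$key] = $value;
    }
}

// Key built from album version, sorting mode and page, as on the slide.
function albumPageKey(VersionedCache $cache, int $albumId, string $mode, int $page): string
{
    return md5('version_' . $cache->getCacheVersion('album_' . $albumId)
        . $mode . 'album_view_url' . $albumId . '_page_' . $page);
}

$cache = new VersionedCache();
$key = albumPageKey($cache, 16, 'popular', 1);
$cache->set($key, '<html>cached page</html>');

// An image is uploaded: bump the album version instead of deleting entries.
$cache->increaseCacheVersion('album_16');
$newKey = albumPageKey($cache, 16, 'popular', 1);

var_dump($cache->get($key) !== null);    // old entry still exists but is no longer requested
var_dump($cache->get($newKey) === null); // new key is a miss, so the page is rebuilt
```

One version bump invalidates every page and sorting mode of the album at once, which is the point of the technique: there is no need to know which keys exist.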
26. Some benchmarks
[root@ks310613 ~]# ab -n 500 -c 10 http://animeonly.org/Fantasy/Mix-16a.html
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking animeonly.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests
Server Software: lighttpd
Server Hostname: animeonly.org
Server Port: 80
Document Path: /Fantasy/Mix-16a.html
Document Length: 26883 bytes
Concurrency Level: 10
Time taken for tests: 0.545137 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 13593092 bytes
HTML transferred: 13441500 bytes
Requests per second: 917.20 [#/sec] (mean)
Time per request: 10.903 [ms] (mean)
Time per request: 1.090 [ms] (mean, across all concurrent requests)
Transfer rate: 24349.84 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 5 10 2.9 9 23
Waiting: 4 9 3.1 9 23
Total: 5 10 2.9 9 23
Percentage of the requests served within a certain time (ms)
50% 9
66% 12
75% 13
80% 13
90% 13
95% 13
98% 20
27. ETag-based caching
● What is it?
– An ETag (entity tag) is part of HTTP, the protocol
for the World Wide Web. It is a response header
that may be returned by an HTTP/1.1 compliant
web server and is used to determine change in
content at a given URL
(http://en.wikipedia.org/wiki/HTTP_ETag)
28. How to use it?
$ExpireTime = 3600;
// The ETag depends on the page cache key and the current user; HTTP expects it in double quotes
$currentKeyEtag = '"'.md5($cacheKey.'user_id_'.erLhcoreClassUser::instance()->getUserID()).'"';
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: '.gmdate('D, d M Y H:i:s', time()+$ExpireTime).' GMT');
header('ETag: ' . $currentKeyEtag);
// The browser sends the ETag it already has in the If-None-Match request header
$iftag = isset($_SERVER['HTTP_IF_NONE_MATCH']) && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $currentKeyEtag;
if ($iftag === true)
{
// Content is unchanged: answer without a body
header('HTTP/1.0 304 Not Modified');
header('Content-Length: 0');
exit;
}
● $cacheKey – the cache key from the previous example
● The user ID is needed if the user is logged in.
● Can be used for custom pages that do not change
● When an image is uploaded or deleted, we just increase the cache version and the ETag expires automatically
as well.
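To see the ETag decision in isolation, here is a sketch that returns the status and headers instead of sending them. The function `etagResponse` and the sample cache key strings are hypothetical; the ETag is wrapped in double quotes, as HTTP expects, which is an assumption about how the value is sent.

```php
<?php
// Hypothetical pure function so the decision can be checked without sending headers.
function etagResponse(string $cacheKey, int $userId, ?string $ifNoneMatch, int $expireTime = 3600): array
{
    // Same inputs as on the slide: page cache key plus the current user ID.
    $etag = '"' . md5($cacheKey . 'user_id_' . $userId) . '"';

    $headers = [
        'Cache-Control' => 'max-age=' . $expireTime,
        'ETag' => $etag,
    ];

    // 304 only when the client already holds exactly this version.
    $status = ($ifNoneMatch !== null && trim($ifNoneMatch) === $etag) ? 304 : 200;

    return ['status' => $status, 'headers' => $headers];
}

// First visit: no If-None-Match header, full response.
$first = etagResponse('album_16_v1_page_1', 42, null);
$etag = $first['headers']['ETag'];

// Repeat visit with the stored ETag: nothing changed.
$repeat = etagResponse('album_16_v1_page_1', 42, $etag);

// After an upload the cache version (and so the cache key) changes.
$afterUpload = etagResponse('album_16_v2_page_1', 42, $etag);

echo $first['status'], ' ', $repeat['status'], ' ', $afterUpload['status'], "\n"; // prints "200 304 200"
```

Including the user ID in the hash keeps a logged-in user's browser from reusing a page that was generated for an anonymous visitor, and vice versa.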
30. Some MRTG screenshots 2
● Memcached status
● Traffic stats
31. Conclusions
● A single server with Sphinx, Memcached, MySQL and
nginx handles around 180 K pageviews
per day.
● No performance issues at this time.
● Gallery home page
http://code.google.com/p/hppg/
32. Thank you for your attention :)
● Questions etc:
– remdex@gmail.com