How to avoid hanging yourself with Rails

Presentation given to Toronto Rails Project Night, performance tips for ActiveRecord usage

  • These slides are part of a presentation I gave, so you don't really have the context to go with each slide - sorry about that. Glad they helped out though.
  • It's really interesting. I've had problems with the speed of a Rails app and after reading this presentation I really improved it. Thanks. But ... the beginning is a little confusing.
  1. work.rowanhick.com How to avoid hanging yourself with Rails: Using ActiveRecord right the first time
  2. Discussion tonight • Intended for new Rails developers • People that think Rails is slow • Focus on simple steps to improve common :has_many performance problems • Short - 15 mins • All links/references up on http://work.rowanhick.com tomorrow
  3. About me • New Zealander (not Australian) • Product Development Mgr for a startup in Toronto • Full time with Rails for 2 years • Previously PHP/MySQL for 4 years • Before that, 6 years of QA/BA/PM for an enterprise CAD/CAM software dev company
  4. Disclaimer • For the sake of brevity and understanding, the SQL shown here is cut down to “pseudo SQL” • This is not an exhaustive in-depth analysis, just meant as a heads up • Times were measured using ApacheBench through mongrel in production mode • ab -n 1000
  5. ActiveRecord lets you get in trouble far too quickly.
     • Super easy syntax comes at a cost.
         @orders = Order.find(:all)
         @orders.each do |order|
           puts order.customer.name
           puts order.customer.country.name
         end
     ✴ Congratulations, you just overloaded your DB with (total number of Orders x 2) unnecessary SQL calls
  6. What happened there?
     • One query to get the orders
         @orders = Order.find(:all)
         “SELECT * FROM orders”
     • For every item in the orders collection
         customer.name: “SELECT * FROM customers WHERE id = x”
         customer.country.name: “SELECT * FROM countries WHERE id = y”
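The query count on this slide can be reproduced without Rails. The following is only a plain-Ruby simulation: `FakeDB` and its sample rows are hypothetical stand-ins for a real database, and it ignores any query caching a real application might have.

```ruby
# Plain-Ruby simulation (no Rails) of the lazy-loading pattern.
# FakeDB and its rows are hypothetical; they only record the SQL
# that would have been issued.
class FakeDB
  attr_reader :queries

  def initialize(order_count)
    @queries   = []
    @countries = { 1 => { :id => 1, :name => "Canada" } }
    @customers = {}
    @orders    = (1..order_count).map do |i|
      @customers[i] = { :id => i, :name => "Customer #{i}", :country_id => 1 }
      { :id => i, :customer_id => i }
    end
  end

  def all_orders
    @queries << "SELECT * FROM orders"
    @orders
  end

  def find_customer(id)
    @queries << "SELECT * FROM customers WHERE id = #{id}"
    @customers[id]
  end

  def find_country(id)
    @queries << "SELECT * FROM countries WHERE id = #{id}"
    @countries[id]
  end
end

# Mirrors the loop from the slide: two extra lookups per order.
def lazy_load_query_count(order_count)
  db = FakeDB.new(order_count)
  db.all_orders.each do |order|
    customer = db.find_customer(order[:customer_id])  # order.customer
    db.find_country(customer[:country_id])            # order.customer.country
  end
  db.queries.size
end

puts lazy_load_query_count(100)  # 1 + (2 * 100) = 201 queries
```

The count grows linearly with the number of orders, which is why the problem stays invisible on a development database holding a handful of rows.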
  7. Systemic problem in web development I’ve seen: - 15-second page reloads - 10,000 queries per page “<insert name here> language performs really poorly, we’re going to get it redeveloped in <insert new language here>”
  8. A typical root cause • Failure to build the application with *real* data • i.e. “It worked fine on my machine”, but the developer never loaded up 100,000 records to see what would happen • Use Rake tasks to build realistic data sets • Test, test, test • tail -f log/development.log
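Watching the log by eye works, but a tally makes repeated queries obvious. Here is a minimal sketch, assuming log lines shaped like `Order Load (0.000837) SELECT ...`; real log formats vary between Rails versions and may include colour codes, and `tally_loads` and `SAMPLE_LOG` are hypothetical names and data used only for illustration.

```ruby
# Counts SELECT statements per model in a chunk of log text.
# Assumption: lines look like "Order Load (0.000837) SELECT ...".
LOAD_LINE = /(\w+) Load \((\d+\.\d+)\)\s+SELECT/

def tally_loads(log_text)
  counts = Hash.new(0)
  log_text.scan(LOAD_LINE) { |model, _seconds| counts[model] += 1 }
  counts
end

# Hypothetical sample lines, not taken from a real run.
SAMPLE_LOG = <<-LOG
Order Load (0.000837) SELECT * FROM `orders` LIMIT 10
Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id = 2106)
Customer Load (0.000401) SELECT * FROM `customers` WHERE (customers.id = 2018)
LOG

tally_loads(SAMPLE_LOG).each { |model, n| puts "#{model}: #{n}" }
```

A model whose count scales with the number of rows on the page is a likely candidate for eager loading.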
  9. Faker to the rescue
     • in lib/xchain.rake
         namespace :xchain do
           desc "Load fake customers"
           task :load_customers => :environment do
             require 'faker'
             # Remove the fake customers left over from a previous run
             Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy }
             300.times do
               c = Customer.new
               c.status_id = rand(3) + 1
               c.country_id = rand(243) + 1
               c.name = Faker::Company.name
               c.alternate_name = Faker::Company.name
               c.phone = Faker::PhoneNumber.phone_number
               # The XCHAIN_ prefix marks the record as fake so it can be cleaned up later
               c.email = "XCHAIN_" + Faker::Internet.email
               c.save
             end
           end
         end
     $ rake xchain:load_customers
  10. Eager loading
      • By using :include in .finds you create SQL joins
      • Pull all required records in one query
          find(:all, :include => [ :customer, :order_lines ])
          ✓ order.customer, order.order_lines
          find(:all, :include => [ { :customer => :country }, :order_lines ])
          ✓ order.customer, order.customer.country, order.order_lines
  11. Improvement
      • Let’s start optimising ...
          @orders = Order.find(:all, :include => {:customer => :country} )
      • Resulting SQL ...
          “SELECT orders.*, countries.* FROM orders LEFT JOIN customers ON ( customers.id = orders.customer_id ) LEFT JOIN countries ON ( countries.id = customers.country_id )”
      ✓ 7.70 req/s, 1.4x faster
  12. Select only what you need • Using the :select parameter in the find options, you can limit the columns you request back from the database • No point grabbing all columns if you only want :id and :name Order.find(:all, :select => 'orders.id, orders.name')
  13. The last slide was very important • Not using :select is *okay* provided you have very small columns and never any binary or large text data • Otherwise you can suddenly saturate your DB connection • Imagine our orders table had an invoice column on it storing a PDF of the invoice...
  14. Oops • Can’t show a benchmark • :select and :include don’t work together! The finder reverts to selecting all columns • For a long time the core team has not included patches to make it work • One little sentence in the ActiveRecord rdoc: “Because eager loading generates the SELECT statement too, the :select option is ignored.”
  15. ‘mrj’ to the rescue • http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb • Monkey patch to fix the select/include problem • Produces much more efficient SQL
  16. Updated finder
      • Now :select and :include play nice:
          @orders = Order.find(:all,
            :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name',
            :include => [{:customer[:name] => :country[:name]}, :order_status[:name]],
            :conditions => conditions,
            :order => 'order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC')
      ✓ 15.15 req/s, 2.88x faster
  17. r8672 change
      • http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/
      • The following uses the new improved association load (12 req/s)
          @orders = Order.find(:all, :include => [{:customer => :country}, :order_status] )
      • The following does not
          @orders = Order.find(:all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order' )
  18. r8672 output...
      • Here’s the SQL
          Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10
          Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))
          Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))
          OrderStatus Load (0.000291) SELECT * FROM `order_statuses` WHERE (order_statuses.id IN (10))
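The log above shows one query per table, each using an IN list of ids gathered from the previous result. A rough plain-Ruby sketch of that idea follows; it is not the actual Rails implementation, and the in-memory tables and the `preload_queries` helper are hypothetical.

```ruby
# Plain-Ruby sketch of "one IN query per association".
# The tables below are hypothetical sample data, not the author's schema.
CUSTOMERS = {
  1 => { :name => "Acme",   :country_id => 10 },
  2 => { :name => "Globex", :country_id => 20 }
}
ORDERS = [
  { :id => 1, :customer_id => 1 },
  { :id => 2, :customer_id => 2 },
  { :id => 3, :customer_id => 1 }
]

# Builds the SQL text a preloader might issue: collect the distinct
# foreign keys at each level, then fetch each table once.
def preload_queries(orders)
  customer_ids = orders.map { |o| o[:customer_id] }.uniq
  country_ids  = customer_ids.map { |id| CUSTOMERS[id][:country_id] }.uniq
  [
    "SELECT * FROM orders",
    "SELECT * FROM customers WHERE id IN (#{customer_ids.join(',')})",
    "SELECT * FROM countries WHERE id IN (#{country_ids.join(',')})"
  ]
end

puts preload_queries(ORDERS)
```

However many orders are loaded, the number of queries stays fixed at one per table involved.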
  19. But I want more • Okay, this still isn’t blazing fast. I’m building the next killr web2.0 app • Forget about associations, just load it via SQL; depending on the application, this makes a huge difference • Concentrate on commonly used pages
  20. Catch 22 • Hard coding SQL is the fastest solution • No construction of SQL, no generation of ActiveRecord associated classes • If your DB changes, you have to update the SQL ‣ Keep SQL with models where possible
  21. It ain’t pretty.. but it’s fast
      • Find by SQL
          class Order < ActiveRecord::Base
            # Returns lightweight Order objects carrying only the selected columns
            def self.find_current_orders
              find_by_sql("SELECT orders.id, orders.created_at,
                             customers.name AS customer_name,
                             countries.name AS country_name,
                             order_statuses.name AS status_name
                           FROM orders
                           LEFT OUTER JOIN `customers` ON `customers`.id = `orders`.customer_id
                           LEFT OUTER JOIN `countries` ON `countries`.id = `customers`.country_id
                           LEFT OUTER JOIN `order_statuses` ON `order_statuses`.id = `orders`.order_status_id
                           WHERE order_status_id < 100
                           ORDER BY order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC")
            end
          end
      • 28.90 req/s (5.49x faster)
  22. And the results
        find(:all)                       5.26 req/s
        find(:all, :include)             7.70 req/s   1.4x
        find(:all, :select, :include)   15.15 req/s   2.88x
        find_by_sql()                   28.90 req/s   5.49x
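The multipliers in the results are each approach's throughput divided by the plain find(:all) baseline. A short check of that arithmetic, using only the req/s figures from the slide (the constant and method names are illustrative):

```ruby
# Recomputes the speed-up factors from the req/s figures on the slide.
BASELINE = 5.26 # find(:all), req/s

RESULTS = {
  "find(:all, :include)"          => 7.70,
  "find(:all, :select, :include)" => 15.15,
  "find_by_sql()"                 => 28.90
}

# Speed-up relative to the baseline, rounded to two decimals.
def speedup(req_per_sec, baseline = BASELINE)
  (req_per_sec / baseline).round(2)
end

RESULTS.each { |label, rps| puts "#{label}: #{speedup(rps)}x" }
```

This gives 1.46x, 2.88x and 5.49x, so the slide's 1.4x for :include is the same ratio shown to one decimal place.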
  23. Don’t forget indexes
      • 64000 orders
          OrderStatus.find(:all).each { |os| puts os.orders.count }
      • Avg 0.61 req/s with no indexes
      • EXPLAIN your SQL
          ALTER TABLE `xchain_test`.`orders` ADD INDEX order_status_idx(`order_status_id`);
      • Avg 23 req/s after adding the index (37x improvement)
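Why an index helps so much can be illustrated in plain Ruby: without one, every count examines every row; with one, rows are already grouped by the lookup key. This is only an analogy, with a Hash standing in for the database index and hypothetical sample rows spread evenly over eight statuses.

```ruby
# Full scan vs. index lookup, simulated in plain Ruby.
# ORDER_ROWS is hypothetical sample data: 64,000 rows over 8 statuses.
ORDER_ROWS = (1..64_000).map { |i| { :id => i, :order_status_id => (i % 8) + 1 } }

# Without an index: every row is examined on every call.
def count_by_scan(rows, status_id)
  rows.count { |row| row[:order_status_id] == status_id }
end

# With an "index": group rows by order_status_id once, up front.
def build_index(rows)
  rows.group_by { |row| row[:order_status_id] }
end

STATUS_INDEX = build_index(ORDER_ROWS)

# A lookup then touches only the matching bucket.
def count_by_index(index, status_id)
  (index[status_id] || []).size
end

puts count_by_scan(ORDER_ROWS, 3)
puts count_by_index(STATUS_INDEX, 3)
```

Both calls return the same count; the difference is that the scan does work proportional to the whole table each time, which is what the slow benchmark on this slide reflects.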
  24. Avoid .count
      • It’s damned slow
          OrderStatus.find(:all).each { |os| puts os.orders.count }
      • Add column orders_count + update code
          OrderStatus.find(:all).each { |os| puts os.orders_count }
      ✓ 34 req/s vs 108 req/s (3x faster)
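The orders_count column is a counter cache: the count is maintained when orders are added or removed, so reading it needs no counting query. In Rails this is typically wired up with the `:counter_cache => true` option on `belongs_to`; whether the author used that option or hand-written update code is not stated. The idea itself can be sketched in plain Ruby, with `CachedStatus` as a hypothetical class:

```ruby
# Minimal counter-cache sketch in plain Ruby (no Rails).
# CachedStatus is a hypothetical stand-in for an OrderStatus row.
class CachedStatus
  attr_reader :orders_count

  def initialize
    @orders = []
    @orders_count = 0
  end

  # Writes pay a small cost to keep the cached count current.
  def add_order(order)
    @orders << order
    @orders_count += 1
  end

  def remove_order(order)
    @orders_count -= 1 if @orders.delete(order)
  end

  # The equivalent of os.orders.count: recount on every read.
  def recount
    @orders.size
  end
end

status = CachedStatus.new
[:a, :b, :c].each { |o| status.add_order(o) }
status.remove_order(:b)
puts status.orders_count  # cached value, no recount needed
puts status.recount       # same number, computed the slow way
```

The trade-off is that every code path that creates or destroys an order has to keep the counter in step, otherwise the cached value drifts.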
  25. For the speed freaks • Merb - http://merbivore.com • 38.56 req/s - 7x performance improvement • Nearly identical code • Blazingly fast
  26. work.rowanhick.com The End