This document summarizes an introductory presentation on profiling Python code. It covers generating profile data with the cProfile module and analyzing it with tools such as pstats. It then describes two ways to use the results to find bottlenecks: start from functions with high exclusive time and climb the call graph, or start from functions with high inclusive time and walk down it. Common optimizations mentioned include removing unnecessary work, using more efficient algorithms, batching I/O operations, database and SQL tuning, and caching, along with the code complexity that caching brings.
2. “We should forget about small efficiencies, say
about 97% of the time: premature optimization is
the root of all evil. Yet we should not pass up our
opportunities in that critical 3%. A good
programmer will not be lulled into complacency
by such reasoning, he will be wise to look carefully
at the critical code; but only after that code has
been identified.”
–Donald Knuth
3. “Bottlenecks occur in surprising places, so don't
try to second guess and put in a speed hack until
you have proven that's where the bottleneck is.”
–Rob Pike
4. What will a profiler tell us?
❖ Function execution time
❖ Memory usage, etc. are possible, but for another day
❖ More about line profiling later
❖ Real (wall clock) time
❖ Inclusive vs exclusive time
❖ Number of calls, primitive and recursive
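A minimal sketch of how these numbers surface in Python's standard-library profiler, assuming Python 3.9 or later (needed for `Stats.get_stats_profile()`). The functions `fib` and `run_workload` are hypothetical stand-ins for real application code. In pstats terms, `tottime` is exclusive time and `cumtime` is inclusive time; a call count such as `15/1` means 15 calls in total, of which 1 was primitive (not made through recursion).

```python
import cProfile
import pstats


def fib(n):
    """Deliberately naive recursion so the profiler records recursive calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_workload():
    """Hypothetical entry point: its inclusive time covers everything fib does."""
    return fib(5)


def profile_workload():
    """Profile run_workload() and return per-function figures keyed by name."""
    profiler = cProfile.Profile()
    profiler.runcall(run_workload)
    stats = pstats.Stats(profiler)
    # func_profiles maps function name -> FunctionProfile, which carries
    # ncalls, tottime (exclusive) and cumtime (inclusive).
    return stats.get_stats_profile().func_profiles


if __name__ == "__main__":
    for name, fp in profile_workload().items():
        print(f"{name:15} calls={fp.ncalls:6} "
              f"tottime={fp.tottime:.3f} cumtime={fp.cumtime:.3f}")
```

The times vary from run to run, but the call counts do not, which makes them a useful sanity check when comparing two profiles.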
11. profile.print_callees('full_clean', 10)

List reduced from 1211 to 2 due to restriction <'full_clean'>

Function                     called...
                                 ncalls  tottime  cumtime
forms.py:260(full_clean)  ->        500    0.177    2.855  forms.py:277(_clean_fields)
                                    500    0.003    0.030  forms.py:298(_clean_form)
                                    500    0.031    2.784  models.py:393(_post_clean)
base.py:918(full_clean)   ->        500    0.001    0.001  base.py:738(clean)
                                    500    0.096    2.399  base.py:952(clean_fields)
12. profile.print_callers('full_clean')

List reduced from 1211 to 2 due to restriction <'full_clean'>

Function                     was called by...
                                 ncalls  tottime  cumtime
forms.py:260(full_clean)  <-        500    0.009    5.678  forms.py:117(errors)
base.py:918(full_clean)   <-        500    0.005    2.405  models.py:393(_post_clean)
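The two listings above come from the `print_callees()` and `print_callers()` methods of `pstats.Stats`. A minimal sketch of producing similar listings on a toy call graph follows. The functions are hypothetical and only borrow their names from the output above, and the report is captured through the `stream` argument instead of being printed.

```python
import cProfile
import io
import pstats


def clean_fields(values):
    """Hypothetical callee: normalize each value."""
    return [v.strip() for v in values]


def clean_form(values):
    """Hypothetical callee: drop empty values."""
    return [v for v in values if v]


def full_clean(values):
    """Hypothetical caller whose callees we want to list."""
    return clean_form(clean_fields(values))


def call_graph_report(func, *args):
    """Profile func(*args) and return callee/caller listings as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).strip_dirs()
    # Each restriction is a regular expression matched against the
    # "file:line(function)" label of every profiled function.
    stats.print_callees("full_clean")    # what does full_clean call?
    stats.print_callers("clean_fields")  # who calls clean_fields?
    return buf.getvalue()


if __name__ == "__main__":
    print(call_graph_report(full_clean, [" a ", "", "b "]))
```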
13. KCacheGrind
❖ GUI for viewing profile data
❖ Run your profile output through pyprof2calltree
❖ On a Mac, qcachegrind is easier to install
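Before a GUI can read the profile, the data has to be saved to a file. A minimal sketch using `Profile.dump_stats()` from the standard library; `example_workload` and the file names are hypothetical. The conversion command in the final comment is an assumption about the pyprof2calltree command line and should be checked against the installed version's help output.

```python
import cProfile
import os
import pstats
import tempfile


def example_workload():
    """Hypothetical stand-in for the code being profiled."""
    return sorted(range(1000), key=lambda n: -n)


def save_profile(func, path):
    """Profile func() and write the raw stats to path in pstats format."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(path)
    return path


if __name__ == "__main__":
    out = save_profile(example_workload,
                       os.path.join(tempfile.gettempdir(), "example.prof"))
    # The saved file can be reloaded later with pstats ...
    pstats.Stats(out).sort_stats("cumulative").print_stats(5)
    # ... or converted for KCacheGrind/qcachegrind from a shell.
    # Assumed invocation (verify with `pyprof2calltree --help`):
    #   pyprof2calltree -i example.prof -o callgrind.out.example
```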
19. Using your results
❖ Bottom up approach
❖ Start with a function that has a large exclusive time
❖ Climb up call graph to find something you can affect
❖ "We're spending a lot of time in deepcopy(). What's calling that so much?"
❖ Might miss higher-level fixes
20. Using your results
❖ Top down approach
❖ Start with a function that has a large inclusive time
❖ Walk down call graph to find something you can affect
❖ "We're spending a lot of time in this validate() method. What's it doing that takes so long?"
❖ Look for structural changes
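The bottom up and top down approaches differ mainly in which column the profile is sorted by. A minimal sketch with pstats, assuming Python 3.7 or later for `SortKey`. The functions `copy_fields` and `validate_record` are hypothetical, built so that one has high exclusive time and the other high inclusive time.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def copy_fields(record):
    """Slow leaf: nearly all of its time is its own (exclusive) time."""
    out = {}
    for key, value in record.items():
        out[key] = value
    return out


def validate_record(record, passes=50):
    """Cheap wrapper: little exclusive time, large inclusive time."""
    for _ in range(passes):
        record = copy_fields(record)
    return record


def top_entry(profiler, sort_key):
    """Return the print_stats() text for the single top entry under sort_key."""
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf).strip_dirs()
    stats.sort_stats(sort_key).print_stats(1)
    return buf.getvalue()


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.runcall(validate_record, {n: str(n) for n in range(2000)})
    # Bottom up: largest exclusive time first, then ask who calls it.
    print(top_entry(profiler, SortKey.TIME))
    # Top down: largest inclusive time first, then ask what it calls.
    print(top_entry(profiler, SortKey.CUMULATIVE))
```

From either starting point, `print_callers()` climbs the call graph and `print_callees()` walks down it.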
21. Line profiling
❖ line_profiler does exist
❖ Results are not very actionable
❖ If you get this far, you probably should stop (or refactor your methods!)
22. Good profiling technique
❖ Create a repeatable benchmark test
❖ Allows you to measure progress
❖ Iterations/second
❖ Time for n iterations
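One possible shape for such a benchmark, sketched with `time.perf_counter()` from the standard library. `example_workload` is a hypothetical stand-in for the code path being optimized; the harness reports both metrics listed above.

```python
import time


def benchmark(func, iterations=1000):
    """Run func() a fixed number of times.

    Returns (elapsed_seconds, iterations_per_second).
    """
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    elapsed = time.perf_counter() - start
    rate = iterations / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def example_workload():
    """Hypothetical stand-in for the code path being optimized."""
    return sum(n * n for n in range(1000))


if __name__ == "__main__":
    elapsed, rate = benchmark(example_workload, iterations=2000)
    print(f"2000 iterations in {elapsed:.3f}s ({rate:.0f} iterations/second)")
```

Running the same benchmark before and after each change, on the same machine and input, is what makes the numbers comparable.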
23. What usually helps
❖ Removing unnecessary work
❖ “We load that config data every time, even when we don’t use it.”
❖ Using a more efficient algorithm
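The config-loading complaint can be illustrated with a small before-and-after sketch. Everything here is hypothetical: `load_config` stands in for an expensive file read, and `load_calls` exists only to count how often it runs.

```python
import json

load_calls = []  # instrumentation for this example only


def load_config():
    """Hypothetical expensive load (stands in for reading and parsing a file)."""
    load_calls.append(1)
    return json.loads('{"verbose": true}')


def handle_eager(message, debug=False):
    """Before: loads the config on every call, even when it is never used."""
    config = load_config()
    if debug and config["verbose"]:
        return f"[debug] {message}"
    return message


def handle_lazy(message, debug=False):
    """After: loads the config only on the path that needs it."""
    if debug and load_config()["verbose"]:
        return f"[debug] {message}"
    return message


if __name__ == "__main__":
    for _ in range(3):
        handle_eager("hello")
        handle_lazy("hello")
    # Only the eager version triggered loads here.
    print(f"config loads: {len(load_calls)}")
```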
24. What usually helps
❖ Batching I/O (disk or net) operations
❖ Database stuff
❖ SQL tuning
❖ Indexes
❖ Transactions
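A minimal sketch of batching, indexing and transactions together, using the standard-library `sqlite3` module. The `events` table and its columns are hypothetical; the same ideas apply to other databases, though the exact calls differ.

```python
import sqlite3

INSERT_SQL = "INSERT INTO events (user_id, kind) VALUES (?, ?)"


def make_db():
    """Hypothetical schema, with an index on the column used for lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")
    conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
    return conn


def insert_one_by_one(conn, rows):
    """Before: one statement and one commit per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_batched(conn, rows):
    """After: one executemany() call inside a single transaction."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(INSERT_SQL, rows)


if __name__ == "__main__":
    conn = make_db()
    insert_batched(conn, [(n % 10, "click") for n in range(1000)])
    count = conn.execute(
        "SELECT COUNT(*) FROM events WHERE user_id = ?", (3,)
    ).fetchone()[0]
    print(f"events for user 3: {count}")
```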
25. What usually helps
❖ Caching
❖ Easy to add, hard to live with
❖ Code complexity
❖ Invalidation calls
❖ Dependency tracking
❖ Business customers care about data freshness
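A minimal sketch of why caching is easy to add but hard to live with, using `functools.lru_cache`. The `PRICES` store and both functions are hypothetical. The cache is one decorator; the invalidation call is the part every write path has to remember.

```python
from functools import lru_cache

PRICES = {"widget": 10}  # hypothetical backing store
lookups = []             # records every uncached lookup, for illustration


@lru_cache(maxsize=128)
def get_price(item):
    """Cached read: repeated calls skip the (pretend) slow lookup."""
    lookups.append(item)
    return PRICES[item]


def set_price(item, price):
    """Write path: must invalidate, or readers keep seeing stale data."""
    PRICES[item] = price
    get_price.cache_clear()


if __name__ == "__main__":
    get_price("widget")
    get_price("widget")          # served from the cache
    PRICES["widget"] = 12        # a write that forgets to invalidate
    print(get_price("widget"))   # still the old, stale price
    set_price("widget", 15)      # a write that invalidates
    print(get_price("widget"))   # fresh price
```

Clearing the whole cache on every write is the bluntest form of invalidation; finer-grained schemes are where dependency tracking comes in.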