Taylor Lovett is a senior web engineer who studied computer science. Computer science involves the study of computational theory, software, and hardware. It includes topics like algorithms, data structures, graph theory, programming, databases, and computer hardware. Big-O notation is used to describe how efficiently an algorithm solves a problem based on changes to input size. It indicates the worst-case time complexity of an algorithm. Tracking post views in WordPress can cause data race issues if not implemented carefully due to the possibility of concurrent requests updating the view count.
2. My name is Taylor Lovett
- Senior Strategic Web Engineer at 10up
- Core Contributor
- Plugin Author (Safe Redirect Manager)
- Plugin Contributor
- BS in Computer Science from the University
of Maryland, College Park
3. What is Computer Science?
- It can mean a lot of things. It is really the
study of computational theory, computer
software, and hardware.
4. Theory of Computation
- General Mathematics (Calculus, linear
algebra, general computational theory,
statistics)
- Algorithms (a method to solve a problem)
- Data structures (which data structure will
allow us to access our data the quickest?)
- Graph theory
5. Computer Software
- Programming techniques and design patterns
(e.g., a singleton class)
- Concurrent design patterns (data races)
- Mobile software development
- Operating system software
- Web development
- Databases
- Networking
- Benchmarking
7. Big-Oh Notation
- "Big O notation is used to classify algorithms by how
they respond (e.g., in their processing time or working
space requirements) to changes in input size." --
Wikipedia
- Very useful to describe how performant your code
may or may not be
- Big-Oh usually describes the upper bound of a
function (worst-case)
8. Big-Oh Notation (cont.)
- Big-Oh notation is concerned with measuring the rate
of growth of the amount of processing that your code
might do on an unknown input size
- In Big-Oh we are only concerned with how our
code performs as the input size approaches infinity.
Mathematically speaking, this means we only care
about the highest order term:
e.g. O(3n^2 + 5n) = O(n^2) since as n approaches infinity
the only thing that matters is the n^2
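To see why only the highest-order term matters, the sketch below compares 3n^2 + 5n against n^2 for growing n. The helper name growth_ratio() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: ratio of (3n^2 + 5n) to n^2 for a given input size.
function growth_ratio( $n ) {
    return ( 3 * $n * $n + 5 * $n ) / ( $n * $n );
}

foreach ( array( 10, 1000, 100000 ) as $n ) {
    // The ratio settles toward the constant 3 as n grows,
    // so the contribution of the 5n term fades away.
    printf( "n = %d, ratio = %.5f\n", $n, growth_ratio( $n ) );
}
```

The ratio approaches a constant, and Big-Oh ignores constant factors, which is why the whole expression collapses to its squared term.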
10.
// $fruits contains a non-empty array of strings
function contains_orange( $fruits = array() ) {
    for ( $i = 0; $i < count( $fruits ); $i++ ) {
        if ( 'orange' === $fruits[$i] ) {
            return true;
        }
    }
    return false;
}
Best Case Scenario: Loop executes once,
orange is found, and it returns.
Worst Case Scenario: Loop executes n times
(where n is the number of elements in $fruits)
Performance: contains_orange() is in O(n)
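One way to check the best and worst cases is to count comparisons instead of returning true or false. This is a minimal sketch; count_orange_comparisons() is a hypothetical instrumented variant, not part of the original function.

```php
<?php
// Hypothetical instrumented variant of contains_orange(): it returns the
// number of comparisons made instead of true/false.
function count_orange_comparisons( $fruits = array() ) {
    $comparisons = 0;
    $total       = count( $fruits );
    for ( $i = 0; $i < $total; $i++ ) {
        $comparisons++;
        if ( 'orange' === $fruits[ $i ] ) {
            break; // Found it, stop looking.
        }
    }
    return $comparisons;
}

// Best case: 'orange' is the first element.
$best = count_orange_comparisons( array( 'orange', 'apple', 'pear', 'kiwi' ) );
// Worst case: 'orange' is absent, so every element is compared.
$worst = count_orange_comparisons( array( 'apple', 'pear', 'kiwi', 'plum' ) );
printf( "best case: %d, worst case: %d\n", $best, $worst );
```

With four fruits the worst case makes four comparisons, one per element, which is the linear growth that O(n) describes.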
11. Remember!
- With Big-Oh we are only concerned with what
happens in the worst case. Sometimes knowing
what happens in the best case is useful, but we
are mostly worried about the performance hit
our code could take in the worst possible
situation.
12.
// $fruits contains a non-empty array of strings. For educational
// purposes, $fruits is guaranteed to have at least one duplicate.
function contains_duplicate_fruit( $fruits = array() ) {
    for ( $i = 0; $i < count( $fruits ); $i++ ) {
        for ( $z = 0; $z < count( $fruits ); $z++ ) {
            if ( $i != $z && $fruits[$z] == $fruits[$i] ) {
                return true;
            }
        }
    }
    return false;
}
What does everyone think?
13. Best Case Scenario: Outer loop executes
once, inner loop executes twice, duplicate is
found, function returns
Worst Case Scenario: Outer loop executes n - 1
times (where n is the size of $fruits), inner
loop executes n times for each outer loop
execution... n * (n - 1) = n^2 - n
Performance: contains_duplicate_fruit is in
O(n^2 - n) = O(n^2)
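The same counting trick can confirm the nested-loop analysis. The sketch below assumes a hypothetical helper, count_duplicate_checks(), that mirrors the loops of contains_duplicate_fruit() but returns how many times the inner loop body ran.

```php
<?php
// Hypothetical instrumented variant of contains_duplicate_fruit(): it returns
// how many times the inner loop body runs before a duplicate is found.
function count_duplicate_checks( $fruits = array() ) {
    $checks = 0;
    $total  = count( $fruits );
    for ( $i = 0; $i < $total; $i++ ) {
        for ( $z = 0; $z < $total; $z++ ) {
            $checks++;
            if ( $i != $z && $fruits[ $z ] == $fruits[ $i ] ) {
                return $checks;
            }
        }
    }
    return $checks;
}

// Best case: the duplicate pair sits at the front of the array.
$best = count_duplicate_checks( array( 'kiwi', 'kiwi', 'pear', 'plum' ) );
// Worst case: the duplicate pair sits at the very end (n = 4).
$worst = count_duplicate_checks( array( 'kiwi', 'pear', 'plum', 'plum' ) );
printf( "best case: %d, worst case: %d\n", $best, $worst );
```

For n = 4 the worst case runs the inner loop body 4 * 3 = 12 times, matching n * (n - 1).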
14. An important reminder
- We dropped the (-n) from our final Big-Oh
evaluation because, as n approaches infinity,
n^2 dominates and (-n) becomes insignificant.
16. Big-Oh Notation and Databases
- Big-Oh notation is used a lot in conjunction
with SQL operations.
- We've all heard that indexing a column in
MySQL makes search on that column faster.
- But why? What does that actually mean?
17. MySQL Indexes
- An index is a data structure that speeds up
search time for information.
- Without an index, searching for a specific
column value is O(n) because in the worst case
scenario every single row in the table must be
examined.
18. MySQL Indexes
- When a column is indexed, MySQL takes the data
across all of the rows in that column and stores
references to that data in a B-tree (this structure is
used for the majority of index types).
- A B-tree is just what it sounds like: a tree of data that
speeds up search time. In the worst case, the number
of items that must be examined in a B-tree lookup is
O(log n). A log is a mathematical function that grows
much more slowly than n, so that for large n:
n^2 > n > log n
http://en.wikipedia.org/wiki/B-tree
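To get a feel for logarithmic lookups, the sketch below counts how many values a binary search over a sorted array inspects. Binary search is used here only as a simpler stand-in with the same logarithmic behavior; it is not how MySQL implements a B-tree, and binary_search_steps() is a hypothetical helper.

```php
<?php
// Counts how many elements a binary search inspects. This is a stand-in for
// an index lookup, not MySQL's actual B-tree implementation.
function binary_search_steps( array $sorted, $target ) {
    $low   = 0;
    $high  = count( $sorted ) - 1;
    $steps = 0;
    while ( $low <= $high ) {
        $steps++;
        $mid = intdiv( $low + $high, 2 );
        if ( $sorted[ $mid ] === $target ) {
            return $steps;
        }
        if ( $sorted[ $mid ] < $target ) {
            $low = $mid + 1;  // Discard the lower half.
        } else {
            $high = $mid - 1; // Discard the upper half.
        }
    }
    return $steps;
}

$ids   = range( 1, 1024 );                  // 1,024 sorted values
$steps = binary_search_steps( $ids, 1025 ); // a value that is not present
printf( "values: %d, values inspected: %d\n", count( $ids ), $steps );
```

A full scan of the same 1,024 values would have to inspect every one of them to conclude the value is missing, while the halving search needs only about log2(n) probes.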
19. Post Meta Queries
- The full Big-Oh analysis of a post meta query is
pretty complex because of the join operation and
therefore is outside the scope of this talk.
- For our purposes, searching for posts based on a
meta key is O(n) where n is the number of posts that
have that key.
- Let's frame this in terms of featured posts. Featured
posts refers to the situation where a website needs to
mark select posts as featured and query for them.
20. Featured Posts Solution #1
On post update:
if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    update_post_meta( $post_id, 'featured', 0 );
}
Query:
$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );
21. Solution #1 Analysis
- Using this code, every time a post is saved, it will have
post meta attached to it such that 'featured' = 1 or 0. This
will create a ton of unnecessary post meta rows.
- Remember searching for posts based on a meta key is
O(n) where n is the number of posts that have that key.
Therefore saving meta when a post is not featured is not
only unnecessary but will really slow us down. This would
result in O(m) performance where m is the total number of
posts, featured or not!
22. Featured Posts Solution #2
On post update:
if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    delete_post_meta( $post_id, 'featured' );
}
Query:
$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );
23. Solution #2 Analysis
- This solution is a major improvement over our first
one. This will result in O(n) search time where n is the
number of featured posts.
- However, we can still do better.
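The difference between storing a 0 and deleting the meta can be sketched without WordPress. The code below is an illustration under stated assumptions: a plain PHP array stands in for the postmeta table, and build_featured_rows() is a hypothetical helper, not a WordPress function.

```php
<?php
// Hypothetical in-memory stand-in for the postmeta table: one array entry per
// meta row. Returns the rows that carry the 'featured' key.
function build_featured_rows( $total_posts, $featured_ids, $store_zero ) {
    $rows = array();
    for ( $post_id = 1; $post_id <= $total_posts; $post_id++ ) {
        if ( in_array( $post_id, $featured_ids, true ) ) {
            $rows[] = array( 'post_id' => $post_id, 'meta_key' => 'featured', 'meta_value' => 1 );
        } elseif ( $store_zero ) {
            // Storing 'featured' = 0: non-featured posts still get a row.
            $rows[] = array( 'post_id' => $post_id, 'meta_key' => 'featured', 'meta_value' => 0 );
        }
        // Deleting the meta instead: non-featured posts get no row at all.
    }
    return $rows;
}

$featured_ids = array( 3, 42, 77 );
$with_zeros   = build_featured_rows( 1000, $featured_ids, true );
$without_zero = build_featured_rows( 1000, $featured_ids, false );
printf( "rows with the key: %d vs %d\n", count( $with_zeros ), count( $without_zero ) );
```

With 1,000 posts and three featured ones, a lookup by key has 1,000 rows to consider when zeros are stored, but only three when the meta is deleted for non-featured posts.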
24. Featured Posts Solution #3
Let's create a tag called 'featured' and attach it to all our featured
posts:
On init:
$args = array( ... );
register_taxonomy( 'featured', 'post', $args );
Query:
$args = array(
    'tag' => 'featured',
);
$featured_posts = new WP_Query( $args );
25. Solution #3 Analysis
- For our purposes, searching for posts based on a tag
is O(log n) since there is an index on the tag id
column.
- The full Big-Oh analysis of our tag solution is pretty
complex due to SQL join operations and therefore is
beyond the scope of this talk.
26. Concurrency
- In Computer Science, concurrency is the
property of a system in which multiple
computations execute at the same time,
sometimes interacting with each other.
27. Concurrency
- With concurrent programming we can, among
other things, force each core in a computer to
process a piece of a larger problem or handle
separate tasks. This is extremely powerful.
- When not properly accounted for, concurrency
can result in unexpected bugs that
are difficult to reproduce.
28. Concurrency in WordPress
- Concurrency takes a slightly different form in
WordPress. We don't solve problems by
starting new threads/processes. However,
since behind the scenes servers can run
multiple processes at the same time and thus
multiple users can execute the same code
simultaneously, issues surrounding
concurrency can arise.
29. Tracking Postviews in WordPress
- A common request in WordPress is to display the
number of views for each post on the frontend.
- There are many different ways to approach this
problem; the most common is to increment an
integer stored in post meta each time a post is
viewed, then to display this number for each post.
- This implementation can lead to data races.
30. Here is the code that executes on
each post request
$views = get_post_meta( $id, 'views', true );
$views++;
update_post_meta( $id, 'views', $views );
31. Data Races
- A data race is the situation where two or more
threads access a shared memory location, at
least one of those accesses is a write, and the
order of the accesses is unknown (meaning
there are no explicit locking mechanisms used).
- Think of each page request as a thread on the
server. If two users request a post at the same
time, a data race for pageviews occurs since
both accesses are writing to the postmeta
table.
32. A Possible Ordering of Events
Each line is labeled with the user whose request executes it
$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views++; // User A: $views = 1
update_post_meta( $id, 'views', $views ); // User A: stored views = 1
$views = get_post_meta( $id, 'views', true ); // User B: $views = 1
$views++; // User B: $views = 2
update_post_meta( $id, 'views', $views ); // User B: stored views = 2
In this ordering of events, $views ends up with a value of 2
which is what we want. However, these events could occur
in any order...
33. Another Ordering of Events
$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views = get_post_meta( $id, 'views', true ); // User B: $views = 0
$views++; // User A: $views = 1
$views++; // User B: $views = 1
update_post_meta( $id, 'views', $views ); // User A: stored views = 1
update_post_meta( $id, 'views', $views ); // User B: stored views = 1
In this ordering of events, $views ends up with a value of 1
which is NOT what we want.
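Both orderings can be replayed in plain PHP to watch the lost update happen. This is a minimal sketch: the $store array and the read_views() and write_views() helpers are hypothetical stand-ins for the postmeta table, get_post_meta(), and update_post_meta().

```php
<?php
// Hypothetical in-memory stand-in for the postmeta table.
$store = array( 'views' => 0 );

// Stand-in for get_post_meta(): returns the stored view count.
function read_views( array $store ) {
    return $store['views'];
}

// Stand-in for update_post_meta(): overwrites the stored view count.
function write_views( array &$store, $value ) {
    $store['views'] = $value;
}

// Ordering 1: User A finishes before User B starts.
$a = read_views( $store );
$a++;
write_views( $store, $a );
$b = read_views( $store );
$b++;
write_views( $store, $b );
$sequential_result = $store['views'];

// Ordering 2: both users read before either one writes.
$store = array( 'views' => 0 );
$a     = read_views( $store );
$b     = read_views( $store );
$a++;
$b++;
write_views( $store, $a );
write_views( $store, $b );
$interleaved_result = $store['views'];

printf( "sequential: %d, interleaved: %d\n", $sequential_result, $interleaved_result );
```

In the interleaved ordering both users start from the same stale value, so the second write overwrites the first and one view is lost.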
35. Solution to Pageview Problem?
Solution 1: Jetpack plugin. We can install
Jetpack and leverage its stats API to query
information on specific posts.
Solution 2: Google Analytics. Using a website's
Google Analytics account, we can set custom
variables on a post-to-post basis and query the
API based on those variables.