We all have certainly learned data structures at school: arrays, lists, sets, stacks, queues (LIFO/FIFO), heaps, associative arrays, trees, ... and what do we mostly use in PHP? The "array"! In most cases, we do everything and anything with it but we stumble upon it when profiling code.
During this session, we'll learn again to use the structures appropriately, leaning closer on the way to employ arrays, the SPL and other structures from PHP extensions as well.
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Mastering PHP Data Structure 102 - phpDay 2012 Verona
1. Mastering PHP Data Structure 102
Patrick Allaert
phpDay 2012 Verona, Italy
2. About me
● Patrick Allaert
● Founder of Libereco
● Playing with PHP/Linux for +10 years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● patrickallaert@php.net
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/
16. Array: PHP's untruthfulness
PHP “Arrays” are not true Arrays!
An array typically looks like this:
0 1 2 3 4 5
Data Data Data Data Data Data
17. Array: PHP's untruthfulness
PHP “Arrays” can dynamically grow and be iterated
both directions (reset(), next(), prev(), end()),
exclusively with O(1) operations.
18. Array: PHP's untruthfulness
PHP “Arrays” can dynamically grow and be iterated
both directions (reset(), next(), prev(), end()),
exclusively with O(1) operations.
Let's have a Doubly Linked List (DLL):
Head Tail
Data Data Data Data Data
Enables List, Deque, Queue and Stack
implementations
20. Array: PHP's untruthfulness
PHP “Arrays” elements are always accessible using a
key (index).
Let's have an Hash Table:
Head Bucket pointers array Tail
0 1 2 3 4 5 ... nTableSize -1
Bucket * Bucket * Bucket * Bucket * Bucket * Bucket * Bucket *
Bucket Bucket Bucket Bucket Bucket
Data Data Data Data Data
23. Array: PHP's untruthfulness
● In C: 100 000 integers (using long on 64bits => 8
bytes) can be stored in 0.76 Mb.
● In PHP: it will take ≅ 13.97 Mb!
● A PHP variable (containing an integer) takes 48
bytes.
● The overhead of buckets for every “array” entries is
about 96 bytes.
● More details:
http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
26. Structs (or records, tuples,...)
● A struct is a value containing other values which
are typically accessed using a name.
● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart
28. Structs – Using a class
$person = new PersonStruct(
"Patrick", "Allaert"
);
29. Structs – Using a class
(Implementation)
class PersonStruct
{
public $firstName;
public $lastName;
public function __construct($firstName, $lastName)
{
$this->firstName = $firstName;
$this->lastName = $lastName;
}
}
30. Structs – Using a class
(Implementation)
class PersonStruct
{
public $firstName;
public $lastName;
public function __construct($firstName, $lastName)
{
$this->firstName = $firstName;
$this->lastName = $lastName;
}
public function __set($key, $value)
{
// a. Do nothing
// b. trigger_error()
// c. Throws an exception
}
}
31. Structs – Pros and Cons
Array Class
+ Uses less memory (PHP < 5.4) - Uses more memory (PHP < 5.4)
- Uses more memory (PHP = 5.4) + Uses less memory (PHP = 5.4)
- No type hinting + Type hinting possible
- Flexible structure + Rigid structure
+|- Less OO +|- More OO
Slightly faster? Slightly slower?
33. (true) Arrays
● An array is a fixed size collection where elements
are each identified by a numeric index.
34. (true) Arrays
● An array is a fixed size collection where elements
are each identified by a numeric index.
0 1 2 3 4 5
Data Data Data Data Data Data
35. (true) Arrays – Using SplFixedArray
$array = new SplFixedArray(3);
$array[0] = 1; // or $array->offsetSet()
$array[1] = 2; // or $array->offsetSet()
$array[2] = 3; // or $array->offsetSet()
$array[0]; // gives 1
$array[1]; // gives 2
$array[2]; // gives 3
36. (true) Arrays – Pros and Cons
Array SplFixedArray
- Uses more memory + Uses less memory
+|- Less OO +|- More OO
38. Queues
● A queue is an ordered collection respecting First
In, First Out (FIFO) order.
● Elements are inserted at one end and removed at
the other.
39. Queues
● A queue is an ordered collection respecting First
In, First Out (FIFO) order.
● Elements are inserted at one end and removed at
the other.
Data
Dequeue
Data Data Data Data Data Data
Enqueue
Data
40. Queues – Using array
$queue = array();
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3
41. Queues – Using SplQueue
$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
43. Stacks
● A stack is an ordered collection respecting Last In,
First Out (LIFO) order.
● Elements are inserted and removed on the same
end.
44. Stacks
● A stack is an ordered collection respecting Last In,
First Out (LIFO) order.
● Elements are inserted and removed on the same
end.
Data
Push
Data Data Data Data Data Data
Pop
Data
45. Stacks – Using array
$stack = array();
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1
46. Stacks – Using SplStack
$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
47. Queues/Stacks – Pros and Cons
Array SplQueue / SplStack
- Uses more memory + Uses less memory
(overhead / entry: 96 bytes) (overhead / entry: 48 bytes)
- No type hinting + Type hinting possible
+|- Less OO +|- More OO
48. Sets
Geeks Nerds
People with
strong views on
the distinction
between geeks
and nerds
49. Sets
● A set is a collection with no particular ordering
especially suited for testing the membership of a
value against a collection or to perform
union/intersection/complement operations
between them.
50. Sets
● A set is a collection with no particular ordering
especially suited for testing the membership of a
value against a collection or to perform
union/intersection/complement operations
between them.
Data
Data
Data
Data
Data
51. Sets – Using array
$set = array();
// Adding elements to a set
$set[] = 1;
$set[] = 2;
$set[] = 3;
// Checking presence in a set
in_array(2, $set); // true
in_array(5, $set); // false
array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement
52. Sets – Using array
$set = array();
// Adding elements to a set
$set[] = 1;
$set[] = 2;
$set[] = 3; True
// Checking presence in a set performance
in_array(2, $set); // true
in_array(5, $set); // false
killers!
array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement
56. Sets – Using array (simple types)
$set = array();
// Adding elements to a set
$set[1] = true; // Any dummy value
$set[2] = true; // is good but NULL!
$set[3] = true;
// Checking presence in a set
isset($set[2]); // true
isset($set[5]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
57. Sets – Using array (simple types)
$set = array();
// Adding elements to a set
$set[1] = true; // Any dummy value
$set[2] = true; // is good but NULL!
$set[3] = true;
// Checking presence in a set
isset($set[2]); // true
isset($set[5]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
● Remember that PHP Array keys can be integers or
strings only!
58. Sets – Using array (objects)
$set = array();
// Adding elements to a set
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;
// Checking presence in a set
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
59. Sets – Using array (objects)
$set = array();
// Adding elements to a set
$set[spl_object_hash($object1)] = $object1; Store a
$set[spl_object_hash($object2)] = $object2; reference of
$set[spl_object_hash($object3)] = $object3; the object!
// Checking presence in a set
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
60. Sets – Using SplObjectStorage
(objects)
$set = new SplObjectStorage();
// Adding elements to a set
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;
// Checking presence in a set
isset($set[$object2]); // true
isset($set[$object2]); // false
$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement
61. Sets – Using QuickHash (int)
$set = new QuickHashIntSet(64,
QuickHashIntSet::CHECK_FOR_DUPES);
// Adding elements to a set
$set->add(1);
$set->add(2);
$set->add(3);
// Checking presence in a set
$set->exists(2); // true
$set->exists(5); // false
// Soonish: isset($set[2]);
● No union/intersection/complement operations
(yet?)
● Yummy features like (loadFrom|saveTo)(String|File)
62. Sets – Using bitsets
define("E_ERROR", 1); // or 1<<0
define("E_WARNING", 2); // or 1<<1
define("E_PARSE", 4); // or 1<<2
define("E_NOTICE", 8); // or 1<<3
// Adding elements to a set
$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;
// Checking presence in a set
$set & E_ERROR; // true
$set & E_NOTICE; // false
$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 ^ $set2; // complement
63. Sets – Using bitsets (example)
Instead of:
function remove($path, $files = true, $directories = true, $links = true,
$executable = true)
{
if (!$files && is_file($path))
return false;
if (!$directories && is_dir($path))
return false;
if (!$links && is_link($path))
return false;
if (!$executable && is_executable($path))
return false;
// ...
}
remove("/tmp/removeMe", true, false, true, false); // WTF ?!
64. Sets – Using bitsets (example)
Instead of:
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits
function remove($path, $options = REMOVE_ALL)
{
if (~$options & REMOVE_FILES && is_file($path))
return false;
if (~$options & REMOVE_DIRS && is_dir($path))
return false;
if (~$options & REMOVE_LINKS && is_link($path))
return false;
if (~$options & REMOVE_EXEC && is_executable($path))
return false;
// ...
}
remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS); // Much better :)
65. Sets: Conclusions
● Use the key and not the value when using PHP
Arrays.
● Use QuickHash for set of integers if possible.
● Use SplObjectStorage as soon as you are playing
with objects.
● Don't use array_unique() when you need a set!
66. Maps
● A map is a collection of key/value pairs where all
keys are unique.
67. Maps – Using array
$map = array();
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;
// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
● Don't use array_merge() on maps.
72. Heap – Using Spl(Min|Max)Heap
$heap = new SplMinHeap;
$heap->insert(3);
$heap->insert(1);
$heap->insert(2);
73. Heaps: Conclusions
● MUCH faster than having to re-sort() an array at
every insertion.
● If you don't require a collection to be sorted at
every single step and can insert all data at once
and then sort(). Array is a much better/faster
approach.
● SplPriorityQueue is very similar, consider it is the
same as SplHeap but where the sorting is made on
the key rather than the value.
74. Bloom filters
● A bloom filter is a space-efficient probabilistic data
structure used to test whether an element is
member of a set.
● False positives are possible, but false negatives are
not!
75. Bloom filters – Using bloomy
// BloomFilter::__construct(int capacity [, double
error_rate [, int random_seed ] ])
$bloomFilter = new BloomFilter(10000, 0.001);
$bloomFilter->add("An element");
$bloomFilter->has("An element"); // true for sure
$bloomFilter->has("Foo"); // false, most probably
76. Other related projects
● SPL Types: Various types implemented as object:
SplInt, SplFloat, SplEnum, SplBool and SplString
http://pecl.php.net/package/SPL_Types
77. Other related projects
● SPL Types: Various types implemented as object:
SplInt, SplFloat, SplEnum, SplBool and SplString
http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation
http://pecl.php.net/package/Judy
78. Other related projects
● SPL Types: Various types implemented as object:
SplInt, SplFloat, SplEnum, SplBool and SplString
http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation
http://pecl.php.net/package/Judy
● Weakref: Weak references implementation.
Provides a gateway to an object without
preventing that object from being collected by the
garbage collector.
79. Conclusions
● Use appropriate data structure. It will keep your
code clean and fast.
80. Conclusions
● Use appropriate data structure. It will keep your
code clean and fast.
● Think about the time and space complexity
involved by your algorithms.
81. Conclusions
● Use appropriate data structure. It will keep your
code clean and fast.
● Think about the time and space complexity
involved by your algorithms.
● Name your variables accordingly: use “Map”, “Set”,
“List”, “Queue”,... to describe them instead of using
something like: $ordersArray.