2. Hello everybody
Julien PAULI
SensioLabs Blackfire team
Programming with PHP since early 2000s
Today working as Unix system programmer (C)
PHP Internals programmer/contributor
PHP 5.5 & 5.6 Release Manager
@julienpauli
Tech blog at http://jpauli.github.io
jpauli@php.net
3. What we'll cover together
Profiling a simple SF2 app
Under PHP 5
Under PHP 7
Compare profiles using Blackfire graph comparison
Analyze numbers
Dive into PHP 7 performance
Structures optimizations
New variable model (zval)
New HashTable model
String management with zend_string
Userland ideas for your code
4. Profiles
Done on my laptop (not on prod env)
LP64
Done on app_dev.php (debug mode)
Do not take the absolute numbers at face value
Look at the relative measures instead
Performed with Blackfire
On PHP-5.6 latest to date
On PHP-7.0 latest to date
5. Blackfire
General profiler
Not only PHP, but works best for PHP
Free version exists
Collects many metrics
memory, CPU, IO, Network traffic, SQL ...
Graphs useful info, trashes useless info
Immediately spot your perf problems
Nice graph comparison view
6. Blackfire collector
Collector is a PHP extension
~5000 lines of C
Available for 5.3, 5.4, 5.5, 5.6 and 7.0
Available on most Unixes and Windows platforms
Collector impact is nil when it is not triggered
Collector works in prod environment
It is highly optimized for performance
It is finely optimized for each PHP version
But it will have an impact while profiling
13. PHP 7 new compiler
PHP 7 compiler is now based on an AST
It has been fully rewritten, and can compute many
more things at compile time
Every hash of every literal string, for example
Resolves every static/literal expression
Optimizes some function calls when the result is known
at compile time
defined(), strlen(), call_user_func_array(), is_{type}()
In namespaced code, call native functions fully qualified (\strlen()), otherwise they cannot be resolved at compile time
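To illustrate the last point: inside a namespace, an unqualified call such as strlen() may refer to a namespaced function or to the native one, so the compiler cannot safely resolve it ahead of time. A minimal sketch, assuming a hypothetical App\Util namespace and hypothetical function names; the leading backslash is what tells the compiler that the native function is meant.

```php
<?php
namespace App\Util; // hypothetical namespace, for illustration only

// Unqualified call: PHP looks for App\Util\strlen() first, then falls
// back to the native strlen(), so the target is only known at runtime.
function lengthUnqualified(): int
{
    return strlen('foo');
}

// Fully qualified call on a literal: the compiler knows this is the
// native strlen() and may replace the call with its result.
function lengthQualified(): int
{
    return \strlen('foo');
}

echo lengthUnqualified(), ' ', lengthQualified(), "\n";
```

Both functions return the same value; only the work left for runtime differs.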
17. PHP 7 new compiler
PHP 7 compiler may take more time than PHP 5's
It optimizes more things
It must walk an AST
Use OPcache to avoid paying the compile time on every request
18. PHP 7 compiler optim example, static arrays
Arrays containing keys/values that are static/literal
Such arrays are fully resolved at compile time
They involve 0 runtime work
const FOO = ['bar', 'baz', 'foo', 34, [42, 'bar'=>'baz']];
19. Static arrays in PHP 5
A lot of runtime is spent building the same
array again and again
$a = ['bar', 'baz', 'foo', 34, [42, 'bar'=>'baz']];
3 0 E > INIT_ARRAY ~0 'bar'
1 ADD_ARRAY_ELEMENT ~0 'baz'
2 ADD_ARRAY_ELEMENT ~0 'foo'
3 ADD_ARRAY_ELEMENT ~0 34
4 INIT_ARRAY ~1 42
5 ADD_ARRAY_ELEMENT ~1 'baz', 'bar'
6 ADD_ARRAY_ELEMENT ~0 ~1
7 ASSIGN !0, ~0
20. Static arrays in PHP 7
No runtime impact (but compile-time)
You'd better use OPCache
$a = ['bar', 'baz', 'foo', 34, [42, 'bar'=>'baz']];
L3 #0 ASSIGN $a array(5)
22. Packed arrays
If your keys are integers only (no string keys)
If your keys are always increasing
Gaps are fine: keys need not follow each other by +1
Then you'll benefit from the packed array optimization
Packed arrays use less memory than
"normal" arrays
Reduction of (table_size - 2) * 4 bytes
~4 KB for a 1000-entry table
May be noticeable for BIG arrays
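The "~4 KB for 1000 entries" figure can be worked out from the formula above. A small sketch, assuming the table size is the entry count rounded up to a power of two with a minimum of 8 slots; the function names are hypothetical.

```php
<?php
// Assumption: the table size is the entry count rounded up to a
// power of two, with a minimum of 8 slots.
function tableSizeFor(int $entries): int
{
    $size = 8;
    while ($size < $entries) {
        $size *= 2;
    }
    return $size;
}

// Formula from the slide: a packed array saves (table_size - 2) * 4 bytes.
function packedSavingInBytes(int $entries): int
{
    return (tableSizeFor($entries) - 2) * 4;
}

echo packedSavingInBytes(1000), "\n"; // 4088 bytes, roughly 4 KB
```

With 1000 entries the table holds 1024 slots, hence (1024 - 2) * 4 = 4088 bytes.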
23. Packed arrays example
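// 1. Integer keys only, appended in order: packed array
const N = 1024 * 1023;
for ($i=0; $i<N; $i++) {
    $tab[] = random_bytes(3);
}
echo memory_get_usage(); // ~67 MB

// 2. Same array plus one string key: no longer packed
const N = 1024 * 1023;
for ($i=0; $i<N; $i++) {
    $tab[] = random_bytes(3);
}
$tab['foo'] = 'bar';
echo memory_get_usage(); // ~71 MB

// 3. Same array, one key unset then written again: no longer packed
const N = 1024 * 1023;
for ($i=0; $i<N; $i++) {
    $tab[] = random_bytes(3);
}
unset($tab[1000]);
$tab[1000] = 1000;
echo memory_get_usage(); // ~71 MB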
24. Packed arrays conditions (recalled)
Do NOT use string keys
Always use increasing integer-based keys
Contiguous or not is not important
For example, if you need lists, you'll benefit
from this optimisation
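One userland idea that follows from these conditions, as a sketch rather than a rule: when a list has lost its packed layout, array_values() builds a fresh array with keys 0..n-1, which can be stored packed again. The helper names below are hypothetical, and the exact byte counts depend on the PHP version and platform.

```php
<?php
// Hypothetical helper: bytes taken by the array a builder returns.
function bytesUsedBy(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();            // keep the result alive while measuring
    return memory_get_usage() - $before;
}

// A list that lost its packed layout because of a string key.
function buildHashList(): array
{
    $list = [];
    for ($i = 0; $i < 100000; $i++) {
        $list[] = $i;
    }
    $list['tmp'] = true;         // string key: converts to a "normal" array
    unset($list['tmp']);         // removing the key does not convert it back
    return $list;
}

$asHash   = bytesUsedBy('buildHashList');
$asPacked = bytesUsedBy(function () {
    return array_values(buildHashList()); // fresh 0..n-1 integer keys
});

printf("hash: %d bytes, re-packed: %d bytes\n", $asHash, $asPacked);
```

Whether the copy is worth it depends on how long the array lives; for short-lived arrays the extra pass may cost more than it saves.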
26. Optimizing CPU time
Latency Numbers Every Programmer Should Know
http://lwn.net/Articles/250967/
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
2016 numbers (may vary with chip)
---------------------------------------------------
L1 cache reference                    1 ns
Branch mispredict                     3 ns
L2 cache reference                    4 ns   4x L1 cache
L3 cache reference                   12 ns   3x L2 cache, 12x L1 cache
Main memory reference               100 ns   25x L2 cache, 100x L1 cache
SSD random read                  16,000 ns
HDD random read (seek)      200,000,000 ns
27. Optimizing CPU cache efficiency
If we can reduce payload size, the CPU will use its
caches more often
CPU caches prefetch data on a "line" basis
Improve data locality to improve cache efficiency
https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality
That means in C
Reduce number of pointer indirections
Stick data together (struct hacks, struct merges)
Use smaller data sizes
28. PHP 7 cache efficiency
If we can reduce payload size, the CPU will use its
caches more often
PHP 7.0.7-dev (debug)
3048,519220 task-clock
299 context-switches
29 CPU-migrations
7 468 page-faults
7 982 214 752 cycles
9 662 629 105 instructions
1 615 685 619 branches
35 078 036 branch-misses
3,055671598 seconds time elapsed
PHP 5.6.22-dev (debug)
5362,520255 task-clock
889 context-switches
79 CPU-migrations
30 270 page-faults
14 141 874 423 cycles
16 596 383 001 instructions
2 641 376 889 branches
60 273 407 branch-misses
5,418137861 seconds time elapsed
29. PHP 7 optimizations
Every variable in PHP is coded on a zval struct
This struct has been reorganized in PHP 7
Narrowed / shrunk
separated
30. PHP 5 variables
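[Diagram: layout of a PHP 5 variable]
$a is a zval * (8 bytes) pointing to a heap-allocated zval (32 bytes)
zval fields: value (zval_value), refcount, is_ref, type, gc_info
zval_value holds one of: lval, dval, str_val* + str_len, hashtable*, object*, ast*
A complex value (a HashTable for example, XX bytes) sits behind one more pointer
40 bytes + complex value size
2 indirections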
32. PHP 5 vs PHP 7 variable design
zval container no longer stores GC info
No more need to heap-allocate a zval *
Much less pressure on the heap allocator
GC info is stored in each complex type
Each complex type may now be shared
In PHP 5, we had to share the zval containing them
PHP 7 variables are much more CPU cache efficient
33. New Memory Allocator
PHP 7 has a brand-new heap memory allocator
Zend Memory Manager
It now uses several allocator pools
Huge
Medium
Small
... for better efficiency
Uses mmap(), no more libc's malloc() overhead
May use Kernel Huge Pages if told to
Better CPU TLB usage
34. Hashtables (arrays)
In PHP, HashTables are used to represent the PHP
array type
But HashTables are also used internally
Everywhere
HashTable optimizations in PHP 7 are strongly felt, as
HashTables are heavily used internally
35. HashTables in PHP 5
Each element needs
4 pointer indirections
72 bytes for a bucket + 32 bytes for a zval
[Diagram: $a (zval *) -> zval -> HashTable * -> HashTable -> bucket * -> bucket (72 bytes) -> zval * -> zval]
[A "64 bytes" label also appears in the figure]
36. HashTables in PHP 7
Each element needs
2 pointer indirections
32 bytes for a bucket
[Diagram: $a (zval) -> HashTable * -> HashTable (56 bytes) -> bucket * -> bucket (32 bytes), with the element's zval embedded in the bucket]
37. PHP 7 Hash
Memory layout is as contiguous as possible
hash("foo") -> 1234; nIndex = 1234 | (-table_size) -> -3 (illustrative values)
38. PHP 7 Hash
Memory layout is as contiguous as possible
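hash("foo") -> 1234; nIndex = 1234 | (-table_size) -> -3 (illustrative values)
[Diagram: the hash slots sit at negative offsets (-1, -2, -3, -4) just before arData; each slot holds the idx of a bucket, or X when empty]
[Each bucket in arData stores hash, key and zval; on collision, the zval's next field gives the idx of the next bucket]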
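The slot computation on the slide, nIndex = hash | (-table_size), can be tried in userland. A minimal sketch, assuming a power-of-two table size; slotFor() is a hypothetical name, and PHP's signed integers stand in for the 32-bit values the engine works with.

```php
<?php
// Sketch of the slot computation shown on the slide:
// nIndex = hash | (-table_size), with table_size a power of two.
function slotFor(int $hash, int $tableSize): int
{
    return $hash | -$tableSize;
}

echo slotFor(1234, 8), "\n"; // -6
echo slotFor(1234, 4), "\n"; // -2
```

For a non-negative hash the result always falls between -table_size and -1, which is why the hash slots can live at negative offsets in the same memory block as the buckets.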
42. String management
In PHP 5, strings don't have their own structure
String management is hard
This leads to many string duplications
And thus to many memory accesses
In PHP 7, strings share the zend_string structure
They are refcounted, thus shareable
hashes are precomputed, often at compile time
struct hack is used to compact memory
43. Strings in PHP
[Diagram: string storage, PHP 5 vs PHP 7]
PHP 5: the zval itself holds char *str and int len, next to refcount, is_ref and gc_infos
PHP 7: the zval holds a zend_string *
zend_string: gc_infos, hash, size_t len, char str[1] (struct hack)
51. Other VM operation comparisons
@ usage (error suppression)
[Figure: profile comparison, PHP 7 vs PHP 5]
52. Encapsed string optimisation
Encapsed strings are double-quoted strings that get
parsed
They need to be analyzed for variables
PHP 5 used to reallocate the string at each step
$a = "foo and $b and $c";
3 0 E > ADD_STRING ~0 'foo+and+'
1 ADD_VAR ~0 ~0, !1
2 ADD_STRING ~0 ~0, '+and+'
3 ADD_VAR ~0 ~0, !2
4 ASSIGN !0, ~0
4 5 > RETURN 1
53. Encapsed string in PHP 5
$a = "foo and $b and $c";
3 0 E > ADD_STRING ~0 'foo+and+'
1 ADD_VAR ~0 ~0, !1
2 ADD_STRING ~0 ~0, '+and+'
3 ADD_VAR ~0 ~0, !2
4 ASSIGN !0, ~0
4 5 > RETURN 1
With $b = 'b'; and $c = 'c'; the buffer is reallocated at each step:
foo and
foo and b
foo and b and
foo and b and c
A lot of pressure on the allocator
It needs to find a new chunk at every new allocation
It browses through a free-chunk linked list
Bad for performance
54. Encapsed string optimisation in PHP 7
PHP 7 uses a "rope", and only reallocates memory
once, at the end
https://en.wikipedia.org/wiki/Rope_(data_structure)
$a = "foo and $b and $c";
L3 #0 ROPE_INIT "foo and " ~1
L3 #1 ROPE_ADD ~1 $b ~1
L3 #2 ROPE_ADD ~1 " and " ~1
L3 #3 ROPE_END ~1 $c ~0
L3 #4 ASSIGN $a ~0
L3 #5 RETURN 1
55. Encapsed strings in PHP 7
$a = "foo and $b and $c";
L3 #0 ROPE_INIT "foo and " ~1
L3 #1 ROPE_ADD ~1 $b ~1
L3 #2 ROPE_ADD ~1 " and " ~1
L3 #3 ROPE_END ~1 $c ~0
L3 #4 ASSIGN $a ~0
L3 #5 RETURN 1
foo and
foo and b
foo and b and
foo and b and c
foo and b and c
INIT
ADD
ADD
ADD
END
Keep every piece of string as its own buffer
Stack them
At the end, merge them in one operation
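The same idea can be mimicked in userland, as an analogy only: the engine does this in C on its own buffers, not with PHP arrays. The ropeInit(), ropeAdd() and ropeEnd() names below are hypothetical and simply mirror the opcode names.

```php
<?php
// Userland analogy of the rope opcodes: collect the pieces, merge once.
function ropeInit(string $piece): array
{
    return [$piece];
}

function ropeAdd(array $rope, string $piece): array
{
    $rope[] = $piece;            // stack the piece, no string is rebuilt
    return $rope;
}

function ropeEnd(array $rope, string $piece): string
{
    $rope[] = $piece;
    return implode('', $rope);   // single merge at the very end
}

$b = 'b';
$c = 'c';

$rope = ropeInit('foo and ');
$rope = ropeAdd($rope, $b);
$rope = ropeAdd($rope, ' and ');
$a = ropeEnd($rope, $c);

echo $a, "\n"; // foo and b and c
```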
56. So?
So you'd better use this, encapsed strings:
$a = "foo and $b and $c";
Than this, concatenations:
$a = 'foo and ' . $b . ' and ' . $c;
... in PHP 7 (in PHP 5, both perform the same)
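To check this on your own setup, a rough micro-benchmark can be sketched as follows. timeIt() is a hypothetical helper; the closure call overhead is part of what gets measured, and with only a few pieces the gap may be small, so treat the output as a relative indication only.

```php
<?php
// Hypothetical helper: seconds spent running $fn $iterations times.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$b = 'b';
$c = 'c';

$encapsed = function () use ($b, $c) {
    return "foo and $b and $c";
};
$concat = function () use ($b, $c) {
    return 'foo and ' . $b . ' and ' . $c;
};

printf("encapsed: %.4fs\n", timeIt($encapsed, 1000000));
printf("concat:   %.4fs\n", timeIt($concat, 1000000));
```

Run it several times and under each PHP version you care about before drawing conclusions.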