(slightly) smarter
SMART PTRS
www.italiancpp.org
C++ Day 2017
2 December, Modena
Carlo Pescio
C++ DAY 2017 – AN EVENT BY THE ITALIAN C++ COMMUNITY
Carlo Pescio http://carlopescio.com http://eptacom.net
Anyone not using smart ptrs?
Smart Pointers are cool
- (good) smart pointers enable explicit lifetime design
- Still, smart pointers have some run-time overhead
- This talk focuses on that overhead, not on lifetime design
- Specifically, shared pointers
Just a proof of concept
- Code lacks support for polymorphic conversion etc.
- I’ve kept this thing in a closet way too long :-)
- Probably the “invention” process is more interesting
than the thing itself.
Shared pointers
• Shared ownership of an object
• Last pointer to go destroys the pointed object
• Usually keeps a count of incoming references to obj
• Count must be shared as well
Shared pointers in practice
• 3 basic implementations
• (+ an uncommon one)
• C++11 implementation
Invasive
Slim, maybe slow
Fat, maybe fast
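The diagrams for these three slides did not survive extraction. As a rough sketch of what the three names usually refer to, assuming the common taxonomy of reference-counted pointers (every type name below is hypothetical, and none of this is the talk's code):

```cpp
#include <cstddef>

// Invasive: the count lives inside the pointed-to object itself.
struct InvasiveBase { std::size_t refs = 0; };
template <class T>                       // T is assumed to derive from InvasiveBase
struct InvasivePtr { T* obj; };

// Slim: one pointer wide, but reaching the object costs a second hop
// through a separately allocated block.
template <class T>
struct SlimBlock { T* obj; std::size_t refs; };
template <class T>
struct SlimPtr { SlimBlock<T>* block; };

// Fat: two pointers wide; the object is one hop away and the count is
// reached through the second pointer.
template <class T>
struct FatPtr { T* obj; std::size_t* refs; };

static_assert(sizeof(InvasivePtr<InvasiveBase>) == sizeof(void*), "one word");
static_assert(sizeof(SlimPtr<int>) == sizeof(void*), "one word");
static_assert(sizeof(FatPtr<int>) == 2 * sizeof(void*), "two words");
```

The size checks are the point of the sketch: "slim" trades an extra indirection for a smaller pointer, "fat" trades memory for a direct path to the object.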
Standard
[Diagram: the shared_ptr holds T* and cb*. T* points to the T object; cb* points to a control block holding RC, WC, FULL OBJ *, deleter, ….]
Standard + make_shared
[Diagram: the shared_ptr holds T* and cb*; the T object and the control block (RC, WC, FULL OBJ *, [deleter], ….) are joined by a "+".]
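One way to see the difference between the two layouts is to count heap allocations. The sketch below assumes a mainstream standard library, where constructing a shared_ptr from a raw pointer allocates the control block separately and make_shared uses a single allocation; the standard only recommends the latter. The counter g_allocs, the global g_keep, and both helper functions are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

static std::size_t g_allocs = 0;   // hypothetical counter, not thread-safe

// Replace the global allocation functions so every allocation is counted.
void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Keeping the result in a global stops the compiler from eliding the allocation.
static std::shared_ptr<int> g_keep;

// Allocations needed for "new T" followed by the shared_ptr constructor.
std::size_t allocs_plain() {
    const std::size_t before = g_allocs;
    g_keep = std::shared_ptr<int>(new int(1));
    assert(g_keep);
    return g_allocs - before;
}

// Allocations needed when object and control block are created together.
std::size_t allocs_make_shared() {
    const std::size_t before = g_allocs;
    g_keep = std::make_shared<int>(1);
    assert(g_keep);
    return g_allocs - before;
}
```

Calling allocs_plain() and allocs_make_shared() and comparing the two numbers shows the saving that the "+" in the diagram stands for.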
<Interlude>
Rant is coming
Why I don’t like shared_ptr
OPINION <<< BEWARE : )
“C was designed on the assumption that the programmer
is someone sensible who knows what he's doing”
(Kernighan and Ritchie)
“What you don't use, you don't pay for” (Stroustrup)
shared_ptr: the dark side
You pay the overhead of weak even if you don’t use it
Trying to protect you from absence of virtual destructor
You lose a standalone weak ptr
make_shared is a virus (factory -> unique -> penalty)
Safe-by-design weak usage
NOT VALID C++
(just an example)
[Code figure lost in extraction; surviving labels: unique, unique, weak]
</Interlude>
Sorry for ranting
A little benchmark
- Not much science: just a few reasonable tests. They should
be expanded, tested with more compilers, etc.
- 3 tests: allocation, dereferencing, “complex” usage.
- Tested: raw pointers, std::shared_ptr, std::shared_ptr +
make_shared, and a slim and a fat version I implemented.
Baseline (raw pointers)
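struct Data
{
public:
    Data(int v) : n(v)
    {
        ++liveInstances;
    }
    ~Data()
    {
        --liveInstances;
    }
    // … (elided on the slide; Value() is used by Lt below)
private:
    static int liveInstances;   // number of Data objects currently alive
    int n;
};

// Orders raw pointers by the value they point to.
struct Lt
{
    bool operator()(const Data* d1, const Data* d2) const
    {
        return d1->Value() < d2->Value();
    }
};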
Test 1
vector<Data*> v0;
v0.reserve(N);
for (int j = 0; j < 10; ++j)
{
    for (int i = 0; i < N; ++i)
        v0.push_back(new Data(i));
    for (int i = 0; i < N; ++i)
        delete v0[i];
    v0.clear();
}

N = 500000
Just a lot of construction / destruction, no usage
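The slides do not show how the times were taken. A minimal sketch of one way to do it, assuming std::chrono::steady_clock and a stripped-down Data; the helpers time_ms and run_test1_raw, and the body of Value(), are hypothetical additions:

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Stand-in for the Data type of the baseline slide (instance counting omitted).
struct Data {
    explicit Data(int v) : n(v) {}
    int Value() const { return n; }
private:
    int n;
};

// Milliseconds taken by one call of f, measured on a monotonic clock.
template <class F>
long long time_ms(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}

// Test 1 for raw pointers, parameterised on N and on the repeat count.
long long run_test1_raw(int N, int repeats) {
    assert(N >= 0 && repeats >= 0);
    return time_ms([=] {
        std::vector<Data*> v0;
        v0.reserve(N);
        for (int j = 0; j < repeats; ++j) {
            for (int i = 0; i < N; ++i) v0.push_back(new Data(i));
            for (int i = 0; i < N; ++i) delete v0[i];
            v0.clear();
        }
    });
}
```

run_test1_raw(500000, 10) corresponds to the parameters on the slide. A single run is noisy, so repeating the measurement and keeping the minimum or the median is a common refinement.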
Test 2
for (int j = 0; j < 1000; ++j)
{
    for (int i = 0; i < N; ++i)
        n += v1[i]->Value();
}

v1 is prefilled with N random values.
n is used afterwards, so the loop is not optimized away.
Lots of usage, no creation / destruction
Test 3 (playing around)
reverse(v1.begin(), v1.end());
random_shuffle(v1.begin(), v1.end());   // removed in C++17; std::shuffle replaces it
sort(v1.begin(), v1.end(), Lt());
int m =
    (*(max_element(v1.begin(), v1.end(), Lt())))->Value();
sort(v2.begin(), v2.end(), Lt());
auto bi = back_inserter(v3);            // v3 reserved at N, not 2N
merge(v1.begin(), v1.end(),
      v2.begin(), v2.end(), bi, Lt());
v4.assign(v3.begin(), v3.end());        // v4 reserved at 2N
Notes:
- Care is needed with delete: v1 and v2 need a delete cycle;
v3 and v4 do not, because they only keep copies of the raw ptrs.
- When using smart ptrs:

struct Lt
{
    // intentionally without a & to trigger more copies!
    bool operator()(const shared_ptr<Data> r1, const shared_ptr<Data> r2) const
    {
        return r1->Value() < r2->Value();
    }
};
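The effect of dropping the & can be observed directly through use_count(). A small sketch with std::shared_ptr (the function names are hypothetical):

```cpp
#include <cassert>
#include <memory>

// Taking the pointer by value copies it, so the count is bumped for the call.
long count_seen_by_value(const std::shared_ptr<int> p) { return p.use_count(); }

// Taking it by const reference copies nothing.
long count_seen_by_ref(const std::shared_ptr<int>& p) { return p.use_count(); }

// Small self-check showing both behaviours on one owner.
void demo_parameter_copies() {
    auto p = std::make_shared<int>(7);
    assert(count_seen_by_value(p) == 2);   // caller's copy + parameter
    assert(count_seen_by_ref(p) == 1);     // caller's copy only
    assert(p.use_count() == 1);            // the temporary copy is gone again
}
```

Every comparison made by sort or merge with the by-value Lt therefore pays for two copies and two destructions, which is exactly the overhead the benchmark wants to expose.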
Results
             test 1   test 2   test 3   max RAM (VS)
               (ms)     (ms)     (ms)           (MB)
raw             362      881      238            109
fat             682     1614      643            217
slim            793     2481      859            200
std             835     1099     1584            253
make_shared     505     1795     1459            181
Can we make it faster?
- The weird one
- Some physics of software (run-time space)
- Use the physics, Luke
“Reference Linking” 2001
Look, no counter…
Constructor won’t throw
    as it does not allocate extra storage
No need for make_shared
    no need to use shared ptrs everywhere
Fat (200% overhead)
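The slide's figure is lost. A minimal sketch of reference linking, assuming the classic doubly linked form in which each smart pointer carries the object pointer plus prev and next, which would account for the 200% overhead. RefLinkedPtr and Probe are hypothetical names, and assignment is left out to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical doubly linked variant: three words per pointer, no heap-allocated count.
template <class T>
class RefLinkedPtr {
public:
    explicit RefLinkedPtr(T* t = nullptr) noexcept : p(t), prev(this), next(this) {}
    RefLinkedPtr(const RefLinkedPtr& rhs) noexcept : p(rhs.p) {
        // Splice this node into the ring right after rhs.
        prev = &rhs;
        next = rhs.next;
        next->prev = this;
        rhs.next = this;
    }
    RefLinkedPtr& operator=(const RefLinkedPtr&) = delete;   // kept out of the sketch
    ~RefLinkedPtr() {
        if (next == this) { delete p; return; }   // last owner destroys the object
        prev->next = next;                        // O(1) unlink, no walk needed
        next->prev = prev;
    }
    T* get() const noexcept { return p; }
    bool unique() const noexcept { return next == this; }
private:
    T* p;
    mutable const RefLinkedPtr* prev;
    mutable const RefLinkedPtr* next;
};

static_assert(sizeof(RefLinkedPtr<int>) == 3 * sizeof(void*), "200% overhead");

// Instance-counting helper used to observe destruction.
struct Probe {
    static int live;
    Probe() { ++live; }
    ~Probe() { --live; }
};
int Probe::live = 0;
```

Nothing is allocated besides the object itself, so construction cannot throw, at the price of two extra words in every pointer.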
2005
Incoming count ranks
Quick check
Max counter was 3 in the previous benchmark
(tested in my FAT implementation)
So?
Two steps:
1) Optimize for the small
2) Turn into FAT when count > K, K small (3-4)
Must keep the same memory footprint between 1 -> 2
1) Optimize for the small
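[Diagram: one object and three smart pointers, each holding ptr and next; every ptr refers to the object, and the next fields chain the pointers together.]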
Trivial implementation
Singly linked list
Finding the predecessor -> while loop (short path anyway)
Alternatives / experiments needed
watch code size / inlining
The public stuff (some)
template< class T > class LinkedSmallPtr
{
public:
    LinkedSmallPtr()
    {
        p = nullptr;
        next = this;    // a ring of one
    }
    LinkedSmallPtr(LinkedSmallPtr&& rhs)
    {
        MoveFrom(rhs);
    }
    explicit LinkedSmallPtr(T* t) noexcept
    {
        p = t;
        next = this;    // sole owner: the ring contains only this pointer
    }
    LinkedSmallPtr(const LinkedSmallPtr& rhs)
    {
        Copy(rhs);
    }
    ~LinkedSmallPtr()
    {
        RemoveLink();
    }
The public stuff (some)
    LinkedSmallPtr& operator =(const LinkedSmallPtr& rhs)
    {
        if (&rhs != this)
        {
            RemoveLink();
            Copy(rhs);
        }
        return *this;
    }
    LinkedSmallPtr& operator=(LinkedSmallPtr&& rhs)
    {
        if (&rhs != this)
        {
            RemoveLink();
            MoveFrom(rhs);
        }
        return *this;
    }
The private stuff
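private:
    T* p;
    mutable LinkedSmallPtr* next;   // mutable: copying from a const rhs relinks rhs

    // Inserts this pointer into the ring right after rhs.
    void Copy(const LinkedSmallPtr& rhs)
    {
        p = rhs.p;
        next = rhs.next;
        rhs.next = this;
    }

[Diagram: the object and two linked pointers (ptr, next).]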
The private stuff
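    void RemoveLink()
    {
        if (next == this)
        {
            delete p;               // last owner: destroy the object
        }
        else
        {
            // Walk the ring to find the predecessor, then unlink this node.
            LinkedSmallPtr* h = next;
            while (h->next != this)
                h = h->next;
            h->next = next;
        }
    }

[Diagram: the object and three linked pointers (ptr, next).]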
The private stuff
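    void MoveFrom(LinkedSmallPtr& rhs)
    {
        p = rhs.p;
        if (rhs.next == &rhs)
        {
            next = this;
        }
        else
        {
            // Take the place of rhs in the ring.
            next = rhs.next;
            LinkedSmallPtr* h = next;
            while (h->next != &rhs)
                h = h->next;
            h->next = this;
        }
        rhs.p = nullptr;
        rhs.next = &rhs;    // rhs has left the ring; without this, its destructor
                            // would search forever for a predecessor
    }

[Diagram: the object and a single pointer (ptr, next).]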
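Put together, the fragments from the last few slides form one compilable unit. The sketch below assumes three additions that the slides do not show: operator-> and operator*, a reset of rhs.next at the end of MoveFrom so that a moved-from pointer is a ring of one again, and the instance-counting Tracked helper with its demo function.

```cpp
#include <cassert>
#include <utility>

template <class T>
class LinkedSmallPtr {
public:
    LinkedSmallPtr() noexcept : p(nullptr), next(this) {}
    explicit LinkedSmallPtr(T* t) noexcept : p(t), next(this) {}
    LinkedSmallPtr(const LinkedSmallPtr& rhs) noexcept { Copy(rhs); }
    LinkedSmallPtr(LinkedSmallPtr&& rhs) noexcept { MoveFrom(rhs); }
    ~LinkedSmallPtr() { RemoveLink(); }

    LinkedSmallPtr& operator=(const LinkedSmallPtr& rhs) {
        if (&rhs != this) { RemoveLink(); Copy(rhs); }
        return *this;
    }
    LinkedSmallPtr& operator=(LinkedSmallPtr&& rhs) {
        if (&rhs != this) { RemoveLink(); MoveFrom(rhs); }
        return *this;
    }

    // Accessors added for the example; the slides do not show them.
    T* operator->() const noexcept { return p; }
    T& operator*() const noexcept { return *p; }

private:
    T* p;
    mutable LinkedSmallPtr* next;

    void Copy(const LinkedSmallPtr& rhs) {
        p = rhs.p;
        next = rhs.next;
        rhs.next = this;
    }
    void RemoveLink() {
        if (next == this) { delete p; return; }       // last owner
        LinkedSmallPtr* h = next;
        while (h->next != this) h = h->next;          // find the predecessor
        h->next = next;
    }
    void MoveFrom(LinkedSmallPtr& rhs) {
        p = rhs.p;
        if (rhs.next == &rhs) {
            next = this;
        } else {
            next = rhs.next;
            LinkedSmallPtr* h = next;
            while (h->next != &rhs) h = h->next;
            h->next = this;
        }
        rhs.p = nullptr;
        rhs.next = &rhs;                              // assumed fix: detach rhs
    }
};

// Hypothetical helper that counts live instances.
struct Tracked {
    static int live;
    int v;
    explicit Tracked(int x) : v(x) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Copies and moves one owner around, then reports how many objects leaked.
int demo_linked_small_ptr() {
    {
        LinkedSmallPtr<Tracked> a(new Tracked(42));
        LinkedSmallPtr<Tracked> b(a);               // ring: a -> b -> a
        LinkedSmallPtr<Tracked> c(std::move(b));    // c takes b's place in the ring
        assert(Tracked::live == 1);
        assert(c->v == 42);
    }   // c, b, a destroyed in that order; the last owner deletes the object
    return Tracked::live;
}
```

demo_linked_small_ptr() returns the number of Tracked objects still alive once all three pointers are gone; zero means the single object was destroyed exactly once.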
Benchmark
              test 1   test 2   test 3   max RAM (VS)
                (ms)     (ms)     (ms)           (MB)
raw              362      881      238            109
fat              682     1614      643            217
slim             793     2481      859            200
std              835     1099     1584            253
make_shared      505     1795     1459            181
small linked     339     1089      633            146
2) Highly-Adaptive Scavenger
It’s “linked” when count is < K
Becomes ref counted after K
While keeping its storage / identity
Uses bits you’re normally wasting
Wasted bits
and where to find them
…
Excessive address space (e.g. PIC32 w/ a few MB of memory)
Alignment (pointers to int, long, float, double, …)
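The alignment case can be illustrated with a one-bit tag. The sketch assumes that converting an aligned pointer to std::uintptr_t yields an integer whose bit 0 is clear; that is implementation-defined, though it holds on mainstream platforms. The helper names pack, flag_of and pointer_of are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs a one-bit flag into bit 0 of the address of a suitably aligned object.
template <class T>
std::uintptr_t pack(T* ptr, bool flag) {
    static_assert(alignof(T) >= 2, "bit 0 must be free");
    const auto bits = reinterpret_cast<std::uintptr_t>(ptr);
    assert((bits & 1u) == 0);                 // the wasted bit is really unused
    return bits | static_cast<std::uintptr_t>(flag);
}

// Reads the flag back.
inline bool flag_of(std::uintptr_t word) { return (word & 1u) != 0; }

// Recovers the original pointer by clearing the flag bit.
template <class T>
T* pointer_of(std::uintptr_t word) {
    return reinterpret_cast<T*>(word & ~static_cast<std::uintptr_t>(1));
}
```

With an int x, pack(&x, true) yields a word for which flag_of is true and pointer_of<int> gives back &x, so the tag travels with the pointer at no extra storage cost.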
Alignment (useless :-)
struct alignas(2) alignas(sizeof(void*)) RefCount
{
    int count;
};

template< class T >
class alignas(2) alignas(sizeof(void*)) LinkedSmallPtr
{
    // …
Basic strategy
[Diagram: two pointers to the object, each holding p and q. With bit 0 of q low, q is a link; with bit 0 high, q refers to the count.]
Ugly, step 1
private:
    T* p;
    mutable uintptr_t next;

can’t play with bits on pointers…
How do you “count”?
On copy: hard / probably inefficient / more bits (?)
On destruction: easy / statistically ok
Better ideas welcome :-)
Ugly, step 2
void RemoveLink() {
    if ((next & 1) == 0) {                  // bit 0 clear: next is a link
        LinkedSmallPtr* lNext = reinterpret_cast<LinkedSmallPtr*>(next);
        if (lNext == this) {
            delete p;
        }
        else {
            int count = 0;                  // nodes walked while looking for the predecessor
            LinkedSmallPtr* h = lNext;
            LinkedSmallPtr* hn;
            while ((hn = reinterpret_cast<LinkedSmallPtr*>(h->next)) != this) {
                h = hn;
                ++count;
            }
            h->next = next;
            if (count > THRESHOLD)
                assert(false);              // placeholder on the slide
        }
    }
    else
        assert(false);                      // placeholder on the slide
}

FIX OTHERS, not ME :-)
“Promotion cost”
Once you get over threshold, you need to remap ALL the
nodes from linked to counted.
Careful not to oscillate around this point!
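The remapping step can be sketched as a single walk over the ring. This is only an illustration under stated assumptions, not the talk's implementation: Node stands in for the smart pointer, bit 0 of next set means "address of a shared RefCount", and the helpers as_link, is_counted, count_of and promote are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct alignas(sizeof(void*)) RefCount { int count; };

// Hypothetical stand-in for the smart pointer: bit 0 of next clear means
// "address of the next node", set means "address of a shared RefCount".
struct alignas(sizeof(void*)) Node {
    void* p;
    std::uintptr_t next;
};

inline std::uintptr_t as_link(Node* n) { return reinterpret_cast<std::uintptr_t>(n); }
inline bool is_counted(const Node& n) { return (n.next & 1u) != 0; }
inline RefCount* count_of(const Node& n) {
    return reinterpret_cast<RefCount*>(n.next & ~static_cast<std::uintptr_t>(1));
}

// Rewrites every node of the ring containing start from linked to counted.
// The cost is linear in the number of owners, hence "promotion cost".
RefCount* promote(Node* start) {
    assert(!is_counted(*start));
    RefCount* rc = new RefCount{0};
    const std::uintptr_t tagged = reinterpret_cast<std::uintptr_t>(rc) | 1u;
    Node* n = start;
    do {
        Node* following = reinterpret_cast<Node*>(n->next);   // read before overwriting
        n->next = tagged;
        ++rc->count;
        n = following;
    } while (n != start);
    return rc;
}
```

After promote, no node remembers its neighbours any more, so falling back to the linked form would need information that is gone.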
If you grow big, you stay big
- Once you get promoted from linked to counted, you stay counted even if the count goes down.
You become your rhs
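(L = linked, C = counted; CC / CA = copy construction / assignment, MC / MA = move construction / assignment, ~ = destruction)

CC:  RHS       L  C
     result    L  C

CA:  RHS       L  C
     LHS = L   L  C
     LHS = C   L  C

MC:  RHS       L  C
     result    L  C

MA:  RHS       L  C
     LHS = L   L  C
     LHS = C   L  C

~:   before    L  C
     after     L  C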
Benchmark
                  test 1   test 2   test 3
                    (ms)     (ms)     (ms)
raw                  362      881      238
small linked         339     1089      633
adaptive linked      345     1132      639
What’s the catch?
Code is significantly more complex
simplification welcome
Degenerate behavior potentially costly
reference a lot, then release all
Conclusions
The “big” idea is building something that responds well to
the most common real-world scenarios, while adapting to
the “long tail” of unusual cases.
Extends well beyond this case
lots of experiments still needed
Maturity scale
• Everything is an array / a list
• Choose the right data structure / algo
• fake: map vs roll your own
• real: convenient structures for small / large cases
• Adaptive data structures with stable ifc <<<
• Self-tuning data structures with stable ifc
Frequent / Rare
Keep exploring :-) Brain-Driven Design
carlo.pescio@gmail.com
