Encrypting data at rest is a must-have for any modern SaaS company. And if you run your software stack on Linux, LUKS/dm-crypt [1] is the usual go-to solution. However, as the storage becomes faster, the IO latency, introduced by dm-crypt becomes rather noticeable, especially on IO intensive workloads.
At first glance it may seem natural, because data encryption is considered an expensive operation. But most modern hardware (specifically x86 and arm64) platforms have hardware optimisations to make encryption fast and less CPU intensive. Nevertheless, even on such hardware transparent disk encryption performs quite poorly.
13. @ignatkn
Storage hardware encryption
Pros:
â itâs there
â little conïŹguration needed
â fully transparent to applications
â usually faster than other layers
Cons:
â no visibility into the implementation
â no auditability
â sometimes poor security
https://support.microsoft.com/en-us/help/4516071/windows-10-update-kb4516071
49. @ignatkn
We tried...
â switching to diïŹerent cryptographic algorithms
â aes-xts seems to be the fastest (at least on x86)
â experimenting with dm-crypt optional ïŹags
â âsame_cpu_cryptâ and âsubmit_from_crypt_cpusâ
50. @ignatkn
We tried...
â switching to diïŹerent cryptographic algorithms
â aes-xts seems to be the fastest (at least on x86)
â experimenting with dm-crypt optional ïŹags
â âsame_cpu_cryptâ and âsubmit_from_crypt_cpusâ
â trying ïŹlesystem-level encryption
â much slower and potentially less secure
52. @ignatkn
Ask the community
âIf the numbers disturb you, then this is from lack of
understanding on your side. You are probably unaware
that encryption is a heavy-weight operation...â
https://www.spinics.net/lists/dm-crypt/msg07516.html
53. @ignatkn
But actually...
âUsing TLS is very cheap, even at the scale of CloudïŹare.
Modern crypto is very fast, with AES-GCM and P256
being great examples.â
https://blog.cloudflare.com/how-expensive-is-crypto-anyway/
61. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
62. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
63. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
64. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
65. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
66. @ignatkn
queues vs latency
âA signiïŹcant amount of tail latency is
due to queueing eïŹectsâ
https://www.usenix.org/conference/srecon19asia/presentation/plenz
67. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
68. @ignatkn
Crypto APIdm-crypt
block device drivers
filesystem
cryptd
kcryptd_io
kcryptd
dmcrypt_write
write
read
write_tree
dm-crypt: life of an encrypted BIO request
69. @ignatkn
dm-crypt: git archeology
â kcryptd was there from the beginning (2005)
â only for reads: âit would be very unwise to do decryption in an
interrupt contextâ
70. @ignatkn
dm-crypt: git archeology
â kcryptd was there from the beginning (2005)
â only for reads: âit would be very unwise to do decryption in an
interrupt contextâ
â some queuing was added to reduce kernel stack
usage (2006)
71. @ignatkn
dm-crypt: git archeology
â kcryptd was there from the beginning (2005)
â only for reads: âit would be very unwise to do decryption in an
interrupt contextâ
â some queuing was added to reduce kernel stack
usage (2006)
â oïŹoad writes to thread and IO sorting (2015)
â for spinning disks, but âmay improve SSDsâ
â mentions CFQ scheduler, which is deprecated
72. @ignatkn
dm-crypt: git archeology
â kcryptd was there from the beginning (2005)
â only for reads: âit would be very unwise to do decryption in an
interrupt contextâ
â some queuing was added to reduce kernel stack
usage (2006)
â oïŹoad writes to thread and IO sorting (2015)
â for spinning disks, but âmay improve SSDsâ
â mentions CFQ scheduler, which is deprecated
â commits to optionally revert some queuing
â âsame_cpu_cryptâ and âsubmit_from_crypt_cpusâ option ïŹags
73. @ignatkn
dm-crypt: things to reconsider
â most code was added with spinning disks in mind
â disk IO latency >> scheduling latency
74. @ignatkn
dm-crypt: things to reconsider
â most code was added with spinning disks in mind
â disk IO latency >> scheduling latency
â sorting BIOs in dm-crypt probably violates âdo one
thing and do it wellâ Unix principle
â the task for the IO scheduler
75. @ignatkn
dm-crypt: things to reconsider
â most code was added with spinning disks in mind
â disk IO latency >> scheduling latency
â sorting BIOs in dm-crypt probably violates âdo one
thing and do it wellâ Unix principle
â the task for the IO scheduler
â kcryptd may be redundant as modern Linux Crypto
API is asynchronous by itself
â remove oïŹoading the oïŹoad
81. @ignatkn
dm-crypt: removing queues
â dm-crypt module: a simple patch, which bypasses all
queues/async threads based on a new runtime ïŹag
â Linux Crypto API is a bit more complicated
â by default speciïŹc implementation is selected dynamically based
on priority
â aes-ni synchronous implementation is marked as âinternalâ
â aes-ni (FPU) is not usable in some interrupt contexts
82. @ignatkn
dm-crypt: removing queues
â dm-crypt module: a simple patch, which bypasses all
queues/async threads based on a new runtime ïŹag
â Linux Crypto API is a bit more complicated
â by default speciïŹc implementation is selected dynamically based
on priority
â aes-ni synchronous implementation is marked as âinternalâ
â aes-ni (FPU) is not usable in some interrupt contexts
â xtsproxy: a dedicated synchronous aes-xts module
101. @ignatkn
Conclusions
â a simple patch which may improve dm-crypt
performance by 200%-300%
â fully compatible with stock Linux dm-crypt
â can be enabled/disabled in runtime without service disruption
102. @ignatkn
Conclusions
â a simple patch which may improve dm-crypt
performance by 200%-300%
â fully compatible with stock Linux dm-crypt
â can be enabled/disabled in runtime without service disruption
â modern crypto is fast and cheap
â performance degradation is likely elsewhere
103. @ignatkn
Conclusions
â a simple patch which may improve dm-crypt
performance by 200%-300%
â fully compatible with stock Linux dm-crypt
â can be enabled/disabled in runtime without service disruption
â modern crypto is fast and cheap
â performance degradation is likely elsewhere
â extra queuing may be harmful on modern low
latency storage
104. @ignatkn
Caveats and future work
â the patch improves performance on small block
size/high IOPS workloads
â >2MB block size shows worse performance
105. @ignatkn
Caveats and future work
â the patch improves performance on small block
size/high IOPS workloads
â >2MB block size shows worse performance
â the whole setup assumes hardware-accelerated
crypto
â xtsproxy supports x86 only
106. @ignatkn
Caveats and future work
â the patch improves performance on small block
size/high IOPS workloads
â >2MB block size shows worse performance
â the whole setup assumes hardware-accelerated
crypto
â xtsproxy supports x86 only
â your mileage may vary
â always measure and compare before deployment
â let us know the results