Netty @Apple: Large Scale Deployment/Connectivity

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1SIYyxQ.

Norman Maurer presents how Apple uses Netty for its Java-based services and the challenges of doing so, including how they enhanced performance by participating in the Netty open source community. Maurer takes a deep dive into advanced topics such as JNI and JVM internals. Filmed at qconsf.com.

Norman Maurer is one of the core developers of Netty, a member of the Apache Software Foundation and a contributor to many Open Source Projects. He's a Senior Software Engineer for Apple, where he works on Netty and other projects.

Published in: Technology

  1. Netty @ Apple: Massive Scale Deployment / Connectivity. This is not a contribution.
  2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/apple-netty
  3. Presented at QCon San Francisco, www.qconsf.com. Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: practitioner-driven conference designed for YOU, influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide.
  4. Norman Maurer: Senior Software Engineer @ Apple; Core Developer of Netty; formerly worked @ Red Hat as Netty Project Lead (internal Red Hat); author of Netty in Action (published by Manning); Apache Software Foundation; Eclipse Foundation.
  5. Massive Scale
  6. What does "Massive Scale" mean? Instances of Netty-based services in production: 400,000+. Data per day: tens of petabytes. Requests per second: tens of millions. Versions: 3.x (migrating to 4.x), 4.x.
  7. Part of the OSS Community. Contributing back to the community: 250+ commits from Apple engineers in 1 year.
  8. Services. Using an Apple service? Chances are good Netty is involved somehow.
  9. Areas of importance: Native Transport; TCP / UDP / Domain Sockets; PooledByteBufAllocator; OpenSslEngine; ChannelPool; built-in codecs + custom codecs for different protocols.
  10. With Scale comes Pain
  11. JDK NIO … some pains
  12. Some of the pains: Selector.selectedKeys() produces too much garbage; the NIO implementation uses synchronized everywhere; not optimized for the typical deployment environment (it supports the common denominator of all environments); internal copying of heap buffers to direct buffers.
  13. JNI to the rescue: optimized transport for Linux only; supports Linux-specific features; operates directly on pointers for buffers; synchronization optimized for Netty's thread model. [Diagram: Java, JNI, C/C++]
  14. Native Transport: epoll-based high-performance transport; less GC pressure due to fewer objects; advanced features: SO_REUSEPORT, TCP_CORK, TCP_NOTSENT_LOWAT, TCP_FASTOPEN, TCP_INFO, LT and ET (level-triggered and edge-triggered), Unix Domain Sockets.
      NIO Transport:
          Bootstrap bootstrap = new Bootstrap().group(new NioEventLoopGroup());
          bootstrap.channel(NioSocketChannel.class);
      Native Transport:
          Bootstrap bootstrap = new Bootstrap().group(new EpollEventLoopGroup());
          bootstrap.channel(EpollSocketChannel.class);
  15. Buffers
  16. JDK ByteBuffer: direct buffers are freed by the GC (not run frequently enough; may trigger GC); hard to use because there are no separate indices.
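The "no separate indices" point on slide 16 can be illustrated with plain JDK code. This is a minimal sketch, not taken from the talk: the class name `SingleIndexDemo` and the method `roundTrip` are hypothetical. A `ByteBuffer` has a single `position` shared by reads and writes, so forgetting `flip()` between writing and reading silently returns the wrong bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Netty or of the slides.
final class SingleIndexDemo {
    // Writes text into a 64-byte heap buffer, then reads back whatever remains.
    static String roundTrip(String text, boolean flipBeforeRead) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(text.getBytes(StandardCharsets.US_ASCII));
        if (flipBeforeRead) {
            // Switch from writing to reading: limit = position, position = 0.
            buf.flip();
        }
        // Without flip(), remaining() covers the unwritten tail of the buffer.
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("netty", true));
        System.out.println(roundTrip("netty", false).length());
    }
}
```

Netty's own `ByteBuf` avoids this by keeping a separate `readerIndex` and `writerIndex`, so no flip step is needed.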
  17. Buffers: direct buffers == expensive; heap buffers == cheap (but not free*); fragmentation. *byte[] needs to be zeroed out by the JVM!
  18. Buffers - Memory fragmentation: wastes memory; may trigger GC due to a lack of coalesced free memory. [Figure: fragmented memory, annotated "Can't insert int here as we need 4 contiguous slots"]
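The annotation on slide 18 can be made concrete with a toy model. This is a sketch under stated assumptions: memory is modelled as a `boolean[]` of one-byte slots, and the class `FragmentationDemo` and its methods are hypothetical. It shows that plenty of free slots in total does not help if no run of four contiguous free slots exists.

```java
// Hypothetical toy model of a fragmented region; true means "slot in use".
final class FragmentationDemo {
    // Counts all free slots, regardless of where they are.
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // A value of the given size fits only if a contiguous run is large enough.
    static boolean canFit(boolean[] used, int size) {
        return largestFreeRun(used) >= size;
    }

    public static void main(String[] args) {
        boolean[] used = {false, true, false, false, true, false, false, false, true, false};
        System.out.println("free slots: " + totalFree(used));
        System.out.println("4-byte int fits: " + canFit(used, 4));
    }
}
```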
  19. Allocation times. [Chart: allocation time in nanoseconds (0 to 6000) versus buffer size in bytes (0 to 65536) for Unpooled Heap, Pooled Heap, Unpooled Direct and Pooled Direct buffers]
  20. PooledByteBufAllocator: based on the jemalloc paper (3.x); ThreadLocal caches for lock-free allocation in most cases (#808); synchronizes per Arena that holds the different chunks of memory; different size classes; reduces fragmentation. [Diagram: Thread 1 and Thread 2, each with its own ThreadLocal cache, in front of Arena 1, Arena 2 and Arena 3, each with its own size classes]
  21. ThreadLocal caches: able to enable / disable ThreadLocal caches; fine-tuning of caches can make a big difference; best effect if the number of allocating threads is low; uses ThreadLocal + MPSC queue (#3833). [Chart: contention count (0 to 4000), Cache versus No Cache]
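The combination of size classes and per-thread caches described on slides 20 and 21 can be sketched in plain Java. This is a deliberately simplified toy, not Netty's `PooledByteBufAllocator`: there are no arenas or chunks, the class `ToyBufferCache` and its constants are hypothetical, and a buffer is assumed to be released on the thread that allocated it.

```java
import java.util.ArrayDeque;

// Hypothetical toy cache; it only illustrates size classes plus thread-local free lists.
final class ToyBufferCache {
    // Assumed size classes: powers of two from 16 B (1 << 4) up to 16 KiB (1 << 14).
    private static final int MIN_SHIFT = 4;
    private static final int NUM_CLASSES = 11;
    private static final int MAX_SIZE = 1 << (MIN_SHIFT + NUM_CLASSES - 1);

    // One array of free lists per thread, so the fast path takes no lock.
    private static final ThreadLocal<ArrayDeque<byte[]>[]> CACHE =
            ThreadLocal.withInitial(ToyBufferCache::newFreeLists);

    @SuppressWarnings("unchecked")
    private static ArrayDeque<byte[]>[] newFreeLists() {
        ArrayDeque<byte[]>[] lists = new ArrayDeque[NUM_CLASSES];
        for (int i = 0; i < NUM_CLASSES; i++) {
            lists[i] = new ArrayDeque<>();
        }
        return lists;
    }

    // Index of the smallest size class that can hold the requested size.
    static int sizeClass(int size) {
        if (size <= 0 || size > MAX_SIZE) {
            throw new IllegalArgumentException("unsupported size: " + size);
        }
        int normalized = Math.max(size, 1 << MIN_SHIFT);
        // ceil(log2(normalized)) - MIN_SHIFT
        return (32 - Integer.numberOfLeadingZeros(normalized - 1)) - MIN_SHIFT;
    }

    // Reuses a cached buffer of the right class if one exists, else allocates.
    static byte[] allocate(int size) {
        int cls = sizeClass(size);
        byte[] cached = CACHE.get()[cls].pollFirst();
        return cached != null ? cached : new byte[1 << (cls + MIN_SHIFT)];
    }

    // Returns a buffer to the calling thread's cache; contents are not cleared.
    static void release(byte[] buf) {
        int cls = sizeClass(buf.length);
        if (buf.length != 1 << (cls + MIN_SHIFT)) {
            throw new IllegalArgumentException("not allocated by this cache");
        }
        CACHE.get()[cls].addFirst(buf);
    }

    public static void main(String[] args) {
        byte[] a = allocate(100);   // rounded up to the 128-byte class
        release(a);
        byte[] b = allocate(120);   // same class, served from the thread-local cache
        System.out.println(a == b);
    }
}
```

Rounding every request up to a size class is what keeps freed buffers interchangeable, and a recycled `byte[]` also skips the zeroing cost noted on slide 17, at the price of handing out stale contents.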
  22. JDK SSL Performance … it's slow!
  23. Why handle SSL directly? Secure communication between services; used for HTTP2 / SPDY negotiation; advanced verification of certificates. Unfortunately JDK's SSLEngine implementation is very slow :(
  24. JDK SSLEngine implementation: HTTPS Benchmark
      Benchmark:
          ./wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua https://xxx:8080/plaintext
      Response:
          HTTP/1.1 200 OK
          Content-Length: 15
          Content-Type: text/plain; charset=UTF-8
          Server: Netty.io
          Date: Wed, 17 Apr 2013 12:00:00 GMT
          Hello, World!
      Result:
          Running 2m test @ https://xxx:8080/plaintext
            16 threads and 256 connections
            Thread Stats   Avg        Stdev     Max      +/- Stdev
              Latency      553.70ms   81.74ms   1.43s    80.22%
              Req/Sec      7.41k      595.69    8.90k    63.93%
            14026376 requests in 2.00m, 1.89GB read
            Socket errors: connect 0, read 0, write 0, timeout 114
          Requests/sec: 116883.21
          Transfer/sec: 16.16MB
  25. JDK SSLEngine implementation: HTTPS Benchmark. Unable to fully utilize all cores; the SSLEngine API is limiting in some cases: SSLEngine.unwrap(…) can only take one ByteBuffer as src.
  26. JNI based SSLEngine … to the rescue. [Diagram: Java, JNI, C/C++]
  27. JNI based SSLEngine … one to rule them all: supports OpenSSL, LibreSSL and BoringSSL; based on Apache Tomcat Native; was part of Finagle but contributed to Netty in 2014.
  28. OpenSSL SSLEngine implementation: HTTPS Benchmark
      Benchmark:
          ./wrk -H 'Host: localhost' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Connection: keep-alive' -d 120 -c 256 -t 16 -s scripts/pipeline-many.lua https://xxx:8080/plaintext
      Response:
          HTTP/1.1 200 OK
          Content-Length: 15
          Content-Type: text/plain; charset=UTF-8
          Server: Netty.io
          Date: Wed, 17 Apr 2013 12:00:00 GMT
          Hello, World!
      Result:
          Running 2m test @ https://xxx:8080/plaintext
            16 threads and 256 connections
            Thread Stats   Avg        Stdev     Max        +/- Stdev
              Latency      131.16ms   28.24ms   857.07ms   96.89%
              Req/Sec      31.74k     3.14k     35.75k     84.41%
            60127756 requests in 2.00m, 8.12GB read
            Socket errors: connect 0, read 0, write 0, timeout 52
          Requests/sec: 501120.56
          Transfer/sec: 69.30MB
  29. OpenSSL SSLEngine implementation: HTTPS Benchmark. All cores utilized! Makes use of native code provided by OpenSSL; low object creation; drop-in replacement*. *Supported on Linux, OSX and Windows.
  30. Optimizations made: added client support (#7, #11, #3270, #3277, #3279); added support for Auth (#10, #3276); GC pressure caused by heavy object creation (#8, #3280, #3648); too many JNI calls (#3289); proper SSLSession implementation (#9, #16, #17, #20, #3283, #3286, #3288); ALPN support (#3481); only do priming read if there is no space in dsts buffers (#3958).
  31. Thread Model: easier to reason about; less worry about concurrency; easier to maintain; clear execution order. [Diagram: one Thread runs one Event Loop, which handles the I/O of several Channels]
  32. Thread Model: Proxy. [Diagram: one Thread runs one Event Loop, which handles the I/O of both Channels]
          public class ProxyHandler extends ChannelInboundHandlerAdapter {
              @Override
              public void channelActive(ChannelHandlerContext ctx) {
                  final Channel inboundChannel = ctx.channel();
                  Bootstrap b = new Bootstrap();
                  // Reuse the inbound channel's event loop so both channels run on the same thread.
                  b.group(inboundChannel.eventLoop());
                  // Stop reading inbound data until the outbound connection is established.
                  ctx.channel().config().setAutoRead(false);
                  ChannelFuture f = b.connect(remoteHost, remotePort);
                  f.addListener((ChannelFutureListener) future -> {
                      if (future.isSuccess()) {
                          ctx.channel().config().setAutoRead(true);
                      } else { ... }
                  });
              }
          }
  33. Backpressure: slow peers due to a slow connection; risk of writing too fast; back off writing and reading. [Diagram: Peer1 and Peer2, each an Application on top of TCP SND / RCV buffers, connected by the Network; a fast application facing a slow network or a slow peer application risks an OOME]
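One common way to "back off writing" as slide 33 suggests is to track the number of queued but unsent bytes against a high and a low watermark. The sketch below is a plain-Java illustration loosely modelled on that idea, not Netty code: the class `WatermarkTracker` and its methods are hypothetical, and it assumes all calls come from a single thread, in line with the event-loop thread model.

```java
// Hypothetical single-threaded tracker of pending outbound bytes.
final class WatermarkTracker {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean writable = true;

    WatermarkTracker(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || lowWaterMark > highWaterMark) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Called when the application queues data; returns the new writability.
    boolean onQueued(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highWaterMark) {
            writable = false; // the producer should stop writing (and reading more input)
        }
        return writable;
    }

    // Called when data has actually been written to the socket.
    boolean onFlushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < lowWaterMark) {
            writable = true; // enough has drained; the producer may resume
        }
        return writable;
    }

    boolean isWritable() {
        return writable;
    }

    public static void main(String[] args) {
        WatermarkTracker tracker = new WatermarkTracker(32, 64);
        tracker.onQueued(70);
        System.out.println("writable after queueing 70 bytes: " + tracker.isWritable());
        tracker.onFlushed(50);
        System.out.println("writable after flushing 50 bytes: " + tracker.isWritable());
    }
}
```

The gap between the two watermarks prevents the flag from flapping on every write. In Netty the analogous signal is `Channel.isWritable()`, and a proxy can respond to it by toggling `setAutoRead(...)` on the peer channel.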
  34. Memory Usage: handling a lot of concurrent connections; need to save memory to reduce heap sizes; use Atomic*FieldUpdater; lazily initialize fields.
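Both techniques from slide 34 can be shown with the JDK alone. This is a minimal sketch with a hypothetical `Connection` class: one static `AtomicIntegerFieldUpdater` operates on a plain `volatile int` field, so each instance avoids the extra `AtomicInteger` object, and the attribute list is only created on first use. The lazy initialization here is not thread-safe and assumes the object is only touched from one event-loop thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical per-connection state object; names are illustrative only.
final class Connection {
    // Shared by all instances: one updater instead of one AtomicInteger per connection.
    private static final AtomicIntegerFieldUpdater<Connection> STATE =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "state");

    // Must be a volatile int for the field updater; 0 = open, 1 = closed.
    private volatile int state;

    // Lazily created: most connections are assumed never to need attributes.
    private List<String> attributes;

    // Returns true only for the caller that actually performed the close.
    boolean close() {
        return STATE.compareAndSet(this, 0, 1);
    }

    boolean isClosed() {
        return state == 1;
    }

    void addAttribute(String value) {
        if (attributes == null) {
            attributes = new ArrayList<>(2);
        }
        attributes.add(value);
    }

    int attributeCount() {
        return attributes == null ? 0 : attributes.size();
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.close());  // first close wins
        System.out.println(c.close());  // already closed
    }
}
```

With hundreds of thousands of connections, saving one small object and one reference per instance adds up, which is the motivation the slide gives for reducing heap sizes.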
  35. Connection Pooling: having an extensible connection pool is important; flexible / extensible implementation (#3607).
  36. We are hiring! http://www.apple.com/jobs/us/ Thanks
  37. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/apple-netty