Windows IOCP vs Linux EPOLL Performance Comparison

Updated 2013-06-18: added Receive Side Scaling (RSS) test

I/O event notification model performance test
Windows IOCP and Linux EPOLL

  1. IOCP vs EPOLL Performance Comparison
     Seungmo Koo, @sm9kr, kr.linkedin.com/in/sm9kr
  2. Test Configuration
     • Dummy clients send random data packets to the test server over a GbE link
     • The test server relays each packet back to the client (echo)
     • Client-side measurement: server throughput (send/receive Mbps)
     • Server-side measurement: CPU usage (overall % and per-core %)
  3. Test Environment - Server
     • Intel i7-3770K, 16GB RAM, Realtek PCIe Gigabit Ethernet
     • CPU frequency scaling disabled
     • Performance test program
       – Simple packet relay (echo) server using Boost.Asio 1.53
         • Boost.Asio uses IOCP on Windows and EPOLL on Linux
       – I/O threads: 8
       – Client sessions: 10000
       – Buffer size per session: read 4096, write 4096
     • Performance check program
       – Linux: htop and sar
       – Windows: perfmon
     • Operating system
       – Linux: Ubuntu Linux Server 13.04 64-bit, kernel 3.8.0-23, plus max socket tuning
       – Windows: Windows Server 2012 64-bit
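The relay (echo) loop described on this slide can be sketched in a few lines. This is not the author's Boost.Asio program: it is a single-threaded Python illustration, and the function name `pump_echo` and the `"listener"`/`"session"` tags are hypothetical. Python's `selectors.DefaultSelector` picks epoll on Linux, so it exercises the same readiness-notification model; the blocking `sendall` is a simplification that a real server would replace with buffered non-blocking writes.

```python
import selectors
import socket

BUFFER_SIZE = 4096  # per-session read buffer size quoted on the slide


def pump_echo(selector, timeout=1.0):
    """Run one readiness pass and return the number of bytes echoed."""
    echoed = 0
    for key, _events in selector.select(timeout):
        sock = key.fileobj
        if key.data == "listener":
            # A new client is waiting: accept it and watch it for reads.
            conn, _addr = sock.accept()
            selector.register(conn, selectors.EVENT_READ, data="session")
            continue
        chunk = sock.recv(BUFFER_SIZE)
        if chunk:
            sock.sendall(chunk)  # relay the data straight back (echo)
            echoed += len(chunk)
        else:
            # Empty read means the peer closed the session.
            selector.unregister(sock)
            sock.close()
    return echoed
```

A server would register its listening socket with `data="listener"` and call `pump_echo` in a loop; the test program on the slide spreads that work over 8 I/O threads, which this sketch does not attempt.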
  4. Test Environment - Client
     • Mac mini Server (Late 2012)
       – Intel i7 quad-core, 16GB RAM, Gigabit Ethernet
     • Dummy client program
       – Simple packet generator using Boost.Asio 1.53
       – Number of clients (sessions): 10000
       – I/O threads: 8
       – Buffer size per session: read 4096, write 4096
  5. Performance Test
     • Two cases
       – NAGLE: Nagle's algorithm ON
       – NODELAY: Nagle's algorithm OFF
     • Dummy client program
       – Measures server throughput
       – Sends random data to the server and receives it back from the server for 600 seconds
     • Test server
       – Measures server CPU usage for 600 seconds
     • Each test was measured 3 times
       – The median result is used
     • The three runs of every test gave practically the same results.
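The two test cases and the median-of-three rule can be illustrated as follows. This is a sketch, not the test program itself: the helper names are hypothetical, and it only shows that the NAGLE/NODELAY switch corresponds to the standard `TCP_NODELAY` socket option.

```python
import socket
import statistics


def set_nagle(sock, enabled):
    """NAGLE case: enabled=True. NODELAY case: enabled=False."""
    # TCP_NODELAY=1 turns Nagle's algorithm off; 0 leaves it on.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def nodelay_is_set(sock):
    """Report whether Nagle's algorithm is currently disabled on sock."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def reported_result(three_runs):
    """Pick the median of exactly three measurements, as the slide describes."""
    if len(three_runs) != 3:
        raise ValueError("expected exactly three measurements")
    return statistics.median(three_runs)
```

Taking the median rather than the mean keeps a single outlier run from shifting the reported figure.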
  6. Performance Evaluation
     • No session drop
       – Both EPOLL and IOCP kept 10000 sessions alive during a test
     • Normalized throughput
       – Throughput was pretty much the same for both
     [Chart: normalized throughput (0 to 100) for NODELAY and NAGLE; series: EPOLL, IOCP]
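The slides do not say how throughput was normalized. One common convention, assumed here purely for illustration, is to scale every measurement so that the best one equals 100; the function name and the input layout are hypothetical.

```python
def normalize_throughput(mbps_by_case, scale=100.0):
    """Scale raw Mbps figures so that the largest one maps to `scale`.

    Assumption: normalization is relative to the best measurement.
    The original deck does not state which baseline it used.
    """
    peak = max(mbps_by_case.values())
    if peak <= 0:
        raise ValueError("throughput must be positive")
    return {case: mbps / peak * scale for case, mbps in mbps_by_case.items()}
```

With this convention, two configurations that land within a few points of each other on the 0 to 100 axis differ by the same few percent in raw Mbps.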
  7. Performance Evaluation
     • CPU utilization
       – Average usage across 8 cores
       – Mostly kernel time, with a small amount of user time
       – IOCP used less CPU than EPOLL
     [Chart: average CPU usage (0% to 14%) for NODELAY and NAGLE; series: EPOLL, IOCP]
  8. Performance Evaluation
     • Average CPU utilization per core (NODELAY mode)
       – Results were similar for NAGLE and NODELAY
       – EPOLL compared with IOCP
         • One CPU core consistently shows high utilization
         • The other cores stay close to the average utilization
     [Chart: average CPU usage per core (%), 0 to 70, for EPOLL and IOCP; series: CORE 0 through CORE 7. Annotations: "Don't care. It is a HyperThreading effect."; "NIC receive processing on only one core. See 'RSS queue'."]
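The per-core pattern on this slide (one busy core, the rest near the average) is easy to flag from per-core samples such as those `sar` or perfmon report. The sketch below is an illustration only: the function name and the "more than twice the average" threshold are assumptions, not something the slides define.

```python
from statistics import mean


def hot_cores(per_core_pct, factor=2.0):
    """Return (average usage, indices of cores above factor * average).

    per_core_pct: one CPU-usage percentage per core.
    factor: hypothetical threshold; 2.0 means "more than twice the average".
    """
    avg = mean(per_core_pct)
    hot = [core for core, pct in enumerate(per_core_pct) if pct > factor * avg]
    return avg, hot
```

A single index in the returned list, stable across runs, matches the situation the slide attributes to NIC receive processing landing on one core.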
  9. Update: New Experiment with RSS option
     • Average CPU utilization per core (NAGLE mode)
       – Using RSS queue (a.k.a. NIC multi-queue)
       – Server HW: Mac mini Server 2012 (Broadcom BCM57766 NIC)
       – Server OS: Windows Server 2012 and Ubuntu Server 13.04
       – Performance
         • Throughput: EPOLL's is approximately equal to IOCP's
         • Average CPU usage: virtually the same (EPOLL 7.38%, IOCP 6.8%)
     [Chart: average CPU usage per core (%), 0 to 20, with RSS (NIC multi-queue), for EPOLL and IOCP; series: CORE 0 through CORE 7]
  10. Summary
     • Throughput
       – There was little difference between IOCP and EPOLL
     • CPU usage
       – Without RSS (multi-queue)
         • IOCP was more efficient than EPOLL in CPU utilization
         • EPOLL had consistently higher CPU utilization than IOCP
       – With RSS
         • IOCP and EPOLL are about the same in CPU usage
     • When building a high-performance server for Linux, use a NIC that supports RSS (multi-queue)
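On Linux, one way to see whether NIC receive work is spread over several cores is to look at the per-CPU counters in `/proc/interrupts`. The parser below is a minimal sketch under stated assumptions: the function name is hypothetical, it matches rows by the interface name appearing in the last column, and interrupt naming varies by driver, so rows for shared or differently named interrupts may be missed.

```python
def nic_irq_rows(interrupts_text, ifname):
    """Map each interrupt name containing `ifname` to its per-CPU counts.

    interrupts_text: contents of /proc/interrupts (header row lists CPUs).
    ifname: interface name to look for, e.g. "eth0" (example value only).
    """
    lines = interrupts_text.splitlines()
    n_cpus = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        # Need the IRQ label, one count per CPU, and at least a name.
        if len(fields) <= n_cpus + 1 or ifname not in fields[-1]:
            continue
        rows[fields[-1]] = [int(count) for count in fields[1:1 + n_cpus]]
    return rows
```

Several matching rows with counts growing on different CPUs suggest multi-queue receive is active; a single row whose counts grow on one CPU corresponds to the one-hot-core result measured without RSS.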
  11. Reference: RSS Queue
     • Linux: NIC multi-queue support
     • Windows: NIC Receive Side Scaling
       http://msdn.microsoft.com/en-us/library/windows/hardware/ff556942(v=vs.85).aspx