Kqueue : Generic Event Notification




Mahendra M
Mahendra_M@infosys.com
http://www.infosys.com


This work is licensed under a Creative Commons License
http://creativecommons.org/licenses/by-sa/2.5/
Agenda
   Traditional ways of multiplexing I/O
   Methods and issues in handling asynchronous events.
   Enter Kqueue
   The Kqueue architecture.
   Kqueue possibilities.
Traditional File/Socket handling
   Traditionally, a single file can be handled as below
    /* No error checking here */
    while ( ( n = read( fd, buf, sizeof( buf ) ) ) > 0 ) {
         do_something( buf, n );
    }
   The above works fine for one file descriptor
   What about the case where we have two or more such
    descriptors ( e.g. sockets ) and data can appear on any one
    of them at any point in time ?
    –   Basically, we need a mechanism for event driven applications.
    –   This is a case for multiplexing I/O ( or events ) !!
Traditional I/O multiplexing
   Use select() and/or poll()
   select() and poll() pass a list of file descriptors to the kernel
    and wait for updates to happen. On return, the list
    indicates which file descriptors were updated.
   select() takes the descriptors as a bitmap, with each bit set
    or unset to represent a file descriptor; poll() takes an array
    of struct pollfd.
   select() and poll() can watch for read/write/exception events
    on the list of file descriptors.
   On return, the application has to scan the entire bitmap ( or
    array ) to see which file descriptors have to be handled.
Traditional I/O multiplexing ( contd.. )

fd_set fds;
FD_ZERO( &fds );
FD_SET( 5, &fds );
/* First argument is the highest descriptor plus one */
n = select( 5 + 1, &fds, NULL, NULL, NULL );
j = 0;
for ( i = 0; (i < MAX) && (j < n); i++ ) {
    if ( FD_ISSET( i, &fds ) ) {
       read_something_from_socket( i );
       j++;
    }
}
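The same wait can be written with poll(), which takes an array of struct pollfd instead of a bitmap. What follows is only a minimal sketch: the helper name first_readable_with_poll, the MAX_WATCHED limit and the demo function are assumptions made for illustration, and pipes stand in for sockets.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WATCHED 16   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: returns the first descriptor in fds[0..count) that
 * becomes readable within timeout_ms, or -1 on error or timeout. */
int first_readable_with_poll(const int *fds, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHED];
    int i, n;

    assert(fds != NULL);
    if (count <= 0 || count > MAX_WATCHED)
        return -1;

    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* interested in readable data */
        pfds[i].revents = 0;
    }

    n = poll(pfds, (nfds_t)count, timeout_ms);
    if (n <= 0)
        return -1;                 /* error or timeout */

    /* As with select(), the caller scans the whole array. */
    for (i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN)
            return pfds[i].fd;
    }
    return -1;
}

/* Demo: two pipes stand in for sockets; only the second one has data.
 * Returns 1 if the helper picked the pipe with data, 0 otherwise. */
int poll_demo(void)
{
    int a[2], b[2], fds[2], ready = -1;

    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]); close(a[1]);
        return -1;
    }
    fds[0] = a[0];
    fds[1] = b[0];
    if (write(b[1], "x", 1) == 1)
        ready = first_readable_with_poll(fds, 2, 1000);
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return ready == fds[1];
}
```

Note that the final loop still visits every entry, which is the cost the next slide discusses.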
Issues with select()/poll()
   Problems of scalability
    –   The entire descriptor set has to be passed on each invocation of
        the system call ( especially with poll(), which uses an array )
    –   Massive copies from user space to kernel space and vice versa
    –   Not all descriptors have activity all the time
    –   On return, apps have to scan the entire list to check for
        updated descriptors ( duplicated effort in kernel and app ):
        O(N) activity
    –   Results in inefficient memory usage within the kernel
    –   If the process has to sleep, the kernel parses the list three times.
   select()/poll() can handle only file descriptors
   Coding was clunky for select()
    –   Descriptor set is a bitmap of fixed size ( FD_SETSIZE, commonly
        256 or 1024 depending on the system )
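The per-call cost can be seen in a small experiment. The sketch below (function name and pipe setup are assumptions for illustration) registers two descriptors with select() and checks that, on return, the idle one has been cleared from the set. Because the kernel overwrites the set, the application has to rebuild it before every call and scan it after every call.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select() reported only the busy descriptor and cleared the
 * idle one from the set, 0 if not, -1 if the setup failed. */
int select_overwrites_set_demo(void)
{
    int idle[2], busy[2], maxfd, n, result = -1;
    fd_set readfds;
    struct timeval no_wait = { 0, 0 };   /* return immediately */

    if (pipe(idle) == -1)
        return -1;
    if (pipe(busy) == -1) {
        close(idle[0]); close(idle[1]);
        return -1;
    }
    /* fd_set is a fixed-size bitmap: larger descriptors do not fit */
    assert(idle[0] < FD_SETSIZE && busy[0] < FD_SETSIZE);

    if (write(busy[1], "x", 1) == 1) {
        FD_ZERO(&readfds);               /* rebuilt before every call */
        FD_SET(idle[0], &readfds);
        FD_SET(busy[0], &readfds);
        maxfd = idle[0] > busy[0] ? idle[0] : busy[0];

        n = select(maxfd + 1, &readfds, NULL, NULL, &no_wait);

        /* The set now holds results, not the original interest list. */
        result = (n == 1)
              && FD_ISSET(busy[0], &readfds)
              && !FD_ISSET(idle[0], &readfds);
    }
    close(idle[0]); close(idle[1]);
    close(busy[0]); close(busy[1]);
    return result;
}
```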
Other forms of interesting events
   Asynchronous signal notifications
    –   Required in libraries that may want to be notified of signals
   Asynchronous timer expiry
   Asynchronous Read/Write ( aio_read(), aio_write() )
   VFS changes
   Process state Changes
   Thread state changes
   Device driver notifications
   Anything else – that will require some asynchronous event
    notification – and the design allowing it.
Available solutions
   Linux 2.4 : SIGIO
   Sun Solaris : /dev/poll
   Linux 2.4 : /dev/epoll
    –   Use ioctl() to manipulate the above.
   Even Microsoft Windows had something to offer.
   Kqueue – for BSD boxes.
    –   We shall be talking about that now !!
Kqueue - Goals
   A generic event notification framework
    –   File descriptors (read/write/exceptions), Signals,
        Asynchronous I/O ( not in OSFR ), Vnodes monitoring,
        process monitoring, Timer events.
   A single system call to handle all this.
   Capability to add new functionality.
   Efficient use of memory
    –   Memory should be allocated as per need.
    –   Should be able to register and receive only the events of
        interest.
    –   Events should be combined ( e.g. data arriving over a socket )
   Should be a good replacement for the standard calls.
   Should be possible to extend this functionality easily
Kqueue APIs
   int kqueue( void );
    –   Creates a kernel queue and returns a file descriptor for it. It can
        be deleted using the close() system call.
   int kevent( kq, changes, nc, events, ne,
    timeout );
    –   To register events in the kernel queue
    –   To receive events that occurred between consecutive calls.
    –   Can simulate select(), poll() by using different values of timeout
    –   No need to store the event descriptors locally in the
        application.
   EV_SET( &event, ident, filter, flags,
    fflags, data, udata )
    –   Used to prepare an event for registering in the kernel queue.
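The three calls fit together as in the sketch below, which registers one descriptor and waits for it in a single kevent() call. The helper names, the KQ_UNSUPPORTED sentinel and the platform check are assumptions made for this example; on systems without kqueue the function simply reports that it is unsupported.

```c
#include <assert.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0
#endif

#define KQ_UNSUPPORTED (-2L)   /* sentinel invented for this sketch */

/* Hypothetical helper: returns the number of bytes the kernel reports as
 * readable on fd, -1 on error or timeout, KQ_UNSUPPORTED without kqueue. */
long readable_bytes_with_kqueue(int fd, int timeout_sec)
{
    assert(fd >= 0);
#if HAVE_KQUEUE
    struct kevent change, event;
    struct timespec ts;
    int kq, n;

    ts.tv_sec = timeout_sec;
    ts.tv_nsec = 0;

    kq = kqueue();
    if (kq == -1)
        return -1;

    /* Prepare the change: watch fd for readable data. */
    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);

    /* One call registers the change and waits for up to one event. */
    n = kevent(kq, &change, 1, &event, 1, &ts);
    close(kq);
    if (n != 1 || (event.flags & EV_ERROR))
        return -1;
    return (long)event.data;   /* for EVFILT_READ: bytes available */
#else
    (void)timeout_sec;
    return KQ_UNSUPPORTED;
#endif
}

/* Demo: a pipe holding five bytes stands in for a socket. */
long kqueue_read_demo(void)
{
    int p[2];
    long r = -1;

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "hello", 5) == 5)
        r = readable_bytes_with_kqueue(p[0], 1);
    close(p[0]);
    close(p[1]);
    return r;
}
```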
Kqueue sample code
kq = kqueue();
struct kevent kev[10];
// Prepare one event: watch fd for readable data
EV_SET( &kev[0], fd, EVFILT_READ, EV_ADD, 0, 0, 0 );
// Register it ( one change, no events requested back )
kevent( kq, kev, 1, NULL, 0, NULL );


// Receive up to 10 events; timeout is a struct timespec * ( NULL blocks )
n = kevent( kq, NULL, 0, kev, 10, timeout );
for ( i = 0; i < n; i++ ) {
    // Do something with kev[i].ident
}
Kqueue filter types
   READ : Returns when data is available to read from
    sockets, vnodes, fifos, pipes
    –   ident = descriptor
    –   data = amount of data to be read
    –   flags = can be EOF etc.
   WRITE : Returns when it is possible to write to a descriptor
    ( ident ).
    –   data = amount of data that can be written
   VNODE : Returns when the file behind a descriptor changes
    –   fflags = delete, write, extend, attrib, link, rename, revoke
Kqueue filter types ( contd... )
   PROC : Monitors a process
    –   ident = pid of the process to be monitored.
    –   fflags = exit, fork, exec, track, trackerr
   SIGNAL : Returns when a signal is delivered to a process.
    –   ident = signal number
    –   data = number of times the signal was delivered.
    –   Co-exists with signal() and sigaction(), and has a lower
        precedence.
    –   Is delivered even if SIG_IGN is set for the signal
   TIMER : Establishes a timer
    –   ident = timer id
    –   data = timeout in milliseconds when registering; number of
        expirations when the event is returned
    –   Periodic by default unless ONESHOT is specified
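As an illustration of a filter that is not tied to a file descriptor, the sketch below registers a periodic TIMER and counts expirations. The function name, the timer id of 1, the KQ_UNSUPPORTED sentinel and the platform check are assumptions for this example only.

```c
#include <assert.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0
#endif

#define KQ_UNSUPPORTED (-2L)   /* sentinel invented for this sketch */

/* Hypothetical helper: blocks until a periodic timer has expired at least
 * `wanted` times. Returns the number of expirations seen, -1 on error,
 * or KQ_UNSUPPORTED on systems without kqueue. */
long count_timer_ticks(int period_ms, long wanted)
{
    assert(period_ms > 0 && wanted > 0);
#if HAVE_KQUEUE
    struct kevent change, event;
    long seen = 0;
    int kq;

    kq = kqueue();
    if (kq == -1)
        return -1;

    /* ident 1 is an arbitrary timer id; data is the period in ms. */
    EV_SET(&change, 1, EVFILT_TIMER, EV_ADD, 0, period_ms, 0);
    if (kevent(kq, &change, 1, NULL, 0, NULL) == -1) {
        close(kq);
        return -1;
    }

    while (seen < wanted) {
        /* NULL timeout: block until the timer fires. */
        if (kevent(kq, NULL, 0, &event, 1, NULL) != 1) {
            close(kq);
            return -1;
        }
        seen += (long)event.data;   /* expirations since last retrieval */
    }
    close(kq);
    return seen;
#else
    return KQ_UNSUPPORTED;
#endif
}
```

Calling count_timer_ticks( 10, 3 ) would block for roughly 30 milliseconds on a system with kqueue.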
Kqueue Flags
   ADD : To add an event to the queue
   ENABLE : To enable a disabled event
   DISABLE : To temporarily disable an event ( not deleted )
   DELETE : Remove an event from the kernel queue
   ONESHOT : Cause the event to happen only once.
   CLEAR : Clear the state of the filter after it is received
   EOF : End-of-file
   ERROR : Specific errors.
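The effect of ONESHOT can be checked with a short experiment. In the sketch below (function name, KQ_UNSUPPORTED sentinel and platform check are assumptions for illustration), a READ event is registered with EV_ADD | EV_ONESHOT on a pipe that already holds data. The first retrieval should return the event; the second should return nothing, because the kernel removes a one-shot event once it has been delivered, even though the data is still unread.

```c
#include <assert.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#define HAVE_KQUEUE 1
#else
#define HAVE_KQUEUE 0
#endif

#define KQ_UNSUPPORTED (-2)   /* sentinel invented for this sketch */

/* Returns 1 if the event was delivered exactly once, 0 if not,
 * -1 if the setup failed, KQ_UNSUPPORTED on systems without kqueue. */
int oneshot_demo(void)
{
#if HAVE_KQUEUE
    struct kevent change, event;
    struct timespec no_wait = { 0, 0 };
    int p[2], kq, first, second, result = -1;

    if (pipe(p) == -1)
        return -1;
    kq = kqueue();
    if (kq != -1 && write(p[1], "x", 1) == 1) {
        EV_SET(&change, p[0], EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, 0);
        if (kevent(kq, &change, 1, NULL, 0, NULL) != -1) {
            /* The byte is never read, so a normal READ event would
             * keep firing; a one-shot event fires only once. */
            first  = kevent(kq, NULL, 0, &event, 1, &no_wait);
            second = kevent(kq, NULL, 0, &event, 1, &no_wait);
            result = (first == 1 && second == 0);
        }
    }
    if (kq != -1)
        close(kq);
    close(p[0]);
    close(p[1]);
    return result;
#else
    return KQ_UNSUPPORTED;
#endif
}
```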
Kqueue – Good things
   As you would have seen, it scales extremely well when
    handling large numbers of file descriptors
     –   Eliminates most of the deficiencies of select()/poll()
     –   Currently, efforts are underway to migrate some popular
         daemons ( Apache ) to use Kqueue.
   It supports a wide range of events – not just file descriptors.
   Is easily extensible.
   New kqueue filters can be added very easily inside the BSD
    kernels.
   Opens up a lot of interesting possibilities.
Issues with Kqueue
   Kqueue calls are not part of the POSIX specifications.
    –   Most Unix systems do not implement them.
    –   Breaks portability across Unices
   Third party code may still use select(), poll() etc. We may
    have to migrate this code or allow the two to co-exist
   Relatively new in the field. Not time-tested.
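One common way to live with the portability problem is to hide the mechanism behind a small wrapper and pick the backend at compile time. The sketch below is one possible shape for such a wrapper, not a recommendation from the talk: the function name wait_readable, the USE_KQUEUE macro and the platform list are assumptions for illustration, with poll() as the fallback.

```c
#include <assert.h>
#include <unistd.h>

#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <time.h>
#define USE_KQUEUE 1
#else
#include <poll.h>
#define USE_KQUEUE 0
#endif

/* Hypothetical wrapper: returns 1 if the backend reports an event on fd
 * within timeout_ms, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && timeout_ms >= 0);
#if USE_KQUEUE
    struct kevent change, event;
    struct timespec ts;
    int kq, n;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;

    kq = kqueue();
    if (kq == -1)
        return -1;
    EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    n = kevent(kq, &change, 1, &event, 1, &ts);
    close(kq);
    if (n == 1 && (event.flags & EV_ERROR))
        return -1;
    return n;                     /* 1, 0 or -1 */
#else
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    return n > 0 ? 1 : n;         /* 1, 0 or -1 */
#endif
}

/* Demo: an empty pipe times out, a pipe with data is reported.
 * Returns 1 if both observations hold, 0 if not, -1 on setup failure. */
int wait_readable_demo(void)
{
    int p[2], before, after = -1;

    if (pipe(p) == -1)
        return -1;
    before = wait_readable(p[0], 0);
    if (write(p[1], "x", 1) == 1)
        after = wait_readable(p[0], 1000);
    close(p[0]);
    close(p[1]);
    return before == 0 && after == 1;
}
```

Creating a kqueue per call keeps the sketch short; a real event loop would create it once and reuse it.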
References
   Kqueue: A generic and scalable event notification facility -
    Jonathan Lemon
        http://people.freebsd.org/~jlemon/papers/kqueue.pdf
   Man pages for kqueue, knote, kfilter_register
   Read the source, Luke !!
Finally ...
   Questions ??
   Thanks to
    –   Organizers for giving me a chance to speak at GNUnify 2006
    –   NetBSD and Linux developers who helped me during my work
    –   To Infosys for sponsoring my visit to GNUnify 2006
   Special thanks to YOU for listening...


                      You can contact me at :
                    Mahendra_M@infosys.com
