Takuya ASADA <syuu@dokukino.com>
                     @syuu1228
   Previously worked at an embedded software company
    on SMP support for router firmware
   Ph.D. student at Tokyo University of Technology,
    researching improvements to the network I/O
    architecture of modern x86 servers
   Interests: SMP, networking, virtualization
   GSoC ’11 (FreeBSD): multithread support for BPF
   GSoC ’12 (FreeBSD): BIOS support for BHyVe
   Research assistant at IIJ research laboratory,
    implementing BCube for Linux (today’s topic!)
   BCube is a new network architecture
   Designed for shipping-container-based
    modular data centers
   Server-centric network structure
    ◦ Servers act as both
      end hosts and
      relay nodes for each other
   Published at ACM SIGCOMM ’09 by
    Microsoft Research Asia
   Each server has one connection to each layer
   Switches never connect to other switches
   Servers relay traffic for each other
   [Figure: BCube topology with n = 2, k = 2: servers 000 to 111, switches in layers 0 to 2, nested BCube0, BCube1 and BCube2 (legend: switch, server)]
   $\mathrm{BCube}_k$ has $k + 1$ layers
   $\mathrm{BCube}_x$ contains $n$ copies of $\mathrm{BCube}_{x-1}$
   $\mathrm{BCube}_0$ contains $n$ servers
   Total servers $= n^{k+1}$
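The sketch below counts the elements of a $\mathrm{BCube}_k$ as a quick check of these formulas. The switch count is derived here, not taken from the slides. Assume n-port switches with every port in use. Each of the $n^{k+1}$ servers contributes one link per layer, so each layer needs $n^{k+1}/n = n^k$ switches. The function name `bcube_size` is hypothetical.

```python
def bcube_size(n: int, k: int) -> dict:
    """Element counts for BCube_k, assuming n-port switches (sketch)."""
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    servers = n ** (k + 1)                     # total servers = n^(k+1)
    links_per_layer = servers                  # one link per server in every layer
    switches_per_layer = links_per_layer // n  # each n-port switch takes n links
    return {
        "layers": k + 1,
        "servers": servers,
        "ports_per_server": k + 1,
        "switches_per_layer": switches_per_layer,  # = n^k
        "switches_total": (k + 1) * switches_per_layer,
    }

# The topology drawn on these slides: n = 2, k = 2
print(bcube_size(2, 2))
```

For the n = 2, k = 2 topology in the figure, this gives:
- 3 layers and 8 servers
- 3 ports per server
- 4 switches per layer, 12 switches in total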
   [Figure: the same BCube topology, n = 2, k = 2, servers 000 to 111]
   High network capacity for various traffic
    patterns
    ◦ one-to-one
    ◦ one-to-all
    ◦ one-to-several
    ◦ all-to-all
   Performance degrades gracefully as the number
    of failed servers/switches increases
   Needs no special hardware; uses only
    commodity switches
   Each server has a unique BCube address
   Each digit is the port number on the server's
    switch in the corresponding layer
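A small sketch of how an address maps onto the wiring, under three assumptions:
- The layer-i switch is identified by the server address with digit i removed.
- Digits are written top layer first.
- Each digit is a single character (n ≤ 10).

`attachments` is a hypothetical helper, not part of bcube.ko.

```python
def attachments(addr: str, n: int) -> list:
    """For a server address a_k...a_0 (leftmost digit = top layer k),
    return (layer, switch_id, port) for every layer.

    Assumption: the layer-i switch id is the address with digit i
    removed, and the server uses port a_i on that switch."""
    digits = [int(c) for c in addr]            # single-character digits (n <= 10)
    if any(d >= n for d in digits):
        raise ValueError("each digit must be < n")
    k = len(digits) - 1
    result = []
    for layer in range(k, -1, -1):
        pos = k - layer                        # string index of digit a_layer
        switch_id = addr[:pos] + addr[pos + 1:]
        result.append((layer, switch_id, digits[pos]))
    return result

for layer, switch, port in attachments("101", n=2):
    print(f"layer {layer}: switch <{layer},{switch}> port {port}")
```

Two servers share a layer-i switch exactly when they differ only in digit i. For example, 101 and 001 both attach to the layer-2 switch <2,01>.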
   [Figure: the same BCube topology, n = 2, k = 2, servers 000 to 111]
   Default routing rule
    ◦ Correct the address digits from the top layer to the bottom layer
    ◦ Ex: route from 000 to 111:
      000 → 100 → 110 → 111
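The default rule can be sketched as digit correction. Walk the digits from the top layer (leftmost) to the bottom layer (rightmost). Whenever a digit differs from the destination, hop through that layer's switch to the neighbor whose address has the corrected digit. This is an illustrative Python model, not the kernel code.

```python
def default_route(src: str, dst: str) -> list:
    """Default BCube route: fix differing digits from the top layer
    (leftmost digit) down to the bottom layer (rightmost digit)."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    path = [src]
    cur = list(src)
    for pos in range(len(src)):          # pos 0 = top layer k
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]          # one hop via this layer's switch
            path.append("".join(cur))
    return path

print(" -> ".join(default_route("000", "111")))   # 000 -> 100 -> 110 -> 111
```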
   [Figure: the same BCube topology, n = 2, k = 2, servers 000 to 111]
   There are alternate routes between any pair of servers
   Traffic can bypass failed servers and switches
   Alternate routes can also be used in parallel
    to increase throughput
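One simple way to obtain alternate routes is to correct the digits in a different order. The sketch below rotates the top-to-bottom order. For 000 → 111 the three rotations give paths whose intermediate servers do not overlap.

This is a simplification. A complete scheme also has to handle digits that already match, for example by detouring through a neighbor; this sketch does not. Disjointness is therefore not guaranteed in general. The function names are hypothetical.

```python
def path_with_order(src: str, dst: str, order: list) -> list:
    """Correct differing digits in the given order of string positions."""
    path, cur = [src], list(src)
    for pos in order:
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]
            path.append("".join(cur))
    return path

def rotated_paths(src: str, dst: str) -> list:
    """One candidate path per rotation of the top-to-bottom correction
    order (simplified: no detours, so paths may share nodes in general)."""
    base = list(range(len(src)))          # [top layer, ..., bottom layer]
    paths = []
    for i in range(len(base)):
        p = path_with_order(src, dst, base[i:] + base[:i])
        if p not in paths:                # drop duplicate paths
            paths.append(p)
    return paths

for p in rotated_paths("000", "111"):
    print(" -> ".join(p))
```

This prints 000 -> 100 -> 110 -> 111, 000 -> 010 -> 011 -> 111 and 000 -> 001 -> 101 -> 111.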
   [Figure: the same BCube topology, n = 2, k = 2, servers 000 to 111]
   The source server decides the best path for a flow
   Paths containing failures are bypassed
   To propagate the routing path, the source server
    writes the path information into the packet
    header
   A BCube header is added between the Ethernet header
    and the IP header
   It carries the source/destination BCube addresses and
    the routing path in a “Next Hop Index Array”

   [Figure: header layout: Ethernet Header | BCube Header | IP Header;
    BCube Header fields: BCube dest address, BCube source address,
    Protocol type, Next Hop Index Array]
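The sketch below illustrates the idea of the Next Hop Index Array. The source encodes its chosen path as one entry per hop, and relays replay it. Each entry is encoded as (layer whose digit changes, new digit value). This encoding is an assumption; it works because every BCube hop changes exactly one digit. The byte-level layout used by bcube.ko is not modeled. `BCubeHeader`, `build_header` and `replay` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BCubeHeader:
    # Field names follow the slide; the entry encoding is an assumption:
    # (layer whose digit changes, new digit value).
    dst: str
    src: str
    proto: int                                    # e.g. 0x0800 for IPv4
    nhi: List[Tuple[int, int]] = field(default_factory=list)

def build_header(path: List[str], proto: int) -> BCubeHeader:
    """Source server: encode a chosen path as a Next Hop Index Array."""
    k = len(path[0]) - 1
    nhi = []
    for a, b in zip(path, path[1:]):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) != 1:
            raise ValueError(f"{a} -> {b} is not a single BCube hop")
        pos = diff[0]
        nhi.append((k - pos, int(b[pos])))        # (layer, new digit)
    return BCubeHeader(dst=path[-1], src=path[0], proto=proto, nhi=nhi)

def replay(hdr: BCubeHeader) -> List[str]:
    """Relay servers: follow the NHI entries hop by hop."""
    cur = list(hdr.src)
    k = len(cur) - 1
    path = [hdr.src]
    for layer, value in hdr.nhi:
        cur[k - layer] = str(value)               # single-character digits
        path.append("".join(cur))
    return path

h = build_header(["000", "100", "110", "111"], proto=0x0800)
print(h.nhi)          # [(2, 1), (1, 1), (0, 1)]
print(replay(h))      # ['000', '100', '110', '111']
```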
   Evaluating various data center network
    technologies, especially for container-based
    modular data center architectures;
    BCube is one of the candidates
   Reuse existing code as much as possible
   Start with a minimal implementation

   A BCube node binds multiple interfaces and is
    assigned a BCube address and an IP address
   Which existing Linux facility is most similar?
    → Bridge!
    ◦ Forked bridge.ko and the brctl command
      as bcube.ko and the bcctl command
   brctl addbr <bridge>
    brctl delbr <bridge>
                        ↓
    bcctl addbc <bcube> <bcaddr> <N> <K>
    bcctl delbc <bcube>
   Modified addbr/delbr into addbc/delbc; addbc takes 3 extra args
    ◦ BCube address
    ◦ n and k parameters
   Use the MAC address format/size for the BCube address
                 101   → 00:00:00:01:00:01
   Use the BCube address as the HW address of the BCube
    device
    ◦ It works like a fake MAC address in the Linux network stack
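A sketch of the address conversion behind `bcctl addbc <bcube> <bcaddr> <N> <K>`. It first validates the address against n and k. It then places one digit per byte, right-aligned, in a 6-byte MAC-format string. This byte layout is an assumption, chosen because it matches the low-order octets 01:00:01 in the 101 example. The helpers are hypothetical and assume n ≤ 10, so each digit is one character.

```python
def bcaddr_to_mac(addr: str, n: int, k: int) -> str:
    """Pack a BCube address (k+1 base-n digits, top layer first) into a
    6-byte MAC-format string, one byte per digit, right-aligned."""
    digits = [int(c) for c in addr]
    if len(digits) != k + 1:
        raise ValueError("address must have k + 1 digits")
    if any(d >= n for d in digits):
        raise ValueError("each digit must be < n")
    if k + 1 > 6:
        raise ValueError("at most 6 digits fit in a MAC-sized address")
    octets = [0] * (6 - len(digits)) + digits
    return ":".join(f"{o:02x}" for o in octets)

def mac_to_bcaddr(mac: str, k: int) -> str:
    """Inverse of bcaddr_to_mac: read the last k + 1 octets as digits."""
    octets = [int(part, 16) for part in mac.split(":")]
    return "".join(str(o) for o in octets[-(k + 1):])

print(bcaddr_to_mac("101", n=2, k=2))   # 00:00:00:01:00:01
```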
   brctl addif <bridge> <device>
    brctl delif <bridge> <device>
                         ↓
    bcctl assignif <bcube> <layer> <device>
    bcctl unassignif <bcube> <layer> <device>
   Modified addif/delif into assignif/unassignif,
    which take a layer number as an extra argument
   Need to reconsider address resolution
   Normal Ethernet
    ◦ IP Address → MAC Address (ARP)
   BCube network
    ◦ IP Address → BCube Address
      → ARP?
    ◦ (Neighbor) BCube address → MAC Address
      → Needs an additional neighbor discovery protocol
   Once broadcast works in the BCube
    implementation, ARP should work on top of it
   Broadcast is not implemented yet, so the mapping is
    configured manually with the following command:
    arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
   Need an ARP-like protocol for BCube address → MAC address
   This is also configured manually for now,
    with the following new commands:
    bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
    bcctl delneighbour <bcube> <layer> <bcaddr>
   bcube.ko maintains a neighbor table and uses it
    when transmitting/forwarding packets
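The neighbor table can be pictured as a map from (layer, neighbor BCube address) to the neighbor's real MAC address. `bcctl addneighbour` fills it and `bcctl delneighbour` removes entries. This Python model is only a sketch: the slides do not show bcube.ko's actual data structure, and the MAC address below is made up.

```python
class NeighborTable:
    """Sketch of a neighbor table: (layer, BCube address) -> MAC.
    Method names loosely mirror bcctl addneighbour/delneighbour;
    the internals are assumptions."""

    def __init__(self):
        self._entries = {}

    def add(self, layer: int, bcaddr: str, mac: str) -> None:
        self._entries[(layer, bcaddr)] = mac.lower()

    def delete(self, layer: int, bcaddr: str) -> None:
        self._entries.pop((layer, bcaddr), None)   # ignore missing entries

    def lookup(self, layer: int, bcaddr: str):
        """Return the neighbor's MAC, or None if no entry exists."""
        return self._entries.get((layer, bcaddr))

tbl = NeighborTable()
tbl.add(2, "100", "52:54:00:12:34:56")   # hypothetical MAC
print(tbl.lookup(2, "100"))             # 52:54:00:12:34:56
tbl.delete(2, "100")
print(tbl.lookup(2, "100"))             # None
```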
   bridge.ko maintains an FDB (forwarding
    database), a hash table that maps destination MAC
    address → output port
   bcube.ko drops the FDB and instead implements a function
    that decides the next-hop BCube address, the output port,
    and the MAC address of the next hop
   Source routing is not implemented yet; only
    default routing for now
    ◦ Top layer → bottom layer
    ◦ Ex: route from 000 to 111:
      000 → 100 → 110 → 111
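Combining default routing with the neighbor table gives a per-node next-hop decision like the one described above. The node fixes the highest-layer differing digit and uses the interface assigned to that layer as the output port. It then looks up the next hop's MAC address. This is a hypothetical Python model of that logic. Treating the layer number as the output-port index is an assumption about how `assignif` interfaces are numbered.

```python
from typing import Dict, Optional, Tuple

def next_hop(self_addr: str, dst: str,
             neighbors: Dict[Tuple[int, str], str]) -> Optional[Tuple[str, int, str]]:
    """Default routing at one node: fix the highest-layer differing digit.

    Returns (next-hop BCube address, layer used as output port, next-hop MAC),
    or None if self_addr == dst. Raises KeyError when no neighbor entry
    exists, since there is no automatic address resolution yet."""
    k = len(self_addr) - 1
    for pos in range(len(self_addr)):       # pos 0 = top layer k
        if self_addr[pos] != dst[pos]:
            nh = self_addr[:pos] + dst[pos] + self_addr[pos + 1:]
            layer = k - pos                 # interface assigned to this layer
            return nh, layer, neighbors[(layer, nh)]
    return None                             # packet has arrived: deliver locally

# Hypothetical neighbor entry, as if configured with bcctl addneighbour
neighbors = {(2, "100"): "52:54:00:00:01:00"}
print(next_hop("000", "111", neighbors))    # ('100', 2, '52:54:00:00:01:00')
```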

   [Figure: the same BCube topology, n = 2, k = 2, servers 000 to 111]
   To insert the BCube header between the Ethernet header
    and the IP header, I forked net/ethernet/eth.c
   ETH_HLEN (14 bytes)
    → BCUBE_HLEN (24 bytes)
   struct ethhdr (MAC header)
    → struct bcubehdr (MAC & BCube header)
   eth_header_ops → bc_header_ops
    ◦ to handle the BCube header
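The header-length change mostly shifts offsets: the L3 header starts at byte 24 instead of byte 14. The sketch splits a frame accordingly. BCUBE_HLEN includes the 14-byte Ethernet header, which leaves 10 bytes for BCube-specific fields. Their internal layout is not given here, so they are kept opaque. The EtherType 0x88B5 in the example is an IEEE local-experimental value chosen for illustration, not necessarily what bcube.ko uses. `split_frame` is a hypothetical name.

```python
import struct

ETH_HLEN = 14     # Ethernet header: dst MAC (6) + src MAC (6) + EtherType (2)
BCUBE_HLEN = 24   # from the slides: Ethernet + BCube header together

def split_frame(frame: bytes):
    """Split a frame into (Ethernet header, BCube-specific bytes,
    L3 payload, EtherType). The 10 BCube bytes are left opaque."""
    if len(frame) < BCUBE_HLEN:
        raise ValueError("frame shorter than BCUBE_HLEN")
    eth = frame[:ETH_HLEN]
    bcube = frame[ETH_HLEN:BCUBE_HLEN]
    payload = frame[BCUBE_HLEN:]          # IP header now starts at offset 24, not 14
    (ethertype,) = struct.unpack("!H", eth[12:14])
    return eth, bcube, payload, ethertype

# Hypothetical frame: zeroed MACs, experimental EtherType, opaque BCube bytes
frame = bytes(12) + struct.pack("!H", 0x88B5) + bytes(10) + b"\x45\x00"
eth, bcube, payload, ethertype = split_frame(frame)
print(len(eth), len(bcube), hex(ethertype), payload[:1].hex())   # 14 10 0x88b5 45
```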
   Unfortunately, GRO accesses the Ethernet header
    directly and runs before BCube handles a
    packet, so it needs to be disabled
   Found a way to implement a new L2 framework
    on top of the existing bridge implementation
    ◦ Much easier than implementing it from scratch
   Development status
    ◦ Basic features implemented; debugging now
    ◦ Considering additional features:
      broadcast / multicast
      detecting intermediate node/switch failures and changing the
       routing
      source routing
      address resolution protocol
   Planning a more detailed evaluation on our data center
    testbed
   Any comments and suggestions are welcome
This work was done as part of my research
assistant work at IIJ research laboratory.

Implementing a Layer 2 framework on the Linux network stack

  • 2.
    ◦ I worked at an embedded software company on SMP support for router firmware
    ◦ Ph.D. student at Tokyo University of Technology, researching improvements to network I/O architecture on modern x86 servers
    ◦ Interested in SMP, networking, and virtualization
    ◦ GSoC ’11 (FreeBSD): multithread support for BPF
    ◦ GSoC ’12 (FreeBSD): BIOS support for BHyVe
    ◦ Research assistant at IIJ Research Laboratory, implementing BCube for Linux (today’s topic!)
  • 3.
    ◦ BCube is a new network architecture
    ◦ Designed for shipping-container-based modular data centers
    ◦ Server-centric network structure; servers act as:
      ▪ End hosts
      ▪ Relay nodes for each other
    ◦ The paper was published at ACM SIGCOMM ’09 by Microsoft Research Asia
  • 4.
    ◦ Each server has one connection to each layer
    ◦ Switches never connect to other switches
    ◦ Servers relay traffic for each other
    [Figure: example BCube topology with servers 000 to 111, switches in layers 0, 1, and 2, and the nested BCube0, BCube1, and BCube2 outlined]
  • 5.
    ◦ $\mathrm{BCube}_k$ has $k+1$ layers
    ◦ $\mathrm{BCube}_x$ contains $n$ copies of $\mathrm{BCube}_{x-1}$
    ◦ $\mathrm{BCube}_0$ contains $n$ servers
    ◦ Total servers $= n^{k+1}$
    [Figure: the same example topology (n = 2, k = 2: servers 000 to 111 in nested BCube0, BCube1, BCube2)]
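The size rules on slide 5 can be checked with a short calculation. This is a sketch, not part of bcube.ko: `bcube_size` is a hypothetical helper, and the per-layer switch count of $n^k$ is taken from the BCube paper's construction (each layer uses $n$-port switches, one port per server), which matches the four switches per layer in the example figure.

```python
def bcube_size(n: int, k: int) -> tuple[int, int, int]:
    """Return (layers, servers, switches_per_layer) for BCube_k built from n-port switches.

    Hypothetical helper following the recursive definition:
    BCube_0 holds n servers, and BCube_x holds n copies of BCube_{x-1}.
    """
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    servers = n  # BCube_0 contains n servers
    for _ in range(k):
        servers *= n  # each recursion level multiplies the server count by n
    layers = k + 1
    # Every server has one link per layer and each switch has n ports,
    # so each layer needs servers / n = n**k switches.
    switches_per_layer = servers // n
    return layers, servers, switches_per_layer


if __name__ == "__main__":
    # The example topology on slides 4 and 5 uses n = 2, k = 2.
    print(bcube_size(2, 2))  # (3, 8, 4)
```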
  • 6.
    ◦ High network capacity for various traffic patterns:
      ▪ one-to-one
      ▪ one-to-all
      ▪ one-to-several
      ▪ all-to-all
    ◦ Performance degrades gracefully as server/switch failures increase
    ◦ Needs no special hardware; uses only commodity switches
  • 7.
    ◦ Each server has a unique BCube address
    ◦ Each digit gives the port number on the switch the server connects to in the corresponding layer
    [Figure: the example topology labeled with server addresses 000 to 111]
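The addressing rule on slide 7 can be illustrated as follows. This is a hypothetical sketch: it assumes single-character digits (so $n \le 10$), treats layer 0 as the rightmost digit, and names each switch by its layer plus the server address with that layer's digit removed (the naming convention used in the BCube paper, not necessarily in bcube.ko). Servers that differ only in the digit for layer $i$ share a layer-$i$ switch.

```python
def digit_pos(addr: str, layer: int) -> int:
    """Index into the address string of the digit for `layer` (layer 0 = rightmost digit)."""
    if not 0 <= layer < len(addr):
        raise ValueError(f"layer {layer} out of range for address {addr!r}")
    return len(addr) - 1 - layer


def switch_and_port(addr: str, layer: int) -> tuple[tuple[int, str], int]:
    """Return ((layer, switch_id), port) for the switch `addr` uses in `layer`.

    Assumption: switch_id is the address with this layer's digit removed,
    so every server attached to the same switch computes the same name.
    """
    pos = digit_pos(addr, layer)
    port = int(addr[pos])  # the digit itself is the port number
    switch_id = addr[:pos] + addr[pos + 1:]
    return (layer, switch_id), port


def neighbours(addr: str, layer: int, n: int) -> list[str]:
    """Servers reachable in one hop through this server's layer-`layer` switch."""
    pos = digit_pos(addr, layer)
    return [addr[:pos] + str(d) + addr[pos + 1:] for d in range(n) if str(d) != addr[pos]]
```

For example, 100 and 101 both resolve to layer-0 switch `(0, "10")` on ports 0 and 1, and the only layer-2 neighbor of 000 with n = 2 is 100.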
  • 8. Default routing rule
    ◦ Top layer → bottom layer
    ◦ Example: the route from 000 to 111 is 000 → 100 → 110 → 111
    [Figure: the example topology (BCube0, BCube1, BCube2)]
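The default rule on slide 8 amounts to fixing differing address digits from the top layer down, one digit per hop. A minimal sketch of that rule (the function name is hypothetical, and addresses are assumed to be equal-length digit strings):

```python
def default_route(src: str, dst: str) -> list[str]:
    """Default BCube route: fix differing digits from the top layer down.

    Each hop changes exactly one digit, i.e. crosses one switch in that layer.
    """
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    path = [src]
    current = list(src)
    for pos in range(len(src)):  # pos 0 holds the top layer's digit
        if current[pos] != dst[pos]:
            current[pos] = dst[pos]
            path.append("".join(current))
    return path


print(" -> ".join(default_route("000", "111")))  # 000 -> 100 -> 110 -> 111
```

Digits that already match are skipped, so the hop count equals the number of differing digits.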
  • 9.
    ◦ There are alternate routes between any pair of nodes
    ◦ Routes can bypass failed servers and switches
    ◦ Traffic can also be parallelized across routes to accelerate throughput
    [Figure: the example topology (BCube0, BCube1, BCube2)]
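One way to see where alternate routes come from: correcting the same differing digits in a different order yields a different path. The sketch below is modeled on the digit-rotation idea in the BCube paper's path construction, but only for digits that differ; the paper's handling of equal digits (detouring through a neighbor) is omitted, the function names are hypothetical, and this is not what bcube.ko currently does. For 000 → 111 it yields three paths with no intermediate server in common.

```python
def route_in_order(src: str, dst: str, layers: list[int]) -> list[str]:
    """Fix differing digits in the given layer order (layer 0 = rightmost digit)."""
    path, current = [src], list(src)
    for layer in layers:
        pos = len(src) - 1 - layer
        if current[pos] != dst[pos]:
            current[pos] = dst[pos]
            path.append("".join(current))
    return path


def rotated_paths(src: str, dst: str) -> list[list[str]]:
    """One path per differing digit, starting the correction at that digit's layer.

    Starting at layer i, the correction order is i, i-1, ..., 0, k, k-1, ..., i+1.
    """
    k = len(src) - 1
    paths = []
    for i in range(k, -1, -1):
        if src[k - i] != dst[k - i]:  # only layers whose digit differs start a path
            order = [(i - j) % (k + 1) for j in range(k + 1)]
            paths.append(route_in_order(src, dst, order))
    return paths
```

The first path, starting at the top layer, is the default route from slide 8; the others (000 → 010 → 011 → 111 and 000 → 001 → 101 → 111) avoid its intermediate servers.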
  • 10.
    ◦ The source server decides the best path for each flow, bypassing failed paths
    ◦ To propagate the routing path, the source server writes the path information into the packet header
  • 11.
    ◦ A BCube header is added between the Ethernet header and the IP header
    ◦ It carries the source/destination BCube addresses and the routing path, stored in a “Next Hop Index Array”
    [Figure: packet layout: Ethernet header | BCube header (BCube destination address, BCube source address, protocol type, Next Hop Index Array) | IP header]
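The slide names the header fields but not their widths or encoding. The sketch below packs a header in that field order under hypothetical choices: 6-byte address fields holding one digit per octet, a 2-byte protocol type, a 1-byte NHA length, and one NHA byte per hop with the layer whose digit changes in the high nibble and the new digit in the low nibble. Because neighboring servers differ in exactly one digit, that pair is enough to rebuild the path. None of this is claimed to match bcube.ko's actual on-wire format.

```python
import struct

ADDR_LEN = 6  # assumption: BCube addresses stored in MAC-sized (6-byte) fields


def addr_to_bytes(addr: str) -> bytes:
    """Hypothetical encoding: one address digit per octet, right-aligned in 6 bytes."""
    if len(addr) > ADDR_LEN:
        raise ValueError("address has too many digits")
    return bytes(ADDR_LEN - len(addr)) + bytes(int(d) for d in addr)


def path_to_nha(path: list[str]) -> list[int]:
    """Encode each hop as one byte: changed layer (high nibble), new digit (low nibble)."""
    nha = []
    for prev, nxt in zip(path, path[1:]):
        diffs = [pos for pos in range(len(prev)) if prev[pos] != nxt[pos]]
        if len(diffs) != 1:
            raise ValueError(f"{prev} -> {nxt} is not a single BCube hop")
        pos = diffs[0]
        layer = len(prev) - 1 - pos
        nha.append((layer << 4) | int(nxt[pos]))
    return nha


def nha_to_path(src: str, nha: list[int]) -> list[str]:
    """Rebuild the server path from the source address and the NHA entries."""
    path, current = [src], list(src)
    for entry in nha:
        layer, digit = entry >> 4, entry & 0x0F
        current[len(src) - 1 - layer] = str(digit)
        path.append("".join(current))
    return path


def pack_bcube_header(dst: str, src: str, proto: int, path: list[str]) -> bytes:
    """Pack dest addr, source addr, protocol type, NHA length, then NHA (network byte order)."""
    nha = path_to_nha(path)
    fixed = struct.pack("!6s6sHB", addr_to_bytes(dst), addr_to_bytes(src), proto, len(nha))
    return fixed + bytes(nha)
```

For the slide 8 route 000 → 100 → 110 → 111 with protocol type 0x0800 (IPv4), this produces a 15-byte fixed part plus three NHA bytes.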
  • 12. We are evaluating various data center network technologies, especially for container-based modular data center architectures. BCube is one of the candidates.
  • 13.
    ◦ Reuse existing code as much as possible
    ◦ Start with a minimal implementation
    ◦ A BCube device binds multiple interfaces and is assigned a BCube address and an IP address
    ◦ Which existing Linux feature is most similar? → The bridge!
      ▪ Forked bridge.ko and the brctl command into bcube.ko and the bcctl command
  • 14.
      brctl addbr <bridge>
      brctl delbr <bridge>
      ↓
      bcctl addbc <bcube> <bcaddr> <N> <K>
      bcctl delbc <bcube>
    ◦ Modified addbr/delbr; addbc takes 3 extra arguments:
      ▪ BCube address
      ▪ n and k parameters
    ◦ BCube addresses use the MAC address format and size: 101 → 00:00:01:00:01
    ◦ The BCube address is used as the hardware address of the BCube device
      ▪ It works like a fake MAC address in the Linux network stack
  • 15.
      brctl addif <bridge> <device>
      brctl delif <bridge> <device>
      ↓
      bcctl assignif <bcube> <layer> <device>
      bcctl unassignif <bcube> <layer> <device>
    ◦ Modified addif/delif into assignif/unassignif, which take a layer number as an extra argument
  • 16. Need to reconsider address resolution
    ◦ Normal Ethernet
      ▪ IP address → MAC address (ARP)
    ◦ BCube network
      ▪ IP address → BCube address: ARP?
      ▪ (Neighbor) BCube address → MAC address: needs an additional neighbor discovery protocol
  • 17.
    ◦ Once broadcast works in the BCube implementation, ARP should work on top of it
    ◦ Broadcast isn’t implemented yet, so I decided to configure ARP entries manually with the following command:
      arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
  • 18.
    ◦ An ARP-like protocol is needed
    ◦ Decided to configure this manually too, and implemented the following commands:
      bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
      bcctl delneighbour <bcube> <layer> <bcaddr>
    ◦ bcube.ko maintains a neighbor table and uses it when transmitting and forwarding packets
  • 19.
    ◦ bridge.ko maintains an FDB (forwarding database), a hash table used to look up destination MAC address → output port
    ◦ Deleted the FDB and implemented a function that decides the next-hop BCube address, the output port, and the MAC address of the next hop
    ◦ Source routing isn’t implemented yet; only default routing for now
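The neighbor table and next-hop decision on slides 18 and 19 can be modeled in user space. This is a hypothetical sketch; the class, method, and interface names are illustrative, not bcube.ko's. A dictionary keyed by (layer, BCube address) stands in for the entries that addneighbour/delneighbour manage, a layer-to-interface map stands in for assignif, and `next_hop` applies the default top-to-bottom rule to return the next-hop BCube address, the output interface, and the next hop's MAC address.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BCubeNode:
    """Hypothetical user-space model of one BCube device's forwarding state."""

    addr: str  # this server's BCube address, e.g. "000"
    ports: dict[int, str]  # layer -> interface name (what assignif would record)
    neighbours: dict[tuple[int, str], str] = field(default_factory=dict)  # (layer, bcaddr) -> MAC

    def add_neighbour(self, layer: int, bcaddr: str, mac: str) -> None:
        self.neighbours[(layer, bcaddr)] = mac

    def del_neighbour(self, layer: int, bcaddr: str) -> None:
        self.neighbours.pop((layer, bcaddr), None)

    def next_hop(self, dst: str) -> Optional[tuple[str, str, str]]:
        """Return (next-hop BCube address, output interface, next-hop MAC), or None if dst is local."""
        for pos in range(len(self.addr)):  # default routing: top layer first
            if self.addr[pos] != dst[pos]:
                layer = len(self.addr) - 1 - pos
                nxt = self.addr[:pos] + dst[pos] + self.addr[pos + 1:]
                mac = self.neighbours.get((layer, nxt))
                if mac is None:
                    raise LookupError(f"no neighbour entry for {nxt} on layer {layer}")
                return nxt, self.ports[layer], mac
        return None  # dst is this node: deliver locally
```

With `node = BCubeNode("000", {0: "eth0", 1: "eth1", 2: "eth2"})` and one entry added via `node.add_neighbour(2, "100", "02:00:00:00:01:00")`, `node.next_hop("111")` returns `("100", "eth2", "02:00:00:00:01:00")`. Only one-hop neighbors need table entries, since the next hop always differs from this node in a single digit.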
  • 20. Top layer→Bottom layer  Ex: Route from 000 to 111 000 →100 →110 →111 2,0 2,1 2,0 2,1 1,0 1,1 1,0 1,1 0,0 0,1 0,0 0,1 000 001 010 011 100 101 110 111 Bcube0 Bcube1 Bcube2
  • 21. To add the BCube header between the Ethernet header and the IP header, I forked net/ethernet/eth.c:
    ◦ ETH_HLEN (14 bytes) → BCUBE_HLEN (24 bytes)
    ◦ struct ethhdr (MAC header) → struct bcubehdr (MAC & BCube header)
    ◦ eth_header_ops → bc_header_ops, to handle the BCube header
    ◦ Unfortunately, GRO accesses the Ethernet header directly and runs before BCube handles a packet, so it needs to be disabled
  • 22.
    ◦ Found a way to implement a new L2 framework using the existing bridge implementation
      ▪ Much easier than implementing it from scratch
    ◦ Development status
      ▪ Basic features implemented; debugging now
      ▪ Will consider adding more features: broadcast/multicast; detection of intermediate node/switch failures with rerouting; source routing; an address resolution protocol
    ◦ Planning a more detailed evaluation on our data center testbed
    ◦ Any comments and suggestions are welcome
  • 23. This work was done as part of my research assistant work at IIJ Research Laboratory.