Geunsik Lim
http://leemgs.fedorapeople.org
8/11/2016 9:35 AM
Distributed Compilation System for
High-Speed Software Build Processes
2/22
Who am I?
• Full name: Geunsik Lim
• E-mail: leemgs@skku.edu, geunsik.lim@samsung.com
• Affiliation: Sungkyunkwan University, Samsung Electronics
• Homepage: http://leemgs.fedorapeople.org
3/22
Outline
• Introduction
• Background
• Design and Implementation
– Object File Based Server & Client Model
– CPU Scheduling of Distributed PC Resource
– Cross Compiling for Heterogeneous CPU Architecture
• Evaluation
• Conclusion
4/22
Statistics of Distributed Client PCs
• High-performance computing research has paid little attention to public computer facilities, which sit idle much of the time.
• Example: used and idle times of 300 dual-core PCs in a university library.
[Figure: used and idle times of the public computer facilities]
5/22
What is covered in this talk?
• Current state of software source size growth from January 2009 until July 2013: sevenfold.
6/22
Cost Statistics for Building Platform Source
• Time cost needed to build a large mobile platform such as Android 4.2.2:
compilation accounts for 67 percent (34 minutes) of the total execution time
(51 minutes).
7/22
What is our final goal?
Idle Computers
In unity there is strength.
8/22
What is our challenge?
• Distributed compiler network: distribute compiling tasks across a network.
• Support a portable Linux system for Windows PCs: build the distributed compiler network on the existing HTCondor pool of Windows PCs.
• Establish a remote command-line environment: whatever HTCondor does on the library PC must work without any user interaction, so no GUI.
Requirement: We can NOT use a GUI installer, because nobody will be sitting in front of the distributed PC while its users are doing their work.
[Figure labels: Windows XP, Windows 7, Linux/]
9/22
System Architecture of DistCom
[Figure: system architecture. Legend (DistCom components): (1.1) DistCom Service Daemon (= DistCom Server), (1.2) DistCom Client, (2) DistCom Manager, (3) Cross-Compiler Infrastructure. Other labels: HTCondor Pool Manager, HTCondor Client, HTCondor Collector, Workload Monitor, Resource Manager, Source Codes, Server & Client Model. Distributed resources: X86 Windows, X86 Linux, ARM Android. Operating system of distributed PCs: Windows. Users: platform builders, cloud developers, scientific researchers.]
• Distributed server and client model: controls distributed PC resources connected via a network
• DistCom Manager: schedules distributed PC resources
• Cross-compiler infrastructure: supports heterogeneous architectures
10/22
Object File Based Server & Client Model
[Figure: job flows among the User, (1.2) DistCom Client, (2) DistCom Manager, and (1.1) DistCom Service Daemons (servers) on the distributed computers. DistCom job flow: ① Check of PC's status (collecting information, monitoring workload), ② Command, ③ Source, ④ Source Compilation, ⑤ Binary, ⑥ Result. HTCondor job flow: HTCondor Pool Manager and HTCondor PC (Windows). O: object, the atomic unit of checkpoint/restart.]
• (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed
degradation; object files are the atomic unit of checkpointing.
[Figure: Existing Technique vs. Proposed Technique. Panel labels: "One PC: 1 object", "2-core CPU: 2 commands", "4-core CPU: 4 commands".]
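The idea of using object files as the atomic unit of checkpoint/restart can be illustrated with a minimal sketch. It assumes that each source file produces one object file named after it in a build directory; the function names and the naming rule are hypothetical and are not taken from DistCom.

```python
from pathlib import Path


def object_path(source, build_dir):
    """Map a source file to its object file (assumed naming: <stem>.o)."""
    return Path(build_dir) / (Path(source).stem + ".o")


def pending_sources(sources, build_dir):
    """Return the sources whose object file does not exist yet.

    Existing object files act as checkpoints: after an interruption,
    only the sources returned here need to be compiled again.
    """
    return [s for s in sources if not object_path(s, build_dir).exists()]
```

With this granularity, an interrupted build loses at most the object files that were in progress, not the whole job.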
11/22
CPU Scheduling: Controlling Remote PC Resources
• To avoid degrading processing speed while the PC's user is working, the (1.1) DistCom Service
Daemon runs compilation either as a real-time priority task (CPU monopolization method) or as a
lowest-priority task (time-sharing method).
• No modification of distributed computer systems
[Figure: scheduler on multi-core processor(s) with priority levels Lowest, Below Normal, Normal, Above Normal, Highest, Real-time. Labels: Case 1 by DistCom Service Daemon, Case 2 by DistCom Service Daemon, Time-sharing, Real-time, Lowest.]
Case Study: User-Aware Scheduling
Windows
• Header: #include <windows.h>
• Create: CreateThread()
• Set priority: SetThreadPriority()
1. THREAD_PRIORITY_TIME_CRITICAL = 15
2. THREAD_PRIORITY_HIGHEST = +2
3. THREAD_PRIORITY_ABOVE_NORMAL = +1
4. THREAD_PRIORITY_NORMAL = 0
5. THREAD_PRIORITY_BELOW_NORMAL = -1
6. THREAD_PRIORITY_LOWEST = -2
7. THREAD_PRIORITY_IDLE = -15
Linux (POSIX threads)
• Headers: #include <pthread.h>, <semaphore.h>, <sys/time.h>, <time.h>, <stdio.h>
• Create: pthread_create()
• Set priority: setpriority(); POSIX: pthread_setschedparam()
1. -20 (Highest) ~ 19 (Lowest)
2. 1 (Lowest) ~ 99 (Highest): real-time priority
VxWorks
• Headers: #include <vxWorks.h>, <sysLib.h>, <taskLib.h>, <semLib.h>
• Create: taskSpawn()
• Set priority: taskPrioritySet()
1. 0 (Highest) ~ 255 (Lowest)
Solaris threads
• Header: #include <thread.h>
• Create: thr_create()
• Set priority: thr_setprio()
1. 0 (Lowest) ~ 127 (Highest)
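The time-sharing method (running compilation at the lowest priority) can be sketched on a Unix-like PC as follows. This is only an illustration under assumptions: the function names are hypothetical, the sketch uses Python's `os.setpriority` wrapper around the `setpriority()` call listed above, and it does not describe how the DistCom Service Daemon is actually implemented.

```python
import os
import subprocess

LOWEST_NICE = 19  # Linux nice values run from -20 (highest) to 19 (lowest)


def _drop_to_lowest_priority():
    # Runs in the child just before exec; 0 means "the calling process".
    os.setpriority(os.PRIO_PROCESS, 0, LOWEST_NICE)


def run_time_shared(argv):
    """Run a command (for example a compiler invocation) at the lowest priority."""
    return subprocess.run(
        argv,
        preexec_fn=_drop_to_lowest_priority,
        capture_output=True,
        text=True,
    )
```

Lowering priority needs no special privileges, which fits the "no modification of distributed computer systems" constraint. The real-time case would need a real-time scheduling policy and usually elevated privileges, so it is left out of this sketch.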
12/22
CPU Scheduling: Task Allocation & Reallocation
[Task State Transition]
[Figure: (2) DistCom Manager with a [Task queue] for Dedicated Resources and one for Shared Resources; transitions Reject, Stop, Finish; task flow and job flow. ※ Minimal job unit: object file]
• (2) DistCom Manager manages all jobs with two task queues, one for
dedicated resources and one for shared resources.
1. Reject denies the allocation of a task.
2. Stop breaks the allocation of a task to a PC resource
because the PC's user has started to use it.
3. Finish completes a running task normally.
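The two task queues and the Reject, Stop, and Finish transitions can be sketched as a small state machine. The class and method names are hypothetical, and putting a rejected or stopped task back into its queue is an assumption made for illustration; the slides do not say where such tasks go.

```python
from collections import deque
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    REJECTED = "rejected"
    STOPPED = "stopped"
    FINISHED = "finished"


class TaskQueues:
    """Two task queues, one per resource kind; a task is one object file."""

    def __init__(self):
        self.queues = {"dedicated": deque(), "shared": deque()}
        self.state = {}

    def submit(self, task, kind):
        self.queues[kind].append(task)
        self.state[task] = TaskState.QUEUED

    def allocate(self, kind, pc_accepts):
        """Take the next task; Reject puts it back if the PC refuses it."""
        task = self.queues[kind].popleft()
        if not pc_accepts:
            self.state[task] = TaskState.REJECTED
            self.queues[kind].append(task)  # assumption: retry later
            return None
        self.state[task] = TaskState.RUNNING
        return task

    def stop(self, task, kind):
        """The PC's user came back: break the allocation and requeue."""
        self.state[task] = TaskState.STOPPED
        self.queues[kind].append(task)  # assumption: reallocate elsewhere

    def finish(self, task):
        self.state[task] = TaskState.FINISHED
```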
13/22
CPU Scheduling: Task Allocation & Reallocation
[Retry mechanism]
The proposed system supports a retry mechanism that recompiles at object-file
granularity whenever compilation fails on a distributed PC during distributed
compilation.
Q: Queue
C: Calculation
U: User
1. Overload detection
if (Qsum > CPUfree) then find another idle computer
2. Task complexity estimation
if (CPUfree is unknown or (CPUaccess > CPUidle)) then
Recalculate task complexity of distributed computers (Ccomplexity)
Run retry mechanism
Call task state transition (stop)
Run object-file based compilation on another idle computer
3. Handling of user access
if (Uaccess && DedicatedResourceScheduling)
Call retry mechanism
if (Uaccess && SharedResourceScheduling)
Change scheduling priority from highest to lowest
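The three rules of the retry mechanism can be written as one decision function. This is a sketch under assumptions: the slide does not define the quantities precisely, so `q_sum`, `cpu_free`, `cpu_access`, and `cpu_idle` are treated here as plain numbers, an unknown `cpu_free` is represented by `None`, and the action names are hypothetical labels rather than DistCom calls.

```python
def plan_actions(q_sum, cpu_free, cpu_access, cpu_idle, user_access, scheduling):
    """Return the ordered list of actions suggested by the three rules."""
    actions = []

    # 1. Overload detection (skipped when cpu_free is unknown)
    if cpu_free is not None and q_sum > cpu_free:
        actions.append("find_another_idle_computer")

    # 2. Task complexity estimation
    if cpu_free is None or cpu_access > cpu_idle:
        actions += [
            "recalculate_complexity",
            "retry",
            "stop",
            "compile_on_another_idle_computer",
        ]

    # 3. Handling of user access
    if user_access and scheduling == "dedicated":
        actions.append("retry")
    if user_access and scheduling == "shared":
        actions.append("lower_priority")

    return actions
```

For example, a user returning to a PC under shared scheduling with no overload yields only the priority change, while the same event under dedicated scheduling triggers a retry on another machine.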
14/22
Cross Compiling for Heterogeneous Architecture
Cross-compilation infrastructure for heterogeneous devices
• Targets: X86 Windows (32-bit/64-bit), X86 Linux (32-bit/64-bit), ARM Android (32-bit, v7)
• Cross-compiler infrastructure: generates executable binary files for a system other than
the one on which the compiler is running.
• Heterogeneous CPU Mapper: probes the OS, then maps the source code to the target
machine code.
[Figure: layers from Source Code (C, C++, Objective-C, Java) through the Heterogeneous CPU Mapper and the ToolChain (Compiler: GCC, Assembler, Linker: LD, Debugger: GDB) to Instruction Set, Machine code, and Hardware. Build steps shown: build cross-binutils, build cross-gcc, build Linux API headers, build C library (glibc, bionic), build cross-gcc-hosted, build tools.]
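The role of the Heterogeneous CPU Mapper (probe the platform, then pick a matching toolchain) can be sketched as a lookup table. Everything here is an assumption for illustration: the function names are hypothetical, and the GCC target prefixes are common conventions, not the ones DistCom necessarily uses.

```python
import platform

# Assumed mapping from (OS, CPU family) to a GCC cross-toolchain prefix.
TOOLCHAIN_PREFIX = {
    ("windows", "x86"): "i686-w64-mingw32-",
    ("windows", "x86_64"): "x86_64-w64-mingw32-",
    ("linux", "x86"): "i686-linux-gnu-",
    ("linux", "x86_64"): "x86_64-linux-gnu-",
    ("android", "armv7"): "arm-linux-androideabi-",
}


def normalize_machine(machine):
    """Collapse the many spellings of a CPU name into one family name."""
    m = machine.lower()
    if m in ("amd64", "x86_64"):
        return "x86_64"
    if m in ("i386", "i486", "i586", "i686", "x86"):
        return "x86"
    if m.startswith("armv7"):
        return "armv7"
    raise ValueError(f"unsupported CPU: {machine}")


def compiler_for(os_name, machine):
    """Return the cross-compiler command for a target OS and CPU."""
    key = (os_name.lower(), normalize_machine(machine))
    return TOOLCHAIN_PREFIX[key] + "gcc"


def probe_local():
    """Probe the OS and CPU of the machine this code runs on."""
    return platform.system(), platform.machine()
```

A table keeps the mapping explicit, so adding a target means adding one entry instead of another branch.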
15/22
Evaluation
User (CentOS6): 115.145.170.xxx
Distributed PC Resources
Remote PC (Windows 7): 115.145.170.xxx
Remote PC (Ubuntu 12.04): 115.145.170.xxx
16/22
Evaluation – Build Time of Platform Source
[Figure: build timeline. Before: 51 minutes. After (proposed system): 18 minutes. Reduced time: 33 minutes.]
• The time cost to build the mobile platform source is
reduced by 65 percent (33 minutes).
• 25% is consumed by network speed, 30%
by the computing power of the PCs, and 45% by
the CPU scheduling method.
Test bed: 9 machines (CPU: Intel Core 2 Duo, memory: DDR2 1 GB, Intel 100 Mbps Ethernet controller)
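The figures on this slide can be checked with a few lines of arithmetic. The helper below is only a worked example using the numbers stated in the talk (51 minutes before, 18 minutes after); the function name is hypothetical.

```python
def build_time_saving(before_min, after_min):
    """Return the minutes saved and the saved fraction of the original time."""
    saved = before_min - after_min
    return saved, saved / before_min


saved, fraction = build_time_saving(51, 18)
speedup = 51 / 18  # how many times faster the distributed build is
```

Here `saved` is 33 minutes and `fraction` rounds to 65 percent, matching the slide; the same numbers correspond to a speedup of roughly 2.8 times.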
17/22
Evaluation – Compilation Speed with Distributed PCs
• The performance of 10 distributed machines was similar to that of the 8-core PC. The equivalent of 2 PCs was
lost to network speed and the low computing power of the distributed PCs.
• With the shared resource scheduling method, compilation performance depends far more
on the CPU usage of the PC resource than with the dedicated resource scheduling method.
[Figure: compilation speed with distributed PCs; reference level marked "8-cores"; annotation: * network speed, low computing power]
Reference machine: high-performance computer (8-core Intel Xeon E5 processor, 12 GB memory)
18/22
Evaluation – Experimental Result on Cloud Computing
• The proposed system is as effective as one 40-core high-performance computer.
• The 3-minute difference in performance is caused by KVM emulation.
[Figure: experimental result on cloud computing; reference level marked "40 cores"; annotated gap: 3 minutes]
Reference machine: high-performance cloud server (40-core Intel Xeon E7 processor, 32 GB memory)
19/22
Evaluation – With Ccache VS. Without Ccache
• With Ccache, compilation time under dedicated resource scheduling is reduced by about 10%.
• The Ccache effect (dedicated) is correlated with memory shortage on the distributed
PC resources and with the physical memory capacity available for caching.
20/22
Comparison Between Existing System and DistCom
Item | Ccache | Distcc | HTCondor | BOINC | DistCom (*)
Domain | Caching output of compilation | Distributed computing | Distributed parallelization | Distributed computing | Distributed computing
Task | Compile source | Compile source | Run binary file | Run binary file | Compile source & run binary file
Goal | High performance | High performance | High throughput | High throughput | Hybrid computing
Pros. | Performance acceleration (e.g. DB, web-service) | Reduce build time (e.g. Android, Linux) | Utilize extra resource management | Support CPU & GPU | Multicore-aware object-based unit; retry mechanism; shared scheduling
Cons. | Need sufficient physical memory | Need additional H/W | No distributed compiling | Only use idle time | Depend on network infrastructure
Cost | High | High | Low | Low | Low
User | Platform builders | Platform builders | Scientific research | Scientific research | Platform builders, scientific research
21/22
Conclusion
• Idle computer resources connected by a network are more
ubiquitous than ever before (e.g. cloud environments, BYOD
environments, and the generalization of computer usage).
• DistCom (DIStributed COMpilation system) supports high-speed
software compilation.
– 1) Distributed server/client model, 2) object-file-based CPU scheduling of
remote PC resources, and 3) cross compiling for heterogeneous architectures.
– Hybrid approach for mobile platform builders, cloud developers, grid
researchers, computational physics, and statistics.
• Compilation speed improves drastically by using
existing idle PC resources.
22/22
Thank you for your attention.
Any questions?
23/22
FAQ
1. Who cares about a Distcc/HTCondor-based system?
Can you do it for mobile devices?
2. Sounds too good. Are there any limitations?
3. Are you going to release it? Or is it a one-off talk?
4. I totally don't get why you are doing this.
24/22
Limitation
1. This approach is a software solution based on distributed PCs,
but some small companies do not
have sufficient distributed computer resources.
2. Users need to run on a local area network to get the ideal
network speed.
3. Can you always use idle PCs in a real environment?
We focus on public computer
facilities, which have a high percentage of idle time.

  • 1. Geunsik Lim http://leemgs.fedorapeople.org 8/11/2016 9:35 AM Distributed Compilation System for High-Speed Software Build Processes
  • 2. 2/22 • Full name: Geunsik Lim • E-mail : leemgs@skku.edu, geunsik.lim@samsung.com • Affiliation : Sungkyunkwan University, Samsung Electronics • Homepage : http://leemgs.fedorapeople.org Who am I ?
  • 3. 3/22 Introduction Background Design and Implementation Object File Based Server & Client Model CPU Scheduling of Distributed PC Resource Cross Compiling for Heterogeneous CPU Architecture Evaluation Conclusion Outline
  • 4. 4/22 Stat. of distributed client PCs • Studies into high performance computing still lack research of public computer facilities, which have a lot of idle times. • Example of used and idle times of 300 dual-core PCs in a university library public computer facilities, which have a lot of idle times
  • 5. 5/22 What is covered in this talk? • Current state of source size growth of software from January 2009 until July 2013. Sevenfold
  • 6. 6/22 Cost Statistics for Building Platform Source • Time cost needed to build a large mobile platform such as Android 4.2.2. Compilation costs account for 67 percentages (34 minutes) of the total cost of execution (51 minutes).
  • 7. 7/22 IdleComputers What is our final goal? In unity there is strength.
  • 8. 8/22 Distributed compiler network: to distribute compiling tasks across a network Support portable Linux system for Windows PCs: for distributed compil er network using the existing HTCond or pool of Windows PCs Establish remote command-line envi ronment: HTCondor does on the libra ry PC must work without any user inte raction, so no GUI. What is our challenge? Requirement: We can NOT use a GUI installer, as you will not be sitting in front of the distributed PC when users doing their work. Windows XP Windows 7 Linux/
  • 9. 9/22 System Architecture of DistCom HTCondor Pool Manager HTCondor Client (3) Cross-Compiler Infrastructure (1.1) DistCom Service Daemon (=DistCom Server) Legend:DistComComponents X86 Windows X86 Linux ARM Android Distributed Resources Cloud Developers Scientific Researchers (2)DistCom Manager (1.2) DistCom Client OperatingSystemof DistributedPCs(Windows) Users Platform Builders Workload Monitor HTCondor Collector Resource Manager • Distributed server and client model: to control distributed PC resources connected via network • DistCom manager: for scheduling distributed PC resources • Cross-Compiler infrastructure: to support heterogeneous architecture Source Codes Server & Client Model
  • 10. 10/22 Object File Based Server & Client Model (1.2) DistCom Client HTCondor Pool Manager HTCondor PC (Windows) ①Check of PC’s status • Collecting Information • Monitoring Workload DistCom Job Flow HTCondor Job Flow (2) DistCom Manager User ⑥Result ②Command DistCom Service Daemon (Server) DistCom Service Daemon (Server) (1.1) DistCom Service Daemon (Distributed Computer – A ) O O O O O O O ③Source ⑤Binary ④Source Compilation … O: Object : Atomic unit of checkpoint/restart • (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed degradation, where object files are the atomic level for check pointing. O O O O O O O O 4-core CPU4 Commands O O O O O One PC1 Objects O O O O 2-core CPU2 Commands Existing Technique Proposed Technique
  • 11. 11/22 CPU Scheduling: Controlling Remote PC Resources • To avoid degrading the processing speed during user’s work period, the (1.1) DistCom Service Daemon runs compilation as a task of real-time priority (CPU monopolization method) or a task of lowest priority (Time-sharing method) Lowest Below Normal Normal Above Normal Highest Real-time Scheduler Multi-core processor(s) Case1 by DistCom Service Daemon #include <windows.h> #include <pthread.h> #include <semaphore.h> #include <sys/time.h> #include <time.h> #include <stdio.h> #include <vxworks.h> #include <sysLib.h> #include <taskLib.h> #include <semlib.h> CreateThread() pthread_create() taskSpawn() 1. 0 (Highest) ~ 255 (Lowest) taskPrioritySet( ) 1. THREAD_PRIORITY_TIME_CRITICAL = 15 2. THREAD_PRIORITY_HIGHEST = +2 3. THREAD_PRIORITY_ABOVE_NORMAL = +1 4. THREAD_PRIORITY_NORMAL = 0 5. THREAD_PRIORITY_BELOW_NORMAL = -1 6. THREAD_PRIORITY_LOWEST = -2 7. THREAD_PRIORITY_IDLE = -15 SetThreadPriority() 1. -20 (Highest) ~ 19 (Lowest) 2. 1 ( Lowest) ~ 99 ( Highest)  Real-time Priority setpriority( ) Lowest POSIX: pthread_setschedparam() #include <thread.h> thr_create() thr_setprio( ) 1. 0 (Lowest) ~ 127 (Highest) Case2 by DistCom Service Daemon Time-sharing Real-time CaseStudyUser-AwareScheduling • No modification of distributed computer systems Lowest
  • 12. 12/22 Dedicated Resources Shared Resources (2) DistCom Manager Reject [Task queue] Stop FinishTask flow Job flow ※ Minimal job unit : Object file CPU Scheduling: Task Allocation & Reallocation [Task State Transition] • (2) DistCom Manager manages all jobs with two task queues to separate either dedicated resources or shared resources. 1. First, Reject is used to deny the allocation of the task. 2. Second, Stop is used to break the allocation of the task to the PC resource because of the user’s access. 3. Finally, Finish is used to complete the running tasks normally.
  • 13. 13/22 1. Overload detection if (Qsum > CPUfree ) then find another idle computer 2. Task complexity estimation if (CPUfree is unknown or (CPUaccess > CPUidle)) then Recalculate task complexity of distributed computes (Ccomplexity) Run retry mechanism Call task state transition (stop) Run object-file based compilation at the another idle computer 3. Handling of user access if (Uaccess && DedicatedResouceScheduling ) Call Retry mechanism if (Uaccess && SharedResouceScheduling ) Change scheduling priority from highest to lowest CPU Scheduling: Task Allocation & Reallocation [Retry mechanism] Q: Queue C: Calculation U: User The proposed system supports the retry mechanism that executes the recompilation based on the object file units, whenever compilation failure of a distributed PC occurs during the distributed compilation.
  • 14. 14/22 Cross Compiling for Heterogeneous Architecture Cross-compilation Infrastructure for heterogeneous devices X86 Windows 32bit/64bit X86 Linux 32bit/64bit ARM Android 32bit (V7) • Cross-compiler infrastructure for generating executable binary files for a system other than the one on which the compiler is running • Heterogeneous CPU Mapper connects a source code up to the target machine code after probing OS. Hardware Machine code ToolChain AssemblerInstructionSet Source Code (C, C++, Objective-C, JAVA) Heterogeneous CPU Mapper Compiler (GCC) Linker (LD) Debugger (GDB) Build cross-binutils Build cross-gcc BuildLinuxAPI headers Build c-library (glibc, bionic) Build cross-gcc- hosted Buildtools
  • 15. 15/22 Evaluation User (CentOS6): 115.145.170.xxx Distributed PC Resources Remote PC (Windows 7): 115.145.170.xxx Remote PC (Ubuntu 12.04): 115.145.170.xxx
  • 16. 16/22 Evaluation – Build Time of Platform Source 51 minutes18 minutes BeforeAfter(ProposedSystem) Start Time End End 33 minutes (Reduced Time) • Time cost to build the mobile platform source is reduced by 65 percent (33 minutes). • 25% is consumed by the Network Speed, 30% by the Computing Power of PCs, and 45% by the CPU Scheduling Method. 9 machines (CPU: Intel Core2Duo, MEM: DDR2 1G, Intel 100 Mbps Ethernet Controller)
  • 17. 17/22 Evaluation – Compilation Speed with Distributed PCs • Performance of 10 machines was similar to the 8-core PC. Performance loss of 2 PCs because of network speed and low computing power of the distributed PCs. • Compilation processing performance of the shared resource scheduling method largely depends on the CPU usage of the PC resource compared with the dedicated resource scheduling method. 8-cores 8-cores 8-cores 8-cores 8-cores 8-cores High-Performance Computer: 8-Core Intel Xeon E5 Processor, 12GB memory * network speed, low computing power
  • 18. 18/22 Evaluation – Experimental Result on Cloud Computing • Proposed system is as effective as one high-performance computer (40-core). • 3 minutes difference in performance is caused by the emulation operation of the KVM. 3minutes 40cores 40cores 40cores 40cores 40cores 40cores 40cores High-performance cloud server (40-Core Intel Xeon E7 Processor, 32GB memory)
  • 19. 19/22 Evaluation – With Ccache VS. Without Ccache • Reduced compilation time of dedicated resource scheduling is by about 10%. • Ccache effect (Dedicated) is correlated with the memory shortage of distributed PC resources and with the physical memory capacity for caching 10%
  • 20. 20/22 Comparison Between Existing System and DistCom Ccache Distcc HTCondor BOINC DistCom (*) Domain Caching Output of Compilation Distributed Computing Distributed Parallelization Distributed Computing Distributed Computing Task Compile Source Compile Source Run Binary File Run Binary File Compile Source & Run Binary File Goal High Performance High Performance High Throughput High Throughput Hybrid Computing Pros. -Performance Acceleration (e.g. DB, web-service) -Reduce Build- Time (e.g. Android, Linux) -Utilize Extra Resource Management -Support CPU & GPU - Multicore-Aware Object-Based Unit - Retry Mechanism - Shared Scheduling Cons. -Need Sufficient Physical Memory -Need additional H/W -No Distributed Compiling -Only Use Idle Time -Depend on Network Infrastructure Cost High High Low Low Low User Platform Builders Platform Builders Scientific Research Scientific Research Platform Builders Scientific Research
  • 21. 21/22 Conclusion • Idle computer resources connected by a network are more ubiquitous than ever before (e.g. cloud environments, BYOD environments, and the generalization of computer usage). • DistCom (DIStributed COMpilation system) supports high-speed software compilation. – 1) Distributed Server/Client Model, 2) Object File Based CPU Scheduling of Remote PC Resource, and 3) Cross Compiling for Heterogeneous Architectures. – Hybrid approach for mobile platform builders, cloud developers, Grid researchers, computational physics, and statistics. • Compilation speed improves drastically using existing idle PC resources.
  • 22. 22/22 Thank you for your attention. Any questions?
  • 23. 23/22 FAQ 1. Who cares about a Distcc/HTCondor-based system? Can you do it for mobile devices? 2. Sounds too good. Are there any limitations? 3. Are you going to release it, or is it a one-off talk? 4. I totally don't get why you are doing this.
  • 24. 24/22 Limitation 1. This approach is a software solution based on distributed PCs, but some small companies do not have sufficient distributed computer resources. 2. Users need to run on a local area network to get the ideal network speed. 3. Can you always use idle PCs in a real environment? We focus on public computer facilities, which have a high percentage of idle time.
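
The "Multicore-Aware Object-Based Unit" item on slide 20 and the object-file-based CPU scheduling on slide 21 can be illustrated with a small sketch. This is not DistCom's implementation: the function name `allocate_objects`, the host names, and the rule of dealing jobs out in proportion to core count are all assumptions made for illustration. The slides state only that the unit of work is the object file and that allocation is multicore-aware.

```python
from typing import Dict, List


def allocate_objects(objects: List[str], cores: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign object-file build jobs to hosts in proportion to core count.

    Hypothetical helper: each host contributes one slot per core, and
    jobs are dealt round-robin across the interleaved slots.
    """
    if not cores or any(n <= 0 for n in cores.values()):
        raise ValueError("every host needs at least one core")

    # Interleave slots so that hosts fill evenly rather than one at a time.
    slots: List[str] = []
    for i in range(max(cores.values())):
        for host in sorted(cores):
            if cores[host] > i:
                slots.append(host)

    plan: Dict[str, List[str]] = {host: [] for host in cores}
    for idx, obj in enumerate(objects):
        plan[slots[idx % len(slots)]].append(obj)
    return plan


if __name__ == "__main__":
    jobs = [f"f{i}.o" for i in range(6)]
    # A dual-core PC receives twice as many object files as a single-core PC.
    print(allocate_objects(jobs, {"pc1": 2, "pc2": 1}))
```

A scheduler that allocated by number of PCs instead would give both machines three jobs each and leave one core of the dual-core PC idle part of the time.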

Editor's Notes

  1. The paper presentation time is 30 minutes (including 10 minutes of discussion). *Before: Distributed Compilation System Using HTCondor and Distcc for High-Speed Software Compilation *After: Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation. Developer, Testing, Release and CI Automation miniconf @ linux.conf.au 2014. Organiser: Stewart Smith <stewart@flamingspork.com> Blog: https://www.flamingspork.com/blog/
  2. Thread scheduling framework
  3. Source - http://condor.skku.edu/Benchmarks/idletimes.html
  4. In unity there is strength. (Friends) United we stand; gathering together gives us strength.
  5. Checkpoint/restart is the ability to save the state of an in-progress compilation so that it can later resume, on the same or a different distributed PC, from the point at which the checkpoint was taken.
  6. Q: Queue C: Calculation U: User
  7. Q: Queue C: Calculation U: User
  8. http://distcc.googlecode.com/svn/trunk/doc/web/benchmark.html
  9. Theoretically the performance of 8 machines would be similar to the 8-Core PC
  10. Sources: http://opensource.samsung.com. Real-Time for Linux native applications: default thread model: NPTL; priority queueing for mutex/semaphore; priority inheritance mutex; robust mutex.
  11. Experimental data to add: 1. Dedicated resource scheduling vs. enhanced dedicated resource scheduling: multicore-aware work allocation vs. work allocation by the number of PCs (the minimum number of object files is equal to the number of available CPUs). 2. Checkpoint-restart experiment: object-file-unit-based checkpoint-restart vs. mission-based method. 3. User-aware CPU resource scheduling for the sharing method, by probing user access, with time-sharing (lowest priority) and real-time (real-time priority): dedicated resource scheduling → real-time (real-time priority); shared resource scheduling → time-sharing (lowest priority). Future work: a network-aware task scheduling technique that considers the physical network speed [27] when distributing tasks, and a task migration algorithm [11] to migrate distributed tasks to another idle PC resource.
  12. Here I have prepared an FAQ for the audience.
  13. Here are the limitations of our proposed system.
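
The checkpoint/restart idea in note 5, applied at object-file granularity as in note 11, can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism DistCom or HTCondor actually uses: the function `build_with_checkpoint`, the JSON checkpoint file, and the `compile_unit` callback are hypothetical names introduced here.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def build_with_checkpoint(
    units: Iterable[str],
    compile_unit: Callable[[str], None],
    checkpoint: Path,
) -> List[str]:
    """Compile each unit once, recording finished units in a JSON checkpoint.

    Hypothetical sketch: if the build is interrupted, a later call with the
    same checkpoint file (on this or another PC that can read it) skips
    the units that already finished.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    compiled_now: List[str] = []
    for unit in units:
        if unit in done:
            continue  # finished before the interruption
        compile_unit(unit)  # may raise; the checkpoint then keeps earlier units
        done.add(unit)
        compiled_now.append(unit)
        # Persist after every unit so at most one unit of work is lost.
        checkpoint.write_text(json.dumps(sorted(done)))
    return compiled_now
```

Checkpointing per object file means that an interruption costs at most the one file being compiled, rather than the whole job.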