4. 4/22
Statistics of distributed client PCs
• Studies of high-performance computing still lack research on public computer
facilities, which have long idle periods.
• Example: used and idle times of 300 dual-core PCs in a university library's
public computer facility.
5. 5/22
What is covered in this talk?
• Current state of software source-size growth from January 2009 until July 2013:
a sevenfold increase.
6. 6/22
Cost Statistics for Building Platform Source
• Time cost needed to build a large mobile platform such as Android 4.2.2:
compilation accounts for 67 percent (34 minutes) of the total execution
time (51 minutes).
8. 8/22
What is our challenge?
• Distributed compiler network: to distribute compiling tasks across a network.
• Support a portable Linux system for Windows PCs: for a distributed compiler
network using the existing HTCondor pool of Windows PCs.
• Establish a remote command-line environment: what HTCondor does on the
library PCs must work without any user interaction, so no GUI.
• Requirement: we canNOT use a GUI installer, as you will not be sitting in
front of the distributed PC (Windows XP, Windows 7, or Linux) while users
are doing their work.
9. 9/22
System Architecture of DistCom
[Architecture diagram: (1.1) DistCom Service Daemon (= DistCom Server) and
(1.2) DistCom Client run on the operating system of the distributed PCs
(Windows); the (2) DistCom Manager works with the HTCondor Pool Manager,
HTCondor Collector, Workload Monitor, and Resource Manager to schedule the
distributed resources; the (3) Cross-Compiler Infrastructure targets x86
Windows, x86 Linux, and ARM Android. Users: platform builders, cloud
developers, and scientific researchers.]
• Distributed server and client model: to control distributed PC resources connected via a network
• DistCom Manager: for scheduling distributed PC resources
• Cross-compiler infrastructure: to support heterogeneous architectures
10. 10/22
Object File Based Server & Client Model
[Diagram: the (2) DistCom Manager ① checks each PC's status (collecting
information, monitoring workload) via the HTCondor Pool Manager; the user's
② command sends ③ source to the (1.1) DistCom Service Daemon (server) on a
distributed computer, which runs ④ source compilation and returns ⑤ binary
and the ⑥ result. O = object file, the atomic unit of checkpoint/restart.]
• (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed
degradation, where object files are the atomic unit for checkpointing.
• Existing technique: one PC receives one object file at a time. Proposed
technique: allocation is multicore-aware, so a 4-core CPU receives 4 compile
commands and a 2-core CPU receives 2 commands.
11. 11/22
CPU Scheduling: Controlling Remote PC Resources
• To avoid degrading the processing speed during the user's work period, the (1.1) DistCom Service
Daemon runs compilation either as a task of real-time priority (CPU monopolization method,
Case 1) or as a task of lowest priority (time-sharing method, Case 2).
• The scheduler on the multi-core processor(s) exposes the usual priority classes:
Lowest, Below Normal, Normal, Above Normal, Highest, Real-time.
• Case study of user-aware scheduling across thread APIs:
  - Windows (#include <windows.h>): CreateThread() / SetThreadPriority(), with
    THREAD_PRIORITY_TIME_CRITICAL = 15, HIGHEST = +2, ABOVE_NORMAL = +1,
    NORMAL = 0, BELOW_NORMAL = -1, LOWEST = -2, IDLE = -15.
  - POSIX (#include <pthread.h>, <semaphore.h>, <sys/time.h>): pthread_create() /
    pthread_setschedparam() and setpriority(), with 1. -20 (highest) to
    19 (lowest) for time-sharing and 2. 1 (lowest) to 99 (highest) for
    real-time priority.
  - VxWorks (#include <vxworks.h>, <sysLib.h>, <taskLib.h>, <semLib.h>):
    taskSpawn() / taskPrioritySet(), with priorities 0 (highest) to 255 (lowest).
  - Solaris threads (#include <thread.h>): thr_create() / thr_setprio(), with
    priorities 0 (lowest) to 127 (highest).
• No modification of the distributed computer systems is required.
12. 12/22
CPU Scheduling: Task Allocation & Reallocation
[Task state transition diagram: the (2) DistCom Manager keeps two task queues,
one for dedicated resources and one for shared resources; tasks flow through
the Reject, Stop, and Finish transitions. Minimal job unit: object file.]
• (2) DistCom Manager manages all jobs with two task queues to separate
dedicated resources from shared resources.
1. First, Reject is used to deny the allocation of a task.
2. Second, Stop is used to break the allocation of a task to a PC resource
because of the user's access.
3. Finally, Finish is used to complete running tasks normally.
13. 13/22
CPU Scheduling: Task Allocation & Reallocation
[Retry mechanism] (Q: queue, C: calculation, U: user)
1. Overload detection
   if (Qsum > CPUfree) then find another idle computer
2. Task complexity estimation
   if (CPUfree is unknown or (CPUaccess > CPUidle)) then
       recalculate the task complexity of the distributed computers (Ccomplexity)
       run the retry mechanism
       call the task state transition (Stop)
       run object-file-based compilation on another idle computer
3. Handling of user access
   if (Uaccess && DedicatedResourceScheduling)
       call the retry mechanism
   if (Uaccess && SharedResourceScheduling)
       change the scheduling priority from highest to lowest
The proposed system supports a retry mechanism that re-runs compilation in
object-file units whenever a compilation failure occurs on a distributed PC
during distributed compilation.
14. 14/22
Cross Compiling for Heterogeneous Architectures
• Cross-compilation infrastructure for heterogeneous devices: x86 Windows
(32-bit/64-bit), x86 Linux (32-bit/64-bit), and ARM Android (32-bit, ARMv7).
• Cross-compiler infrastructure for generating executable binary files for a system other than
the one on which the compiler is running.
• The Heterogeneous CPU Mapper connects source code (C, C++, Objective-C, Java)
to the target machine code after probing the OS, through the toolchain:
compiler (GCC), linker (LD), debugger (GDB), and the assembler/instruction set
for the target hardware.
• Toolchain build steps: build cross-binutils → build cross-gcc → build the
Linux API headers → build the C library (glibc, bionic) → build the hosted
cross-gcc → build tools.
16. 16/22
Evaluation – Build Time of Platform Source
• Before: 51 minutes. After (proposed system): 18 minutes, so the time cost to
build the mobile platform source is reduced by 65 percent (33 minutes).
• 25% is consumed by the network speed, 30% by the computing power of the PCs,
and 45% by the CPU scheduling method.
• Setup: 9 machines (CPU: Intel Core2Duo, MEM: DDR2 1 GB, Intel 100 Mbps
Ethernet Controller).
17. 17/22
Evaluation – Compilation Speed with Distributed PCs
• The performance of 10 machines was similar to that of the 8-core PC; the loss
of roughly 2 PCs' worth of performance is due to network speed and the low
computing power of the distributed PCs.
• The compilation performance of the shared-resource scheduling method depends
largely on the CPU usage of the PC resources, compared with the
dedicated-resource scheduling method.
• High-performance computer for comparison: 8-core Intel Xeon E5 processor,
12 GB memory.
18. 18/22
Evaluation – Experimental Result on Cloud Computing
• The proposed system is as effective as one high-performance computer (40-core).
• The 3-minute difference in performance is caused by the emulation overhead of
KVM.
• High-performance cloud server: 40-core Intel Xeon E7 processor, 32 GB memory.
19. 19/22
Evaluation – With Ccache vs. Without Ccache
• Ccache reduces the compilation time of dedicated-resource scheduling by about
10%.
• The Ccache effect (dedicated) is correlated with the memory shortage of the
distributed PC resources and with the physical memory capacity available for
caching.
20. 20/22
Comparison Between Existing Systems and DistCom
• Ccache — Domain: caching the output of compilation. Task: compile source.
  Goal: high performance. Pros: performance acceleration (e.g. DB, web service).
  Cons: needs sufficient physical memory. Cost: high. User: platform builders.
• Distcc — Domain: distributed computing. Task: compile source. Goal: high
  performance. Pros: reduces build time (e.g. Android, Linux). Cons: needs
  additional hardware. Cost: high. User: platform builders.
• HTCondor — Domain: distributed parallelization. Task: run binary files.
  Goal: high throughput. Pros: utilizes extra resource management. Cons: no
  distributed compiling. Cost: low. User: scientific researchers.
• BOINC — Domain: distributed computing. Task: run binary files. Goal: high
  throughput. Pros: supports CPU & GPU. Cons: only uses idle time. Cost: low.
  User: scientific researchers.
• DistCom (*) — Domain: distributed computing. Task: compile source & run
  binary files. Goal: hybrid computing. Pros: multicore-aware object-based
  units, retry mechanism, shared scheduling. Cons: depends on network
  infrastructure. Cost: low. User: platform builders and scientific researchers.
21. 21/22
Conclusion
• Idle computer resources connected by a network are more ubiquitous than ever
before (e.g. cloud environments, BYOD environments, and the generalization of
computer usage).
• DistCom (DIStributed COMpilation system) supports high-speed software
compilation.
– 1) A distributed server/client model, 2) object-file-based CPU scheduling of
remote PC resources, and 3) cross compiling for heterogeneous architectures.
– A hybrid approach for mobile platform builders, cloud developers, grid
researchers, computational physics, and statistics.
• The result is a drastic improvement of compilation speeds using existing idle
PC resources.
23. 23/22
FAQ
1. Who cares about a Distcc/HTCondor-based system? Can you do it for mobile
devices?
2. Sounds too good. Are there any limitations?
3. Are you going to release it, or is this a one-off talk?
4. I totally don't get why you are doing this.
24. 24/22
Limitations
1. This approach is a distributed-PC-based software solution, but some small
companies do not have sufficient distributed computer resources.
2. Users need to run on a local area network to get the ideal network speed.
3. Can you always use idle PCs in a real environment? We focus on public
computer facilities, which have a high percentage of idle time.
Editor's Notes
The presentation time is 30 minutes (including 10 minutes for discussion).
*Before: Distributed Compilation System Using HTCondor and Distcc for High-Speed Software Compilation
*After: Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation
Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation
Developer, Testing, Release and CI Automation miniconf @ linux.conf.au 2014
Organiser: Stewart Smith <stewart@flamingspork.com>
Blog: https://www.flamingspork.com/blog/
In unity there is strength.
(Korean proverb) United we survive; gathering together creates strength.
The checkpoint/restart is the ability to save the state of a compiling source code so that compilation can later resume on the same or a different distributed PC from the moment at which its checkpoint was carried out.
Theoretically the performance of 8 machines would be similar to the 8-Core PC
* Sources: http://opensource.samsung.com
* Real-Time for Linux native applications
Default Thread model : NPTL
Priority Queueing for Mutex/Semaphore
Priority Inheritance Mutex
Robust Mutex
Add experimental data:
1. Dedicated resource scheduling vs. enhanced dedicated resource scheduling
- multicore-aware work allocation vs. per-PC work allocation
(the minimum number of object files is equal to the number of available CPUs)
2. Checkpoint/restart experiment
- object-file-unit-based checkpoint/restart vs. the mission-based method
3. User-aware CPU resource scheduling of the sharing method,
by probing user access with time-sharing (lowest priority) and real-time
(real-time priority):
for dedicated resource scheduling, real-time (real-time priority);
for shared resource scheduling, time-sharing (lowest priority)
Future work
- a network-aware task scheduling technique considering the physical network speed [27] to distribute tasks
- a task migration algorithm [11] to migrate distributed tasks to another idle PC resource.