4. 4/22
Statistics of distributed client PCs
• Studies of high-performance computing still lack research on public computer
facilities, which have long idle periods.
• Example: used and idle times of 300 dual-core PCs in a university library's
public computer facility.
5. 5/22
What is covered in this talk?
• Current state of software source-size growth from January 2009 until July 2013:
a sevenfold increase.
6. 6/22
Cost Statistics for Building Platform Source
• Time cost needed to build a large mobile platform such as Android 4.2.2:
compilation accounts for 67 percent (34 minutes) of the total execution
time (51 minutes).
8. 8/22
What is our challenge?
• Distributed compiler network: to distribute compiling tasks across a network.
• Support a portable Linux system for Windows PCs: for a distributed compiler
network using the existing HTCondor pool of Windows PCs.
• Establish a remote command-line environment: what HTCondor does on the
library PCs must work without any user interaction, so no GUI.
• Requirement: we canNOT use a GUI installer, as you will not be sitting in
front of the distributed PC (Windows XP, Windows 7, or Linux) while users
are doing their work.
9. 9/22
System Architecture of DistCom
[Architecture diagram: (1.1) DistCom Service Daemon (= DistCom Server) and
(1.2) DistCom Client run on the operating system of the distributed PCs
(Windows); the (2) DistCom Manager works with the HTCondor Pool Manager,
HTCondor Collector, Workload Monitor, and Resource Manager to schedule the
distributed resources; the (3) Cross-Compiler Infrastructure targets x86
Windows, x86 Linux, and ARM Android. Users: platform builders, cloud
developers, and scientific researchers.]
• Distributed server and client model: to control distributed PC resources connected via a network
• DistCom Manager: for scheduling distributed PC resources
• Cross-compiler infrastructure: to support heterogeneous architectures
10. 10/22
Object File Based Server & Client Model
[Diagram: the (2) DistCom Manager ① checks each PC's status (collecting
information, monitoring workload) via the HTCondor Pool Manager; the user's
② command sends ③ source to the (1.1) DistCom Service Daemon (server) on a
distributed computer, which runs ④ source compilation and returns ⑤ binary
and the ⑥ result. O = object file, the atomic unit of checkpoint/restart.]
• (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed
degradation, where object files are the atomic unit for checkpointing.
• Existing technique: one PC receives one object file at a time. Proposed
technique: allocation is multicore-aware, so a 4-core CPU receives 4 compile
commands and a 2-core CPU receives 2 commands.
11. 11/22
CPU Scheduling: Controlling Remote PC Resources
• To avoid degrading the processing speed during the user's work period, the (1.1) DistCom Service
Daemon runs compilation either as a task of real-time priority (CPU monopolization method,
Case 1) or as a task of lowest priority (time-sharing method, Case 2).
• The scheduler on the multi-core processor(s) exposes the usual priority classes:
Lowest, Below Normal, Normal, Above Normal, Highest, Real-time.
• Case study of user-aware scheduling across thread APIs:
  - Windows (#include <windows.h>): CreateThread() / SetThreadPriority(), with
    THREAD_PRIORITY_TIME_CRITICAL = 15, HIGHEST = +2, ABOVE_NORMAL = +1,
    NORMAL = 0, BELOW_NORMAL = -1, LOWEST = -2, IDLE = -15.
  - POSIX (#include <pthread.h>, <semaphore.h>, <sys/time.h>): pthread_create() /
    pthread_setschedparam() and setpriority(), with 1. -20 (highest) to
    19 (lowest) for time-sharing and 2. 1 (lowest) to 99 (highest) for
    real-time priority.
  - VxWorks (#include <vxworks.h>, <sysLib.h>, <taskLib.h>, <semLib.h>):
    taskSpawn() / taskPrioritySet(), with priorities 0 (highest) to 255 (lowest).
  - Solaris threads (#include <thread.h>): thr_create() / thr_setprio(), with
    priorities 0 (lowest) to 127 (highest).
• No modification of the distributed computer systems is required.
12. 12/22
CPU Scheduling: Task Allocation & Reallocation
[Task state transition diagram: the (2) DistCom Manager keeps two task queues,
one for dedicated resources and one for shared resources; tasks flow through
the Reject, Stop, and Finish transitions. Minimal job unit: object file.]
• (2) DistCom Manager manages all jobs with two task queues to separate
dedicated resources from shared resources.
1. First, Reject is used to deny the allocation of a task.
2. Second, Stop is used to break the allocation of a task to a PC resource
because of the user's access.
3. Finally, Finish is used to complete running tasks normally.
13. 13/22
CPU Scheduling: Task Allocation & Reallocation
[Retry mechanism] (Q: queue, C: calculation, U: user)
1. Overload detection
   if (Qsum > CPUfree) then find another idle computer
2. Task complexity estimation
   if (CPUfree is unknown or (CPUaccess > CPUidle)) then
       recalculate the task complexity of the distributed computers (Ccomplexity)
       run the retry mechanism
       call the task state transition (Stop)
       run object-file-based compilation on another idle computer
3. Handling of user access
   if (Uaccess && DedicatedResourceScheduling)
       call the retry mechanism
   if (Uaccess && SharedResourceScheduling)
       change the scheduling priority from highest to lowest
The proposed system supports a retry mechanism that re-runs compilation in
object-file units whenever a compilation failure occurs on a distributed PC
during distributed compilation.
14. 14/22
Cross Compiling for Heterogeneous Architectures
• Cross-compilation infrastructure for heterogeneous devices: x86 Windows
(32-bit/64-bit), x86 Linux (32-bit/64-bit), and ARM Android (32-bit, ARMv7).
• Cross-compiler infrastructure for generating executable binary files for a system other than
the one on which the compiler is running.
• The Heterogeneous CPU Mapper connects source code (C, C++, Objective-C, Java)
to the target machine code after probing the OS, through the toolchain:
compiler (GCC), linker (LD), debugger (GDB), and the assembler/instruction set
for the target hardware.
• Toolchain build steps: build cross-binutils → build cross-gcc → build the
Linux API headers → build the C library (glibc, bionic) → build the hosted
cross-gcc → build tools.
16. 16/22
Evaluation – Build Time of Platform Source
• Before: 51 minutes. After (proposed system): 18 minutes, so the time cost to
build the mobile platform source is reduced by 65 percent (33 minutes).
• 25% is consumed by the network speed, 30% by the computing power of the PCs,
and 45% by the CPU scheduling method.
• Setup: 9 machines (CPU: Intel Core2Duo, MEM: DDR2 1 GB, Intel 100 Mbps
Ethernet Controller).
17. 17/22
Evaluation – Compilation Speed with Distributed PCs
• The performance of 10 machines was similar to that of the 8-core PC; the loss
of roughly 2 PCs' worth of performance is due to network speed and the low
computing power of the distributed PCs.
• The compilation performance of the shared-resource scheduling method depends
largely on the CPU usage of the PC resources, compared with the
dedicated-resource scheduling method.
• High-performance computer for comparison: 8-core Intel Xeon E5 processor,
12 GB memory.
18. 18/22
Evaluation – Experimental Result on Cloud Computing
• The proposed system is as effective as one high-performance computer (40-core).
• The 3-minute difference in performance is caused by the emulation overhead of
KVM.
• High-performance cloud server: 40-core Intel Xeon E7 processor, 32 GB memory.
19. 19/22
Evaluation – With Ccache vs. Without Ccache
• Ccache reduces the compilation time of dedicated-resource scheduling by about
10%.
• The Ccache effect (dedicated) is correlated with the memory shortage of the
distributed PC resources and with the physical memory capacity available for
caching.
20. 20/22
Comparison Between Existing Systems and DistCom
• Ccache — Domain: caching the output of compilation. Task: compile source.
  Goal: high performance. Pros: performance acceleration (e.g. DB, web service).
  Cons: needs sufficient physical memory. Cost: high. User: platform builders.
• Distcc — Domain: distributed computing. Task: compile source. Goal: high
  performance. Pros: reduces build time (e.g. Android, Linux). Cons: needs
  additional hardware. Cost: high. User: platform builders.
• HTCondor — Domain: distributed parallelization. Task: run binary files.
  Goal: high throughput. Pros: utilizes extra resource management. Cons: no
  distributed compiling. Cost: low. User: scientific researchers.
• BOINC — Domain: distributed computing. Task: run binary files. Goal: high
  throughput. Pros: supports CPU & GPU. Cons: only uses idle time. Cost: low.
  User: scientific researchers.
• DistCom (*) — Domain: distributed computing. Task: compile source & run
  binary files. Goal: hybrid computing. Pros: multicore-aware object-based
  units, retry mechanism, shared scheduling. Cons: depends on network
  infrastructure. Cost: low. User: platform builders and scientific researchers.
21. 21/22
Conclusion
• Idle computer resources connected by a network are more ubiquitous than ever
before (e.g. cloud environments, BYOD environments, and the generalization of
computer usage).
• DistCom (DIStributed COMpilation system) supports high-speed software
compilation.
– 1) A distributed server/client model, 2) object-file-based CPU scheduling of
remote PC resources, and 3) cross compiling for heterogeneous architectures.
– A hybrid approach for mobile platform builders, cloud developers, grid
researchers, computational physics, and statistics.
• The result is a drastic improvement of compilation speeds using existing idle
PC resources.
23. 23/22
FAQ
1. Who cares about a Distcc/HTCondor-based system? Can you do it for mobile
devices?
2. Sounds too good. Are there any limitations?
3. Are you going to release it, or is this a one-off talk?
4. I totally don't get why you are doing this.
24. 24/22
Limitations
1. This approach is a distributed-PC-based software solution, but some small
companies do not have sufficient distributed computer resources.
2. Users need to run on a local area network to get the ideal network speed.
3. Can you always use idle PCs in a real environment? We focus on public
computer facilities, which have a high percentage of idle time.
Editor's Notes
The presentation time is 30 minutes (including 10 minutes for discussion).
*Before: Distributed Compilation System Using HTCondor and Distcc for High-Speed Software Compilation
*After: Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation
Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation
Developer, Testing, Release and CI Automation miniconf @ linux.conf.au 2014
Organiser: Stewart Smith <stewart@flamingspork.com>
Blog: https://www.flamingspork.com/blog/
In unity there is strength.
(Korean proverb) United we survive; gathering together creates strength.
The checkpoint/restart is the ability to save the state of a compiling source code so that compilation can later resume on the same or a different distributed PC from the moment at which its checkpoint was carried out.
Theoretically the performance of 8 machines would be similar to the 8-Core PC
* Sources: http://opensource.samsung.com
* Real-Time for Linux native applications
Default Thread model : NPTL
Priority Queueing for Mutex/Semaphore
Priority Inheritance Mutex
Robust Mutex
Add experimental data:
1. Dedicated resource scheduling vs. enhanced dedicated resource scheduling
- multicore-aware work allocation vs. per-PC work allocation
(the minimum number of object files is equal to the number of available CPUs)
2. Checkpoint/restart experiment
- object-file-unit-based checkpoint/restart vs. the mission-based method
3. User-aware CPU resource scheduling of the sharing method,
by probing user access with time-sharing (lowest priority) and real-time
(real-time priority):
for dedicated resource scheduling, real-time (real-time priority);
for shared resource scheduling, time-sharing (lowest priority)
Future work
- a network-aware task scheduling technique considering the physical network speed [27] to distribute tasks
- a task migration algorithm [11] to migrate distributed tasks to another idle PC resource.