  1. Distributed Compilation System for High-Speed Software Build Processes. Geunsik Lim, http://leemgs.fedorapeople.org
  2. Who am I? Full name: Geunsik Lim. E-mail: leemgs@skku.edu, geunsik.lim@samsung.com. Affiliation: Sungkyunkwan University, Samsung Electronics. Homepage: http://leemgs.fedorapeople.org
  3. Outline: Introduction; Background; Design and Implementation (Object File Based Server & Client Model, CPU Scheduling of Distributed PC Resources, Cross Compiling for Heterogeneous CPU Architectures); Evaluation; Conclusion
  4. Statistics of distributed client PCs. Studies on high-performance computing still pay little attention to public computer facilities, which have long idle periods. Example: used and idle times of 300 dual-core PCs in a university library's public computer facility.
  5. What is covered in this talk? The source size of software grew sevenfold between January 2009 and July 2013.
  6. Cost Statistics for Building Platform Source. Time needed to build a large mobile platform such as Android 4.2.2: compilation accounts for 67 percent (34 minutes) of the total execution cost (51 minutes).
  7. What is our final goal? Idle computers: in unity there is strength.
  8. What is our challenge? Distributed compiler network: distribute compiling tasks across a network. Portable Linux system for Windows PCs: build the distributed compiler network on the existing HTCondor pool of Windows PCs (Windows XP, Windows 7, Linux). Remote command-line environment: HTCondor on the library PCs must work without any user interaction, so no GUI. Requirement: we cannot use a GUI installer, since no one will be sitting in front of a distributed PC while its users are doing their work.
  9. System Architecture of DistCom. Distributed server and client model: controls distributed PC resources connected via the network. DistCom Manager: schedules distributed PC resources. Cross-compiler infrastructure: supports heterogeneous architectures. (Diagram components: (1.1) DistCom Service Daemon (= DistCom Server) and (1.2) DistCom Client on the distributed PCs' operating system (Windows); (2) DistCom Manager with a workload monitor and resource manager beside the HTCondor Pool Manager and HTCondor Collector; (3) Cross-Compiler Infrastructure targeting x86 Windows, x86 Linux, and ARM Android. Users: platform builders, cloud developers, and scientific researchers.)
  10. Object File Based Server & Client Model. (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed degradation; object files are the atomic unit of checkpointing. Job flow: (1) check the PC's status (collect information, monitor workload), (2) send the command, (3) send the source, (4) compile the source, (5) return the binary, (6) report the result. Existing technique: one PC compiles all object files. Proposed technique: object files are grouped into as many commands as the PC has CPU cores (e.g. 4 commands for a 4-core CPU, 2 commands for a 2-core CPU).
  11. CPU Scheduling: Controlling Remote PC Resources. To avoid degrading processing speed during the user's work period, the (1.1) DistCom Service Daemon runs compilation either as a real-time-priority task (CPU monopolization method) or as a lowest-priority task (time-sharing method); no modification of the distributed computer systems is required. Case study of user-aware scheduling APIs: Windows (<windows.h>): CreateThread(), SetThreadPriority() with THREAD_PRIORITY_IDLE (-15) through THREAD_PRIORITY_TIME_CRITICAL (15). POSIX (<pthread.h>): pthread_create(), pthread_setschedparam(); setpriority() with nice values -20 (highest) to 19 (lowest) for time-sharing, and real-time priorities 1 (lowest) to 99 (highest). VxWorks (<taskLib.h>): taskSpawn(), taskPrioritySet() with 0 (highest) to 255 (lowest). Solaris (<thread.h>): thr_create(), thr_setprio() with 0 (lowest) to 127 (highest).
  12. CPU Scheduling: Task Allocation & Reallocation. (2) DistCom Manager manages all jobs with two task queues, separating dedicated resources from shared resources; the minimal job unit is an object file. Task state transitions: 1. Reject denies the allocation of a task. 2. Stop breaks a task's allocation to a PC resource because of the user's access. 3. Finish completes a running task normally.
  13. CPU Scheduling: Task Allocation & Reallocation (Retry Mechanism; Q: queue, C: calculation, U: user). The proposed system supports a retry mechanism that re-runs compilation at object-file granularity whenever a distributed PC's compilation fails. 1. Overload detection: if (Qsum > CPUfree) then find another idle computer. 2. Task complexity estimation: if (CPUfree is unknown or CPUaccess > CPUidle) then recalculate the task complexity of the distributed computers (Ccomplexity), run the retry mechanism, call the task state transition (Stop), and run object-file based compilation on another idle computer. 3. Handling of user access: if (Uaccess && DedicatedResourceScheduling) call the retry mechanism; if (Uaccess && SharedResourceScheduling) change the scheduling priority from highest to lowest.
  14. Cross Compiling for Heterogeneous Architecture. The cross-compiler infrastructure generates executable binary files for a system other than the one the compiler runs on: x86 Windows 32/64-bit, x86 Linux 32/64-bit, and ARM Android 32-bit (ARMv7). The Heterogeneous CPU Mapper connects source code (C, C++, Objective-C, Java) to target machine code after probing the OS, via the toolchain (compiler: GCC; linker: LD; debugger: GDB; assembler and instruction set). Toolchain build steps: build cross-binutils, build cross-gcc, build Linux API headers, build the C library (glibc, bionic), build hosted cross-gcc, build tools.
  15. Evaluation setup. User (CentOS 6): 115.145.170.xxx. Distributed PC resources: remote PC (Windows 7): 115.145.170.xxx; remote PC (Ubuntu 12.04): 115.145.170.xxx.
  16. Evaluation – Build Time of Platform Source. The time to build the mobile platform source drops from 51 minutes to 18 minutes, a 65 percent (33-minute) reduction. Of the improvement, 25% is attributed to network speed, 30% to the computing power of the PCs, and 45% to the CPU scheduling method. Setup: 9 machines (CPU: Intel Core2Duo, MEM: 1 GB DDR2, Intel 100 Mbps Ethernet controller).
  17. Evaluation – Compilation Speed with Distributed PCs. The performance of 10 machines was similar to that of the 8-core high-performance computer (8-core Intel Xeon E5, 12 GB memory); roughly 2 PCs' worth of performance is lost to network speed and the low computing power of the distributed PCs. Compared with the dedicated resource scheduling method, the compilation performance of the shared resource scheduling method depends largely on the CPU usage of the PC resources.
  18. Evaluation – Experimental Result on Cloud Computing. The proposed system is as effective as one high-performance cloud server (40-core Intel Xeon E7, 32 GB memory). The 3-minute performance difference is caused by the emulation overhead of KVM.
  19. Evaluation – With Ccache vs. Without Ccache. Ccache reduces the compilation time of dedicated resource scheduling by about 10%. The Ccache effect (dedicated) correlates with the memory shortage of the distributed PC resources and with the physical memory capacity available for caching.
  20. Comparison Between Existing Systems and DistCom.
      - Ccache (Domain: caching the output of compilation; Task: compile source; Goal: high performance; Pros: performance acceleration, e.g. DB, web service; Cons: needs sufficient physical memory; Cost: high; Users: platform builders)
      - Distcc (Domain: distributed computing; Task: compile source; Goal: high performance; Pros: reduces build time, e.g. Android, Linux; Cons: needs additional hardware; Cost: high; Users: platform builders)
      - HTCondor (Domain: distributed parallelization; Task: run binary files; Goal: high throughput; Pros: utilizes extra resource management; Cons: no distributed compiling; Cost: low; Users: scientific researchers)
      - BOINC (Domain: distributed computing; Task: run binary files; Goal: high throughput; Pros: supports CPU & GPU; Cons: only uses idle time; Cost: low; Users: scientific researchers)
      - DistCom (*) (Domain: distributed computing; Task: compile source and run binary files; Goal: hybrid computing; Pros: multicore-aware object-based unit, retry mechanism, shared scheduling; Cons: depends on network infrastructure; Cost: low; Users: platform builders and scientific researchers)
  21. Conclusion. Idle computer resources connected by a network are more ubiquitous than ever before (e.g. cloud environments, BYOD environments, and the generalization of computer usage). DistCom (DIStributed COMpilation system) supports high-speed software compilation through 1) a distributed server/client model, 2) object file based CPU scheduling of remote PC resources, and 3) cross compiling for heterogeneous architectures. It is a hybrid approach for mobile platform builders, cloud developers, grid researchers, computational physics, and statistics, and it drastically improves compilation speed using existing idle PC resources.
  22. Thank you for your attention. Any questions?
  23. FAQ. 1. Who cares about a Distcc/HTCondor based system? Can you do it for mobile devices? 2. Sounds too good. Are there any limitations? 3. Are you going to release it, or is it a one-off talk? 4. I totally don't get why you are doing this.
  24. Limitations. 1. This approach is a distributed-PC based software solution, but some small companies do not have sufficient distributed computer resources. 2. Users need a local area network to get the ideal network speed. 3. Can you always use idle PCs in a real environment? We focus on public computer facilities, which have a high percentage of idle time.

Editor's Notes

  • The paper presentation time is 30 minutes (including 10 minutes for discussion).

    *Before: Distributed Compilation System Using HTCondor and Distcc for High-Speed Software Compilation
    *After: Distributed Compilation Using HTCondor and Distcc for Accelerating Software Compilation

    Developer, Testing, Release and CI Automation miniconf @ linux.conf.au 2014
    Organiser: Stewart Smith <stewart@flamingspork.com>
    Blog: https://www.flamingspork.com/blog/

  • Thread scheduling framework
  • Source - http://condor.skku.edu/Benchmarks/idletimes.html
  • In unity there is strength.
    (Korean proverb: united we live; when we gather together, strength arises.)
  • The checkpoint/restart is the ability to save the state of a compiling source code so that compilation can later resume on the same or a different distributed PC from the moment at which its checkpoint was carried out.
  • Q: Queue
    C: Calculation
    U: User
  • Q: Queue
    C: Calculation
    U: User
  • http://distcc.googlecode.com/svn/trunk/doc/web/benchmark.html
  • Theoretically the performance of 8 machines would be similar to the 8-Core PC
  • * Sources: http://opensource.samsung.com
    * Real-Time for Linux native applications
    Default Thread model : NPTL
    Priority Queueing for Mutex/Semaphore
    Priority Inheritance Mutex
    Robust Mutex
  • Add experimental data:
    dedicated resource scheduling vs. enhanced dedicated resource scheduling
    multicore-aware work allocation vs. the number of pc work allocation
    (the minimum number of the object files is equal to the number of available CPUs )

    2. checkpoint-restart experiment
    - object file unit based checkpoint-restart vs. mission-based method

    3. User-aware CPU resource scheduling of sharing method.
    by probing user-access with time-sharing(lowest priority) and real-time(real-time priority)
    for dedicated resource scheduling → real-time (real-time priority)
    for shared resource scheduling → time-sharing (lowest priority)

    Future work
    - a network-aware task scheduling technique considering the physical network speed [27] to distribute tasks
    - a task migration algorithm [11] to migrate distributed tasks to another idle PC resource.
  • Here, I prepared an FAQ for the audience.
  • Here are the limitations of our proposed system.