5. Performance Issues.
Cache performance depends on:
Behavior of uniprocessor cache miss traffic.
Traffic caused by communication.
Factors affecting the two components of miss rate:
Processor count.
Cache size.
Block size.
6. Coherence Misses.
The misses that arise from interprocessor communication, often called coherence misses, can be broken into two separate sources:
True Sharing Misses.
False Sharing Misses [3].
7. Synchronization issues.
Synchronization mechanisms are typically built with user-level software routines that rely on hardware-supplied synchronization instructions.
For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value.
In larger-scale multiprocessors or high-contention
situations, synchronization can become a performance
bottleneck [4].
8. Types of Synchronization.
Mutual exclusion.
Synchronize entry into critical sections.
Normally done with locks.
Point-to-point synchronization.
Tell a set of processors (normally set cardinality is one) that
they can proceed.
Normally done with flags.
Global synchronization.
Bring every processor to sync.
Wait at a point until everyone is there.
Normally done with barriers [4].
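The flag-based point-to-point case above can be sketched with C11 atomics; a producer writes a value, then raises a flag that the consumer spins on. This is an illustrative sketch, not code from [4] — the names producer, consumer, data, flag and the value 42 are ours.

```c
#include <stdatomic.h>

int data;              /* payload written by the producer */
atomic_int flag = 0;   /* 0 = not ready, 1 = ready */

/* Producer: write the data, then raise the flag. The (seq_cst) atomic
   store ensures the consumer sees the data once it sees the flag. */
void producer(void) {
    data = 42;
    atomic_store(&flag, 1);
}

/* Consumer: spin on the flag, then read the data. */
int consumer(void) {
    while (atomic_load(&flag) == 0)
        ;  /* spin until the producer signals */
    return data;
}
```

In a real program the two functions would run on different threads; the flag is the only synchronization between them.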
9. Basic Hardware Primitives.
Atomic Exchange.
addi register, r0, 0x1 /* r0 is hardwired to 0 */
Lock: xchg register, lock_addr /* An atomic load and store */
bnez register, Lock
Unlock remains unchanged
Various processors support this type of instruction.
Intel x86 has xchg; Sun UltraSPARC has ldstub (load-store-unsigned byte) and swap.
Normally easy to implement for bus-based systems: whoever wins
the bus for xchg can lock the bus.
Difficult to support in distributed memory systems [4].
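The xchg spin-lock loop above can be sketched portably with C11 atomics, where atomic_exchange plays the role of the atomic exchange instruction. A sketch only: spin_lock_t and the function names are ours, not from [4].

```c
#include <stdatomic.h>

/* A spin lock built on atomic exchange. locked == 0 means free. */
typedef struct { atomic_int locked; } spin_lock_t;

void spin_lock_init(spin_lock_t *l) {
    atomic_init(&l->locked, 0);
}

void spin_lock(spin_lock_t *l) {
    /* atomic_exchange is the xchg of the slide: swap in 1 and retry
       while the old value was already 1 (someone else holds the lock). */
    while (atomic_exchange(&l->locked, 1) != 0)
        ;  /* spin */
}

void spin_unlock(spin_lock_t *l) {
    /* Unlock remains an ordinary store of 0, as on the slide. */
    atomic_store(&l->locked, 0);
}
```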
10. Test and Set
Test-and-set tests a value and sets it if the value passes the test.
For example, we could define an operation that tests for 0 and sets the value to 1, which can be used in a fashion similar to how we used atomic exchange [4].
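C11 exposes this primitive directly as atomic_flag_test_and_set, which atomically sets a flag and returns its previous value, so an acquire attempt succeeds only when the old value was clear. A sketch; try_acquire and release are our names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to acquire a lock built on test-and-set: atomically test the flag
   and set it; we succeed only if the old value was clear (0). */
bool try_acquire(atomic_flag *f) {
    return !atomic_flag_test_and_set(f);
}

/* Release simply clears the flag back to 0. */
void release(atomic_flag *f) {
    atomic_flag_clear(f);
}
```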
12. To implement a mutual exclusion lock, we define the operation
FetchAndIncrement, which is equivalent to FetchAndAdd with an increment of 1.
With this operation, a mutual exclusion lock can be implemented using
the ticket lock algorithm.
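A ticket lock built on FetchAndIncrement might look like the following C11 sketch, where atomic_fetch_add with 1 stands in for FetchAndIncrement; ticket_lock_t and its field names are ours.

```c
#include <stdatomic.h>

/* Each arriving thread takes a ticket; the lock serves tickets in order. */
typedef struct {
    atomic_uint next_ticket;  /* fetch-and-incremented by arriving threads */
    atomic_uint now_serving;  /* advanced by each release */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

void ticket_lock_acquire(ticket_lock_t *l) {
    /* FetchAndIncrement: atomically grab our ticket number. */
    unsigned my = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != my)
        ;  /* spin until our ticket is served */
}

void ticket_lock_release(ticket_lock_t *l) {
    atomic_fetch_add(&l->now_serving, 1);  /* serve the next ticket */
}
```

Unlike the plain exchange lock, waiting threads spin reading now_serving rather than repeatedly writing the lock word, and tickets make the acquisition order FIFO.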
13. An alternative primitive is a pair of instructions: a special load called
a load linked or load locked and a special store called
a store conditional.
These instructions are used in sequence: If the contents of the
memory location specified by the load linked are changed
before the store conditional to the same address occurs, then
the store conditional fails.
The store conditional is defined to return 1 if it was successful
and 0 otherwise [4].
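C11 has no load linked/store conditional, but compare-and-swap can mimic the pattern for illustration: remember the loaded value, then store only if the location still holds it. This is our sketch, not real LL/SC — unlike a true store conditional, the CAS emulation cannot detect an A-to-B-to-A change to the location.

```c
#include <stdatomic.h>

/* "Load linked": an ordinary atomic load whose result we remember. */
int ll(atomic_int *loc) {
    return atomic_load(loc);
}

/* "Store conditional": store new_value only if *loc still holds the
   linked value; return 1 on success and 0 on failure, as in [4]. */
int sc(atomic_int *loc, int linked, int new_value) {
    return atomic_compare_exchange_strong(loc, &linked, new_value) ? 1 : 0;
}

/* Atomic increment built from the ll/sc retry pattern. */
void atomic_inc(atomic_int *loc) {
    int old;
    do {
        old = ll(loc);
    } while (!sc(loc, old, old + 1));
}
```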
14. References
1. David E. Ott, "Optimizing Software Applications for NUMA," Internet: http://www.drdobbs.com/go-parallel/article/print?articleId=218401502, July 10, 2009 [Jan. 29, 2015].
2. Prof. H. P. Oscer, "Technical Design Issues," Internet: http://www.oser.org/~hp/ds/node15.html, June 08, 2001 [Jan. 29, 2015].
3. John L. Hennessy and David A. Patterson, "Multiprocessors and Thread-Level Parallelism," in Computer Architecture: A Quantitative Approach, 4th ed., San Francisco: Morgan Kaufmann, 2007, pp. 218-219.
4. Prof. Rajat Moona, Dr. Mainak Chaudhuri, and Prof. Sanjeev K. Aggarwal, "Program Optimization for Multi-core Architectures," Internet: http://nptel.ac.in/courses/106104025/13 [Jan. 29, 2015].
Editor's Notes
UMA gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Note that access time is also independent of data location within memory. That is, access time remains the same regardless of which shared memory module contains the data to be retrieved.
In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly and with a distinctive performance advantage. At the same time, it can also access any memory module belonging to another processor using a shared bus (or some other type of interconnect) as seen in the diagram below:
What gives NUMA its name is that memory access time varies with the location of the data to be accessed. If data resides in local memory, access is fast. If data resides in remote memory, access is slower. The advantage of the NUMA architecture as a hierarchical shared memory scheme is its potential to improve average case access time through the introduction of fast, local memory.
In computer architecture, distributed shared memory (DSM) is a form of memory architecture where the (physically separate) memories can be addressed as one (logically shared) address space. Here, the term shared does not mean that there is a single centralized memory; shared essentially means that the address space is shared (the same physical address on two processors refers to the same location in memory).[1] Distributed Global Address Space (DGAS) is a similar term for a wide class of software and hardware implementations, in which each node of a cluster has access to shared memory in addition to each node's non-shared private memory.
The first source is the so-called true sharing misses that arise from the communication of data through the cache coherence mechanism. They directly arise from the sharing of data among processors.
The second effect, called false sharing, arises from the use of an invalidation-based coherence algorithm with a single valid bit per cache block.
Synchronization mechanisms are typically built with user-level software routines that rely on hardware supplied synchronization instructions.
For smaller multiprocessors or low-contention situations, the key hardware capability is an uninterruptible instruction or instruction sequence capable of atomically retrieving and changing a value.
In larger-scale multiprocessors or high-contention situations, synchronization can become a performance bottleneck because contention introduces additional delays and because latency is potentially greater in such a multiprocessor.
The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location.
There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically.
These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers.