4. Design space of shared memory computers
Shared memory computers
* Single address space memory access
  * Physical shared memory: UMA
  * Virtual shared memory: NUMA, CC-NUMA, COMA
* Interconnection scheme
  * Shared path
    * Single bus based
    * Multiple bus based: bus multiplication, grid of buses, hierarchical system
  * Switching network
    * Crossbar
    * Multistage network: Omega, Banyan, Benes
* Cache coherency
  * Hardware based
  * Software based
6. Virtual shared memory
* Also called distributed shared memory architecture
* The local memories of a multicomputer are components of a single global address space:
  * any processor can access the local memory of any other processor
* Three approaches:
  * Non-uniform memory access (NUMA) machines
  * Cache-only memory access (COMA) machines
  * Cache-coherent non-uniform memory access (CC-NUMA) machines
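To make the idea of a global address space assembled from local memories concrete, here is a minimal Python sketch. It assumes a simple blocked layout in which each node's local memory holds one contiguous range of global addresses; the class `GlobalMemory`, the node count and the memory size are hypothetical choices for illustration, not a description of any real machine.

```python
from typing import List, Tuple

# Hypothetical parameters chosen for illustration only.
NUM_NODES = 4
LOCAL_WORDS = 1024  # words of local memory per node


class GlobalMemory:
    """A single address space assembled from the local memories of all nodes."""

    def __init__(self, num_nodes: int = NUM_NODES, local_words: int = LOCAL_WORDS):
        self.local_words = local_words
        # One list per node stands in for that node's local memory module.
        self.nodes: List[List[int]] = [[0] * local_words for _ in range(num_nodes)]

    def locate(self, addr: int) -> Tuple[int, int]:
        """Map a global address to (home node, offset), assuming a blocked layout."""
        return divmod(addr, self.local_words)

    def read(self, cpu: int, addr: int) -> Tuple[int, bool]:
        """Return the stored value and whether the access was remote for `cpu`."""
        node, offset = self.locate(addr)
        return self.nodes[node][offset], node != cpu

    def write(self, cpu: int, addr: int, value: int) -> bool:
        """Store `value`; return True if the access was remote for `cpu`."""
        node, offset = self.locate(addr)
        self.nodes[node][offset] = value
        return node != cpu
```

For example, processor 0 can write global address 3000, which lives in node 2's memory (3000 = 2 * 1024 + 952): the write is remote for processor 0, while a later read of the same address by processor 2 is local.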
8. Non-uniform memory access (NUMA) machines
* Logically shared memory is physically distributed
* Local and remote memory blocks are accessed differently: a remote access takes much more time (latency)
* Performance is sensitive to the distribution of data and program
* Close to distributed memory systems, yet the programming paradigm is different (a shared address space instead of message passing)
* Example: Cray T3D
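The sensitivity to data distribution follows from the expected access time, which is a weighted average of the local and remote latencies. The sketch below assumes every reference is either local or remote with a fixed latency (no contention, no caching); the function name and the latency values are hypothetical, not measured figures for the Cray T3D.

```python
def average_access_time(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Expected access time when `local_fraction` of references hit local memory.

    Assumes fixed local and remote latencies (no contention, no caching).
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


# Hypothetical latencies: a remote access is assumed to be 10x slower than a local one.
for fraction in (0.99, 0.9, 0.5):
    print(f"{fraction:.0%} local -> {average_access_time(fraction, 100, 1000):.0f} ns")
```

With these assumed numbers, dropping from 90% to 50% local references raises the average access time from 190 ns to 550 ns, which is why placing data near the processor that uses it matters so much on NUMA machines.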
11. Cache-only memory access (COMA) machines
* Each block of the shared memory works as the local cache of a processor
* Continuous, dynamic migration of data
* A high hit rate decreases the traffic on the interconnection network
* Solutions for data consistency increase this same traffic again (see the cache coherency problem later)
* Examples: KSR-1, DDM
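The dynamic data migration described above can be illustrated with a toy model in which every block lives in the memory of exactly one node and moves to whichever node accesses it. This is a deliberate simplification: COMA designs in general also allow read copies of a block in several nodes and must keep those copies coherent, which the hypothetical `ComaSystem` class below ignores.

```python
from typing import Dict


class ComaSystem:
    """Toy COMA model: each block resides in exactly one node's local memory.

    Simplifying assumption: a single copy per block, so replication and
    coherence are ignored. A miss moves the block to the requesting node.
    """

    def __init__(self, initial_owner: Dict[str, int]):
        self.owner = dict(initial_owner)  # block name -> node currently holding it
        self.hits = 0
        self.migrations = 0  # each migration costs interconnection network traffic

    def access(self, node: int, block: str) -> bool:
        """Access `block` from `node`; return True on a local hit."""
        if self.owner[block] == node:
            self.hits += 1
            return True
        # Miss: the block migrates over the network to the requesting node.
        self.owner[block] = node
        self.migrations += 1
        return False


coma = ComaSystem({"A": 0, "B": 1})
print([coma.access(1, "A") for _ in range(3)])  # [False, True, True]
```

Repeated accesses by the same node become local hits, which is how a good hit rate reduces network traffic; alternating accesses from two nodes would instead make the block bounce back and forth and cost a migration every time.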
13. Cache-coherent NUMA (CC-NUMA) machines
* A combination of NUMA and COMA
* Initially static data distribution, then dynamic data migration
* The cache coherency problem has to be solved
* COMA and CC-NUMA are used in the newer generation of parallel computers
* Examples: Convex SPP1000, Stanford DASH, MIT Alewife
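One common way to solve the cache coherency problem in CC-NUMA machines is a directory that records, for each memory block, which nodes hold a cached copy; Stanford DASH, for example, used a directory-based protocol. The `Directory` class below is a heavily simplified, hypothetical sketch of a full-map directory entry for a single block: reads add sharers, and a write invalidates every other copy. Message formats, transient states and the data itself are omitted.

```python
from typing import Optional, Set


class Directory:
    """Simplified full-map directory entry for one memory block."""

    def __init__(self) -> None:
        self.sharers: Set[int] = set()          # nodes holding a cached copy
        self.dirty_owner: Optional[int] = None  # node holding a modified copy, if any
        self.invalidations = 0                  # invalidation messages sent so far

    def read(self, node: int) -> None:
        """Node `node` reads the block and obtains a shared copy."""
        if self.dirty_owner is not None and self.dirty_owner != node:
            # The modified copy is written back; the old owner keeps a shared copy.
            self.dirty_owner = None
        self.sharers.add(node)

    def write(self, node: int) -> None:
        """Node `node` writes the block; all other cached copies are invalidated."""
        others = self.sharers - {node}
        self.invalidations += len(others)  # one invalidation message per other copy
        self.sharers = {node}
        self.dirty_owner = node
```

Three reads by nodes 0, 1 and 2 followed by a write from node 1 cost two invalidations (for nodes 0 and 2). Because the directory knows the actual sharers, it can send invalidations only to them instead of broadcasting to every node.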
14. Advantages and disadvantages of shared memory systems
+ No need to partition data or program; uniprocessor programming techniques can be adapted
+ Communication between processors is efficient
+ Only minor modifications of the tool chain and compiler are needed
- Access to shared data in memory must be synchronized. Programs using synchronizing constructs (semaphores, conditional critical regions, monitors) can behave nondeterministically, which can lead to programming errors that are difficult to discover
- Lack of scalability due to the (memory) contention problem
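The need for synchronized access can be shown with threads standing in for processors that update one shared variable. In this Python sketch (the function name `parallel_increment` is hypothetical), `counter += 1` is a read-modify-write; the lock makes it atomic, so the final total is always correct even though the order in which threads acquire the lock varies from run to run.

```python
import threading


def parallel_increment(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Without the lock, read-modify-write sequences from different
            # threads could interleave and some updates could be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(parallel_increment(4, 10_000))  # 40000
```

Removing the `with lock:` line can produce lost updates, but whether they actually appear in a given run depends on thread scheduling, which is exactly why such errors are hard to discover.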
15.
* Memory access time
  * can be a bottleneck even in a single-processor system
* Contention for memory
  * two or more processors want to access a location in the same memory block at the same time (hot spot problem)
* Contention for communication
  * processors must share the elements of the interconnection network, and each element can be used by only one processor at a time
* Result: long latency, idle processors, nonscalable system
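Memory contention can be quantified with a classic simplified model: p processors each send one request per cycle to one of m memory modules chosen uniformly at random, and each module serves one request per cycle. A module stays idle with probability (1 - 1/m)^p, so the expected number of requests served per cycle is m(1 - (1 - 1/m)^p). The model ignores resubmission of rejected requests and network contention; the function name is hypothetical.

```python
def expected_bandwidth(processors: int, modules: int) -> float:
    """Expected requests served per cycle with uniformly random module choice.

    A module is left idle when none of the processors picks it, which happens
    with probability (1 - 1/m)**p; every other module serves one request.
    """
    if processors < 1 or modules < 1:
        raise ValueError("need at least one processor and one module")
    return modules * (1.0 - (1.0 - 1.0 / modules) ** processors)


for p in (4, 16, 64):
    served = expected_bandwidth(p, 16)
    print(f"{p:2d} processors, 16 modules: {served:5.2f} served ({served / p:.0%})")
```

With a single module, comparable to the extreme hot spot case where every processor targets the same block, `expected_bandwidth(p, 1)` is 1 for any p, so adding processors only adds idle time.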
16.
* Problems of scalable computers
  1. Tolerating and hiding the latency of remote loads
  2. Tolerating and hiding idling due to synchronization
* Solutions
  1. Cache memory
     * introduces the cache coherence problem
  2. Prefetching
  3. Threads and fast context switching
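The benefit of threads with fast context switching is often estimated with a simple analytical model: each thread computes for R cycles, then issues a remote load with latency L, and switching to another ready thread costs C cycles. Utilization grows linearly with the number of threads N until the latency is fully hidden, after which it saturates at R/(R + C). The sketch below encodes that model; the function and parameter names are hypothetical.

```python
def processor_utilization(threads: int, run_length: float,
                          latency: float, switch_cost: float) -> float:
    """Processor utilization with multithreading (simple analytical model).

    run_length:  cycles a thread computes before issuing a remote load (R)
    latency:     cycles until the remote load completes (L)
    switch_cost: cycles lost on each context switch (C)
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    # Linear region: too few threads to cover the remote latency.
    linear = threads * run_length / (run_length + switch_cost + latency)
    # Saturation: latency fully hidden; only the switching overhead remains.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


for n in (1, 2, 4, 8, 16):
    print(n, round(processor_utilization(n, run_length=10, latency=90, switch_cost=5), 2))
```

With these assumed numbers, utilization stops improving once roughly (R + L + C)/(R + C) = 7 threads are available, and a larger switch cost lowers the saturation level, which is why the context switch has to be fast.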