Computer Architecture Seminar
1. Data-Triggered Threads: Eliminating Redundant Computation (HPCA 2011). Hung-Wei Tseng and Dean M. Tullsen, Department of Computer Science and Engineering, University of California, San Diego. Seminar by Naman Kumar for http://carg.uwaterloo.ca
2. Eliminating Redundant Computation. Silent store: a memory store operation that does not change the contents at that location. 20-68% of all stores are silent [Lepak and Lipasti]. How about eliminating the entire stream of computation surrounding a silent store?
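The definition of a silent store can be illustrated with a small software model. This is only a sketch in Python, not the hardware mechanism from the paper; the `Memory` class and its counters are hypothetical names, and the first write to an untouched address is assumed to count as non-silent.

```python
class Memory:
    """Toy word-addressed memory that classifies each store (hypothetical model)."""

    def __init__(self):
        self.cells = {}       # address -> value
        self.silent = 0       # stores that left memory unchanged
        self.non_silent = 0   # stores that changed memory

    def store(self, addr, value):
        """Write value to addr; return True if the store was silent."""
        if addr in self.cells and self.cells[addr] == value:
            self.silent += 1
            return True
        # Assumption: the first write to an address counts as non-silent.
        self.cells[addr] = value
        self.non_silent += 1
        return False


mem = Memory()
mem.store(0x10, 7)   # first write: changes memory
mem.store(0x10, 7)   # same value again: silent
mem.store(0x10, 8)   # new value: changes memory
```

Any computation that depends only on address `0x10` would produce the same result after the second store as after the first, which is the redundancy the slides refer to.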
3. Eliminating Redundant Computation. Redundant loads: silent stores result in redundant loads (the last time this load read this address, it fetched the same value). SPEC2000 C: 78% of all loads are redundant; 50% of all instructions depend on redundant loads.
5. DTT: Implementation. The Programming Model. Place the redundant computation in a separate thread: the thread is restartable; the thread may be aborted/restarted multiple times; thread management is handled through architectural changes. Data races are easy to check for, because the thread's lifetime is bounded by the triggering store and the main thread's join point.
6. DTT: Implementation. The Programming Model. The trigger is placed in the data section, not the code section.
9. DTT: Implementation. Architectural Support. The following tables are all implemented in hardware: Thread Registry (table), Thread Queue (table), Thread Status Table (table). PC
10. DTT: Implementation. Architectural Support. ISA modifications: tstore – generate a thread when the store to memory is not silent; tspawn – spawn the thread using the thread registry; treturn – finish execution of the current thread; tcancel – terminate a running thread.
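A rough software model may help show how `tstore` interacts with the thread registry and thread queue. This is a sketch under stated assumptions, not the hardware design: the `DTTModel` class, its `register` and `run_pending` methods, and the `double_it` thread are all hypothetical, queued threads run sequentially rather than concurrently, and `tcancel` is not modeled.

```python
from collections import deque


class DTTModel:
    """Hypothetical software stand-in for the DTT hardware tables."""

    def __init__(self):
        self.memory = {}       # address -> value
        self.registry = {}     # thread registry: address -> thread function
        self.queue = deque()   # thread queue: threads waiting to run

    def register(self, addr, func):
        """Attach a data-triggered thread to an address."""
        self.registry[addr] = func

    def tstore(self, addr, value):
        """Store a value; queue the registered thread only if memory changed."""
        if addr in self.memory and self.memory[addr] == value:
            return False                      # silent store: nothing to do
        self.memory[addr] = value
        if addr in self.registry:
            self.queue.append(self.registry[addr])
        return True

    def run_pending(self):
        """Stand-in for tspawn/treturn: run each queued thread to completion."""
        ran = 0
        while self.queue:
            thread = self.queue.popleft()
            thread(self)
            ran += 1
        return ran


def double_it(model):
    # Support-thread body: recompute a value derived from address 0x10.
    model.memory[0x20] = 2 * model.memory[0x10]


model = DTTModel()
model.register(0x10, double_it)
model.tstore(0x10, 3)      # non-silent: thread is queued
model.tstore(0x10, 3)      # silent: no thread is queued
ran = model.run_pending()  # runs double_it once
```

In this model the second store generates no work at all, which mirrors the point that a silent store should not trigger the dependent computation.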
Editor's Notes
Memoization and other techniques save on memory accesses. This technique proposes a way to save both the accesses and the computation that uses the data from those accesses.
E.g., the sum of all nodes in a 100-node linked list. Every node has to be accessed even when, say, only 2 have changed. That’s 98 redundant loads.
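The linked-list example can be made concrete with a short Python sketch that counts node loads. The `Node`, `build_list`, and `full_sum` names are hypothetical, and the node values are arbitrary; the point is only the load count.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a singly linked list holding the given values in order."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def full_sum(head):
    """Walk the whole list; return (sum, number of node loads)."""
    total, loads = 0, 0
    node = head
    while node is not None:
        total += node.value
        loads += 1
        node = node.next
    return total, loads


head = build_list(range(100))             # 100 nodes holding 0..99
first_total, first_loads = full_sum(head)

# Change only two nodes, then recompute the sum from scratch.
head.value = 5                            # node 0: 0 -> 5
head.next.value = 6                       # node 1: 1 -> 6
second_total, second_loads = full_sum(head)

changed = 2
redundant_loads = second_loads - changed  # loads that returned an unchanged value
```

The second traversal still touches all 100 nodes, so 98 of its loads fetch values identical to the previous traversal.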
If the calculated value of SP differs from what is in memory, a support thread (S) is spawned to calculate B. The main thread then skips code section B, since the data has already been calculated. The instructions for B are left in place because the support thread may have failed to spawn; in that case nothing is skipped and the main thread executes the code itself.
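The skip-or-fall-back behaviour described in this note can be sketched as follows. This is a simplified illustration, not the actual mechanism: `compute_b` and `main_thread` are hypothetical names, section B is assumed to be a simple sum, and a result of `None` stands in for a support thread that failed to spawn or did not complete.

```python
def compute_b(data):
    """Section B: the computation that may already have been done."""
    return sum(data)


def main_thread(data, support_result):
    """Return (result, action); support_result is None if no support thread finished."""
    if support_result is not None:
        # The support thread already produced B's result: skip section B.
        return support_result, "skipped B"
    # No usable result: the main thread runs section B itself.
    return compute_b(data), "ran B inline"


data = [1, 2, 3]
with_support = main_thread(data, compute_b(data))   # support thread succeeded
without_support = main_thread(data, None)           # support thread never ran
```

Either path yields the same value for B, which is why leaving the original instructions in place is safe.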
Programmers implement this with C pragma constructs
Every time the variable is written with a value that changes its contents (that is, the store is not silent), the associated data-triggered thread is executed.
If the programmer has reason to suspect that the thread may crash or be aborted, they can place the #cancel pragma. This ensures that only the main thread executes this block; the support thread will not be registered.
This function is triggered in a new thread (support thread) when control reaches “#block xxx”
Start PC: the PC of the skippable code in the main thread. Destination PC: denotes the end of the skippable region. Post-skip PC: the address execution continues from after the region is skipped.