24. Fast startup
• Starts up in under 1 second
• How it works
• "Spark avoids this problem by using a fast event-driven RPC library
to launch tasks and by reusing its worker processes. It can launch
thousands of tasks per second with only about 5 ms of overhead
per task, making task lengths of 50–100 ms and MapReduce jobs
of 500 ms viable. What surprised us is how much this affected
query performance, even in large (multi-minute) queries." [2]
• Loose paraphrase: with enough implementation effort, a single task
can now be launched in about 5 ms
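To see why a 5 ms launch cost makes 50–100 ms tasks viable, here is a minimal sketch of the arithmetic. The function name `overhead_fraction` and the simple cost model (total time = task time + launch overhead) are assumptions made for illustration; they are not taken from Spark or from the paper.

```python
def overhead_fraction(task_ms: float, launch_overhead_ms: float = 5.0) -> float:
    """Share of a task's total wall time that goes to launching it.

    Assumes a simplified model: total time = task time + launch overhead.
    The 5 ms default is the per-task overhead quoted on the slide.
    """
    return launch_overhead_ms / (task_ms + launch_overhead_ms)


if __name__ == "__main__":
    # Task lengths mentioned in the quote: 50 ms and 100 ms.
    for task_ms in (50, 100):
        share = overhead_fraction(task_ms)
        print(f"{task_ms} ms task: {share:.1%} of the time is launch overhead")
```

Under this model the launch cost stays below a tenth of the total even for a 50 ms task, which is the sense in which such short tasks become practical.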
Tuesday, October 22, 13
[2] Shark: SQL and Rich Analytics at Scale
31. What is Shark?
• SQL is now available on Spark
CREATE TABLE logs_last_month_cached AS
  SELECT * FROM logs WHERE time > date(...);
SELECT page, count(*) c FROM logs_last_month_cached
  GROUP BY page ORDER BY c DESC LIMIT 10;
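The two statements above follow a common pattern: materialize a filtered subset once, then run an aggregate query against it. The sketch below reproduces that pattern with Python's built-in `sqlite3` module, not with Shark, so it only illustrates what the queries compute and says nothing about Shark's performance. The sample rows and the integer cutoff `4` (standing in for the elided `date(...)` expression) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Shark/Hive warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (page TEXT, time INTEGER)")

# Hypothetical sample data: (page, time) pairs with integer timestamps.
sample_rows = [
    ("/home", 1), ("/home", 5), ("/docs", 6), ("/home", 7),
    ("/docs", 8), ("/about", 9), ("/home", 10),
]
conn.executemany("INSERT INTO logs VALUES (?, ?)", sample_rows)

# Step 1: materialize the filtered subset once.
# The literal 4 is a made-up cutoff replacing the slide's date(...) expression.
conn.execute(
    "CREATE TABLE logs_last_month_cached AS "
    "SELECT * FROM logs WHERE time > 4"
)

# Step 2: top pages by hit count, same shape as the query on the slide.
top_pages = conn.execute(
    "SELECT page, count(*) c FROM logs_last_month_cached "
    "GROUP BY page ORDER BY c DESC LIMIT 10"
).fetchall()

print(top_pages)
```

Later queries over the same period can reuse `logs_last_month_cached` without repeating the filter on `logs`.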
Stack, top to bottom: Shark (SQL), Spark, HDFS