Are you tired of struggling with your existing data analytics applications?
When MapReduce first emerged, it was a great boon to the big data world, but modern big data processing demands have outgrown the framework.
That’s where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop and holding the world record in large-scale sorting. Spark’s general abstraction means it can expand beyond simple batch processing to blazing-fast iterative algorithms and exactly-once streaming semantics. This, combined with its interactive shell, makes it a powerful tool for everyone from data tinkerers to data scientists to data developers.
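To make the iterative-speed claim concrete, here is a minimal sketch (not from the deck) of the pattern behind it: cache a dataset in memory once, then run many passes over it, instead of re-reading it from disk on every pass the way a chain of MapReduce jobs would. The input path, file format, and iteration count are illustrative assumptions.

import org.apache.spark.{SparkConf, SparkContext}

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-sketch"))

    // Read a file of one number per line (hypothetical path and format)
    // and pin it in memory; every later pass reuses the cached copy.
    val points = sc.textFile("hdfs:///data/points.txt")
      .map(_.toDouble)
      .cache()

    // Toy gradient descent toward the mean: each iteration is a full
    // pass over the cached, in-memory RDD rather than a disk re-read.
    var w = 0.0
    for (_ <- 1 to 10) {
      val grad = points.map(p => w - p).mean()
      w -= 0.5 * grad
    }
    println(s"estimate after 10 passes: $w")
    sc.stop()
  }
}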
4. Concerns
▪ Am I too small?
▪ Will switching be too costly?
▪ Can I utilize my current infrastructure?
▪ Will I be able to find developers?
▪ Are there enough resources available?
7. Tiny Code vs. Big Code

Tiny Code (Spark, Scala):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    sc.textFile(args(0))        // read input lines
      .flatMap(_.split(" "))    // split each line into words
      .map((_, 1))              // pair each word with a count of 1
      .reduceByKey(_ + _)       // sum the counts per word
      .saveAsTextFile(args(1))  // write (word, count) pairs
    sc.stop()
  }
}
Big Code (Hadoop MapReduce, Java):
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token of every input line.
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums the emitted counts for each word.
  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
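Set side by side, the contrast is the slide’s argument: Spark expresses the entire dataflow as one short chain of transformations, while MapReduce needs two inner classes and a page of job wiring to produce the same word counts.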
Why Spark?
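Part of the answer is the interactive shell mentioned at the start: the same word count can be explored line by line in spark-shell, which pre-creates a SparkContext named sc. A minimal sketch, with the input path as an illustrative assumption:

// Launched with ./bin/spark-shell; `sc` is provided by the shell.
val counts = sc.textFile("hdfs:///data/input.txt")  // hypothetical path
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// Inspect a sample interactively instead of writing output files.
counts.take(10).foreach(println)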
22. Concerns
▪ Am I too small?
▪ Will switching from MapReduce be too costly?
▪ Can I utilize my current infrastructure?
▪ Will I be able to find developers?
▪ Are there enough resources available?
24. EXPERT SUPPORT
Why Contact Typesafe for Your Apache Spark Project?
Ignite your Spark project with 24/7 production SLA, unlimited expert support, and on-site training:
• Full application lifecycle support for Spark Core, Spark SQL & Spark Streaming
• Deployment to Standalone, EC2, Mesos clusters
• Expert support from a dedicated Spark team
• Optional 10-day “getting started” services package
Typesafe partners with Databricks, Mesosphere, and IBM.
Learn more about on-site training: CONTACT US