SlideShare a Scribd company logo
1 of 144
Download to read offline
thanachart@imcinstitute.com1
Thailand Big Data Challenge
#2/2016
Apache Spark in Actions
18-19 June 2016
Dr.Thanachart Numnonda
IMC Institute
thanachart@imcinstitute.com
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Outline
●
Launch Azure Instance
●
Install Docker on Ubuntu
●
Pull Cloudera QuickStart to the docker
●
HDFS
●
Spark
●
Spark SQL
●
Spark Streaming
Hive.apache.org
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Cloudera VM
This lab will use a EC2 virtual server on AWS to install
Cloudera, However, you can also use Cloudera QuickStart VM
which can be downloaded from:
http://www.cloudera.com/content/www/en-us/downloads.html
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hadoop Ecosystem
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Streaming
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hands-On: Launch a virtual server
on Microsoft Azure
(Note: You can skip this session if you use your own
computer or another cloud service)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Sign up for Visual Studio Dev Essential to get free
Azure credit.
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Sign in to Azure Portal
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Virtual Server
This lab will use an Azure virtual server to install a
Cloudera Quickstart Docker using the following
features:
Ubuntu Server 14.04 LTS
DS3_V2 Standard 4 Core, 14 GB memory,28
GB SSD
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Select New => Virtual Machines => Virtual Machines
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
On the Basics page, enter:
●
a name for the VM
●
a username for the Admin User
●
the Authentication Type set to password
●
a password
●
a resource group name
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Choose DS3_v2 Standard
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Setting the inbound port for Hue (8888)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Get the IP address
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Connect to an instance from Mac/Linux
ssh -i ~/.ssh/id_rsa imcinstitute@104.210.146.182
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Connect to an instance from Windows using Putty
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hands-On: Installing Cloudera
Quickstart on Docker Container
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Installation Steps
●
Update OS
●
Install Docker
●
Pull Cloudera Quickstart
●
Run Cloudera Quickstart
●
Run Cloudera Manager
Hive.apache.org
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Update OS (Ubuntu)
●
Command: sudo apt-get update
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Docker Installation
●
Command: sudo apt-get install docker.io
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Pull Cloudera Quickstart
●
Command: sudo docker pull cloudera/quickstart:latest
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Show docker images
●
Command: sudo docker images
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Run Cloudera quickstart
●
Command: sudo docker run
--hostname=quickstart.cloudera --privileged=true -t -i
[OPTIONS] [IMAGE] /usr/bin/docker-quickstart
Example: sudo docker run
--hostname=quickstart.cloudera --privileged=true -t -i -p
8888:8888 cloudera/quickstart /usr/bin/docker-quickstart
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Docker commands:
●
docker images
●
docker ps
●
docker attach id
●
docker kill id
●
Exit from container
●
exit (exit & kill the running image)
●
Ctrl-P, Ctrl-Q (exit without killing the running image)
Hive.apache.org
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Login to Hue
http://104.210.146.182:8888
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hands-On: Importing/Exporting
Data to HDFS
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
HDFS
●
Default storage for the Hadoop cluster
●
Data is distributed and replicated over multiple machines
●
Designed to handle very large files with straming data
access patterns.
●
NameNode/DataNode
●
Master/slave architecture (1 master 'n' slaves)
●
Designed for large files (64 MB default, but configurable)
across all the nodes
Hive.apache.org
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
HDFS Architecture
Source Hadoop: Shashwat Shriparv
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Data Replication in HDFS
Source Hadoop: Shashwat Shriparv
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
How does HDFS work?
Source Introduction to Apache Hadoop-Pig: PrashantKommireddi
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
How does HDFS work?
Source Introduction to Apache Hadoop-Pig: PrashantKommireddi
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
How does HDFS work?
Source Introduction to Apache Hadoop-Pig: PrashantKommireddi
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
How does HDFS work?
Source Introduction to Apache Hadoop-Pig: PrashantKommireddi
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
How does HDFS work?
Source Introduction to Apache Hadoop-Pig: PrashantKommireddi
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Review file in Hadoop HDFS using
File Browse
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Create a new directory name as: input & output
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Upload a local file to HDFS
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hands-On: Connect to a master node
via SSH
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
SSH Login to a master node
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hadoop syntax for HDFS
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Install wget
●
Command: yum install wget
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Download an example text file
Make your own durectory at a master node to avoid mixing with others
$mkdir guest1
$cd guest1
$wget https://s3.amazonaws.com/imcbucket/input/pg2600.txt
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Upload Data to Hadoop
$hadoop fs -ls /user/cloudera/input
$hadoop fs -rm /user/cloudera/input/*
$hadoop fs -put pg2600.txt /user/cloudera/input/
$hadoop fs -ls /user/cloudera/input
Note: you login as ubuntu, so you need to a sudo command to
Switch user to hdfs
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Lecture
Understanding Spark
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Introduction
A fast and general engine for large scale data processing
An open source big data processing framework built around
speed, ease of use, and sophisticated analytics. Spark
enables applications in Hadoop clusters to run up to 100
times faster in memory and 10 times faster even when running
on disk.
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What is Spark?
Framework for distributed processing.
In-memory, fault tolerant data structures
Flexible APIs in Scala, Java, Python, SQL, R
Open source
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Why Spark?
Handle Petabytes of data
Significant faster than MapReduce
Simple and intutive APIs
General framework
– Runs anywhere
– Handles (most) any I/O
– Interoperable libraries for specific use-cases
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark: History
Founded by AMPlab, UC Berkeley
Created by Matei Zaharia (PhD Thesis)
Maintained by Apache Software Foundation
Commercial support by Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Platform
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
thanachart@imcinstitute.com64
Spark Platform
Source: MapR Academy
thanachart@imcinstitute.com65
Source: MapR Academy
thanachart@imcinstitute.com66
Source: TRAINING Intro to Apache Spark - Brian Clapper
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What is a RDD?
Resilient: if the data in memory (or on a node) is lost, it
can be recreated.
Distributed: data is chucked into partitions and stored in
memory acress the custer.
Dataset: initial data can come from a table or be created
programmatically
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
RDD:
Fault tollerant
Immutable
Three methods for creating RDD:
– Parallelizing an existing correction
– Referencing a dataset
– Transformation from an existing RDD
Types of files supported:
– Text files
– SequenceFiles
– Hadoop InputFormat
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
RDD Creation
hdfsData = sc.textFile("hdfs://data.txt”)
Source: Pspark: A brain-friendly introduction
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
RDD: Operations
Transformations: transformations are lazy (not computed
immediately)
Actions: the transformed RDD gets recomputed
when an action is run on it (default)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Direct Acyclic Graph (DAG)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Source: Spark Fundamentals I Big Data Usibersity
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Source: Spark Fundamentals I Big Data Usibersity
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Source: Spark Fundamentals I Big Data Usibersity
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Source: Spark Fundamentals I Big Data Usibersity
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Source: Spark Fundamentals I Big Data Usibersity
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
What happens when an action is executed
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark:Transformation
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark:Transformation
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Single RDD Transformation
Source: Jspark: A brain-friendly introduction
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Multiple RDD Transformation
Source: Jspark: A brain-friendly introduction
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Pair RDD Transformation
Source: Jspark: A brain-friendly introduction
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark:Actions
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark:Actions
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark: Persistence
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Accumulators
Similar to a MapReduce “Counter”
A global variable to track metrics about your Spark
program for debugging.
Reasoning: Excutors do not communicate with each
other.
Sent back to driver
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Broadcast Variables
Similar to a MapReduce “Distributed Cache”
Sends read-only values to worker nodes.
Great for lookup tables, dictionaries, etc.
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Hands-On: Spark Programming
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Functional tools in Python
map
filter
reduce
lambda
IterTools
• Chain, flatmap
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Map
>>> a= [1,2,3]
>>> def add1(x) : return x+1
>>> map(add1, a)
Result: [2,3,4]
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Filter
>>> a= [1,2,3,4]
>>> def isOdd(x) : return x%2==1
>>> filter(isOdd, a)
Result: [1,3]
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Reduce
>>> a= [1,2,3,4]
>>> def add(x,y) : return x+y
>>> reduce(add, a)
Result: 10
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
lambda
>>> (lambda x: x + 1)(3)
Result: 4
>>> map((lambda x: x + 1), [1,2,3])
Result: [2,3,4]
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Exercises
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Start Spark-shell
$spark-shell
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Testing SparkContext
Spark-context
scala> sc
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program in Scala: WordCount
scala> val file =
sc.textFile("hdfs:///user/cloudera/input/pg2600.txt")
scala> val wc = file.flatMap(l => l.split(" ")).map(word =>
(word, 1)).reduceByKey(_ + _)
scala>
wc.saveAsTextFile("hdfs:///user/cloudera/output/wordcountScala
")
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
WordCount output
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program in Python: WordCount
$ pyspark
>>> from operator import add
>>> file =
sc.textFile("hdfs:///user/cloudera/input/pg2600.txt")
>>> wc = file.flatMap(lambda x: x.split(' ')).map(lambda x:
(x, 1)).reduceByKey(add)
>>> wc.saveAsTextFile("hdfs:///user/cloudera/output/
wordcountPython")
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Project: Flight
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Flight Details Data
http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Flight Details Data
http://stat-computing.org/dataexpo/2009/the-data.html
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Data Description
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Snapshot of Dataset
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
FiveThirtyEight
http://projects.fivethirtyeight.com/flights/
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Upload Flight Delay Data
$ wget
https://s3.amazonaws.com/imcbucket/data/flights/2008.csv
$ hadoop fs -put 2008.csv /user/cloudera/input
Upload a data to HDFS
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Navigating Flight Delay Data
>>> airline =
sc.textFile("hdfs:///user/cloudera/input/2008.csv")
>>> airline.take(2)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Preparing Data
>>> header_line = airline.first()
>>> header_list = header_line.split(',')
>>> airline_no_header = airline.filter(lambda row: row !=
header_line)
>>> airline_no_header.first()
>>> def make_row(row):
... row_list = row.split(',')
... d = dict(zip(header_list,row_list))
... return d
...
>>> airline_rows = airline_no_header.map(make_row)
>>> airline_rows.take(5)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Define convert_float function
>>> def convert_float(value):
... try:
... x = float(value)
... return x
... except ValueError:
... return 0
...
>>>
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Finding best/worst airline
>>> carrier_rdd = airline_rows.map(lambda row:
(row['UniqueCarrier'],convert_float(row['ArrDelay'])))
>>> carrier_rdd.take(2)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Program : Finding best/worst airlines
>>> mean_delays_dest =
carrier_rdd.groupByKey().mapValues(lambda delays:
sum(delays.data)/len(delays.data))
>>> mean_delays_dest.sortBy(lambda t:t[1],
ascending=True).take(10)
>>> mean_delays_dest.sortBy(lambda t:t[1],
ascending=False).take(10)
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark SQL
thanachart@imcinstitute.com119
A distributed collection of rows organied into named
columns.
An abstraction for selecting, filtering, aggregating, and
plotting structured data.
Previously => SchemaRDD
DataFrame
thanachart@imcinstitute.com120
Creating and running Spark program faster
– Write less code
– Read less data
– Let the optimizer do the hard work
SparkSQL
thanachart@imcinstitute.com121
Source: Jump start into Apache Spark and Databricks
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
SparkSQL
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Preparing Large Dataset
http://grouplens.org/datasets/movielens/
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
MovieLen Dataset
1)Type command > wget
http://files.grouplens.org/datasets/movielens/ml-100k.zip
2)Type command > yum install unzip
3)Type command > unzip ml-100k.zip
4)Type command > more ml-100k/u.user
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Moving dataset to HDFS
1)Type command > cd ml-100k
2)Type command > hadoop fs -mkdir /user/cloudera/movielens
3)Type command > hadoop fs -put u.user /user/cloudera/movielens
4)Type command > hadoop fs -ls /user/cloudera/movielens
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
SQL Spark MovieLens
$ pyspark --packages com.databricks:spark-csv_2.10:1.2.0
>>> df =
sqlContext.read.format('com.databricks.spark.csv').options(hea
der='true').load('hdfs:///user/cloudera/u.user')
>>> df.registerTempTable('user')
>>> sqlContext.sql("SELECT * FROM user").collect()
Upload a data to HDFS then
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Spark Streaming
thanachart@imcinstitute.com129
Stream Process Architecture
Source: MapR Academy
thanachart@imcinstitute.com130
Spark Streaming Architecture
Source: MapR Academy
thanachart@imcinstitute.com131
Processing Spark DStreams
Source: MapR Academy
thanachart@imcinstitute.com132
Use Case: Time Series Data
Source: MapR Academy
thanachart@imcinstitute.com133
Use Case
Source: http://www.insightdataengineering.com/
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Start Spark-shell with extra memory
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
WordCount using Spark Streaming
$ scala> :paste
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
import StorageLevel._
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val ssc = new StreamingContext(sc, Seconds(2))
val lines = ssc.socketTextStream("localhost",8585,MEMORY_ONLY)
val wordsFlatMap = lines.flatMap(_.split(" "))
val wordsMap = wordsFlatMap.map( w => (w,1))
val wordCount = wordsMap.reduceByKey( (a,b) => (a+b))
wordCount.print
ssc.start
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Running the netcat server on another window
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
thanachart@imcinstitute.com138
Hadoop + Spark
Source: MapR Academy
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Challenge: Dataset
thanachart@imcinstitute.com140
NYC’s Taxi Trip Data
http://www.andresmh.com/nyctaxitrips/
thanachart@imcinstitute.com141
NYC Taxi : A Day in Life
http://nyctaxi.herokuapp.com/
thanachart@imcinstitute.com142
Recommended Books
thanachart@imcinstitute.com143
Thanachart Numnonda, thanachart@imcinstitute.com June 2016Apache Spark in Action
Thank you
www.imcinstitute.com
www.facebook.com/imcinstitute

More Related Content

What's hot

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveIMC Institute
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2IMC Institute
 
Big data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartBig data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartIMC Institute
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillVince Gonzalez
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsData Con LA
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summitSujee Maniyam
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupHolden Karau
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopThibault Debatty
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsVíctor Zabalza
 
Cassandra/Hadoop Integration
Cassandra/Hadoop IntegrationCassandra/Hadoop Integration
Cassandra/Hadoop IntegrationJeremy Hanna
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskVíctor Zabalza
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemDaniel Rodriguez
 
Cassandra synergy
Cassandra synergyCassandra synergy
Cassandra synergyniallmilton
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Spark Summit
 

What's hot (20)

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and Hive
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
 
Big data processing using Cloudera Quickstart
Big data processing using Cloudera QuickstartBig data processing using Cloudera Quickstart
Big data processing using Cloudera Quickstart
 
Querying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and DrillQuerying Network Packet Captures with Spark and Drill
Querying Network Packet Captures with Spark and Drill
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Hadoop to spark_v2
Hadoop to spark_v2Hadoop to spark_v2
Hadoop to spark_v2
 
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big AnalyticsReal time big data analytics with Storm by Ron Bodkin of Think Big Analytics
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summit
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
 
Parallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with HadoopParallel SPAM Clustering with Hadoop
Parallel SPAM Clustering with Hadoop
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Lens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgetsLens: Data exploration with Dask and Jupyter widgets
Lens: Data exploration with Dask and Jupyter widgets
 
Cassandra/Hadoop Integration
Cassandra/Hadoop IntegrationCassandra/Hadoop Integration
Cassandra/Hadoop Integration
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
 
Spark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark EcosystemSpark Summit 2016: Connecting Python to the Spark Ecosystem
Spark Summit 2016: Connecting Python to the Spark Ecosystem
 
RHadoop
RHadoopRHadoop
RHadoop
 
Cassandra synergy
Cassandra synergyCassandra synergy
Cassandra synergy
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 

Viewers also liked

บทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazineบทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-MagazineIMC Institute
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015IMC Institute
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibIMC Institute
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in ChinaIMC Institute
 
Blockchain คืออะไร
Blockchain คืออะไรBlockchain คืออะไร
Blockchain คืออะไรIMC Institute
 
Big data Competitions by Komes Chandavimol
Big data Competitions by Komes ChandavimolBig data Competitions by Komes Chandavimol
Big data Competitions by Komes ChandavimolIMC Institute
 
บทความสรุปงานสัมมนา IT Trends 2017 ของ IMC Institute
บทความสรุปงานสัมมนา IT Trends 2017 ของ  IMC Instituteบทความสรุปงานสัมมนา IT Trends 2017 ของ  IMC Institute
บทความสรุปงานสัมมนา IT Trends 2017 ของ IMC InstituteIMC Institute
 
Big data project management
Big data project managementBig data project management
Big data project managementIMC Institute
 
Big Data as a Service
Big Data as a ServiceBig Data as a Service
Big Data as a ServiceIMC Institute
 
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
เทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษาเทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษา
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษาIMC Institute
 
IT Trends 2017: Thailand
IT Trends 2017: ThailandIT Trends 2017: Thailand
IT Trends 2017: ThailandIMC Institute
 
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลCloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลIMC Institute
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public CloudIMC Institute
 
Big data using Public Cloud
Big data using Public CloudBig data using Public Cloud
Big data using Public CloudIMC Institute
 
Technology Trends : Impacts to Thai Industries
Technology Trends : Impacts to Thai IndustriesTechnology Trends : Impacts to Thai Industries
Technology Trends : Impacts to Thai IndustriesSoftware Park Thailand
 
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016IMC Institute
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2IMC Institute
 

Viewers also liked (20)

บทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazineบทความ Big Data School ใน IMC e-Magazine
บทความ Big Data School ใน IMC e-Magazine
 
Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015Thai Software & Software Market Survey 2015
Thai Software & Software Market Survey 2015
 
Machine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlibMachine Learning using Apache Spark MLlib
Machine Learning using Apache Spark MLlib
 
ITSS Overview
ITSS OverviewITSS Overview
ITSS Overview
 
Mobile User and App Analytics in China
Mobile User and App Analytics in ChinaMobile User and App Analytics in China
Mobile User and App Analytics in China
 
Blockchain คืออะไร
Blockchain คืออะไรBlockchain คืออะไร
Blockchain คืออะไร
 
Big data Competitions by Komes Chandavimol
Big data Competitions by Komes ChandavimolBig data Competitions by Komes Chandavimol
Big data Competitions by Komes Chandavimol
 
บทความสรุปงานสัมมนา IT Trends 2017 ของ IMC Institute
บทความสรุปงานสัมมนา IT Trends 2017 ของ  IMC Instituteบทความสรุปงานสัมมนา IT Trends 2017 ของ  IMC Institute
บทความสรุปงานสัมมนา IT Trends 2017 ของ IMC Institute
 
ITSS Overview
ITSS OverviewITSS Overview
ITSS Overview
 
Big data project management
Big data project managementBig data project management
Big data project management
 
Big Data as a Service
Big Data as a ServiceBig Data as a Service
Big Data as a Service
 
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
เทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษาเทคโนโลยี  Cloud Computing  สำหรับงานสถาบันการศึกษา
เทคโนโลยี Cloud Computing สำหรับงานสถาบันการศึกษา
 
IT Trends 2017: Thailand
IT Trends 2017: ThailandIT Trends 2017: Thailand
IT Trends 2017: Thailand
 
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัลCloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
Cloud Computing สำหรับ ผู้บริหารเพื่อรองรับเศรษฐกิจดิจิทัล
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data on Public Cloud
Big Data on Public CloudBig Data on Public Cloud
Big Data on Public Cloud
 
Big data using Public Cloud
Big data using Public CloudBig data using Public Cloud
Big data using Public Cloud
 
Technology Trends : Impacts to Thai Industries
Technology Trends : Impacts to Thai IndustriesTechnology Trends : Impacts to Thai Industries
Technology Trends : Impacts to Thai Industries
 
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
Cloud Computing in Thailand Readiness Survey 2015 & IT Trends Prediction 2016
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2
 

Similar to Apache Spark in Action

Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...confluent
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]Shweta Patnaik
 
20171012 found IT #9 PySparkの勘所
20171012 found  IT #9 PySparkの勘所20171012 found  IT #9 PySparkの勘所
20171012 found IT #9 PySparkの勘所Ryuji Tamagawa
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Mich Talebzadeh (Ph.D.)
 
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...HostedbyConfluent
 
PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase) PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase) Ryuji Tamagawa
 
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所Ryuji Tamagawa
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Knoldus Inc.
 
Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Bertrand Delacretaz
 
Debugging Apache Spark - Scala & Python super happy fun times 2017
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017Holden Karau
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101Adam Muise
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 
Spring Cloud Netflixを使おう #jsug
Spring Cloud Netflixを使おう #jsugSpring Cloud Netflixを使おう #jsug
Spring Cloud Netflixを使おう #jsugToshiaki Maki
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in PythonRalf Gommers
 

Similar to Apache Spark in Action (20)

Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
Kafka Summit SF 2017 - Streaming Processing in Python – 10 ways to avoid summ...
 
Apache spark installation [autosaved]
Apache spark installation [autosaved]Apache spark installation [autosaved]
Apache spark installation [autosaved]
 
20171012 found IT #9 PySparkの勘所
20171012 found  IT #9 PySparkの勘所20171012 found  IT #9 PySparkの勘所
20171012 found IT #9 PySparkの勘所
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Module01
 Module01 Module01
Module01
 
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...
kash.py - How to Make Your Data Scientists Love Real-time with Ralph M. Debus...
 
PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase) PySparkの勘所(20170630 sapporo db analytics showcase)
PySparkの勘所(20170630 sapporo db analytics showcase)
 
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
20170927 pydata tokyo データサイエンスな皆様に送る分散処理の基礎の基礎、そしてPySparkの勘所
 
Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0Introduction to Apache Spark 2.0
Introduction to Apache Spark 2.0
 
Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?
 
Debugging Apache Spark - Scala & Python super happy fun times 2017
Debugging Apache Spark -   Scala & Python super happy fun times 2017Debugging Apache Spark -   Scala & Python super happy fun times 2017
Debugging Apache Spark - Scala & Python super happy fun times 2017
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Pyspark tutorial
Pyspark tutorialPyspark tutorial
Pyspark tutorial
 
Pyspark tutorial
Pyspark tutorialPyspark tutorial
Pyspark tutorial
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spring Cloud Netflixを使おう #jsug
Spring Cloud Netflixを使おう #jsugSpring Cloud Netflixを使おう #jsug
Spring Cloud Netflixを使おう #jsug
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in Python
 

More from IMC Institute

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14IMC Institute
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019IMC Institute
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AIIMC Institute
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12IMC Institute
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital TransformationIMC Institute
 
IT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIMC Institute
 
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมมูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมIMC Institute
 
IT Trends eMagazine Vol 4. No.11
IT Trends eMagazine  Vol 4. No.11IT Trends eMagazine  Vol 4. No.11
IT Trends eMagazine Vol 4. No.11IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
บทความ The New Silicon Valley
บทความ The New Silicon Valleyบทความ The New Silicon Valley
บทความ The New Silicon ValleyIMC Institute
 
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10IMC Institute
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationIMC Institute
 
The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)IMC Institute
 
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง IMC Institute
 
IT Trends eMagazine Vol 3. No.9
IT Trends eMagazine  Vol 3. No.9 IT Trends eMagazine  Vol 3. No.9
IT Trends eMagazine Vol 3. No.9 IMC Institute
 
Thailand software & software market survey 2016
Thailand software & software market survey 2016Thailand software & software market survey 2016
Thailand software & software market survey 2016IMC Institute
 
Developing Business Blockchain Applications on Hyperledger
Developing Business  Blockchain Applications on Hyperledger Developing Business  Blockchain Applications on Hyperledger
Developing Business Blockchain Applications on Hyperledger IMC Institute
 
Digital transformation @thanachart.org
Digital transformation @thanachart.orgDigital transformation @thanachart.org
Digital transformation @thanachart.orgIMC Institute
 
บทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgบทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgIMC Institute
 
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformationกลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital TransformationIMC Institute
 

More from IMC Institute (20)

นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14นิตยสาร Digital Trends ฉบับที่ 14
นิตยสาร Digital Trends ฉบับที่ 14
 
Digital trends Vol 4 No. 13 Sep-Dec 2019
Digital trends Vol 4 No. 13  Sep-Dec 2019Digital trends Vol 4 No. 13  Sep-Dec 2019
Digital trends Vol 4 No. 13 Sep-Dec 2019
 
บทความ The evolution of AI
บทความ The evolution of AIบทความ The evolution of AI
บทความ The evolution of AI
 
IT Trends eMagazine Vol 4. No.12
IT Trends eMagazine  Vol 4. No.12IT Trends eMagazine  Vol 4. No.12
IT Trends eMagazine Vol 4. No.12
 
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformationเพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
เพราะเหตุใด Digitization ไม่ตอบโจทย์ Digital Transformation
 
IT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to WorkIT Trends 2019: Putting Digital Transformation to Work
IT Trends 2019: Putting Digital Transformation to Work
 
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรมมูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
มูลค่าตลาดดิจิทัลไทย 3 อุตสาหกรรม
 
IT Trends eMagazine Vol 4. No.11
IT Trends eMagazine  Vol 4. No.11IT Trends eMagazine  Vol 4. No.11
IT Trends eMagazine Vol 4. No.11
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformation
 
บทความ The New Silicon Valley
บทความ The New Silicon Valleyบทความ The New Silicon Valley
บทความ The New Silicon Valley
 
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10นิตยสาร IT Trends ของ  IMC Institute  ฉบับที่ 10
นิตยสาร IT Trends ของ IMC Institute ฉบับที่ 10
 
แนวทางการทำ Digital transformation
แนวทางการทำ Digital transformationแนวทางการทำ Digital transformation
แนวทางการทำ Digital transformation
 
The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)The Power of Big Data for a new economy (Sample)
The Power of Big Data for a new economy (Sample)
 
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
บทความ Robotics แนวโน้มใหม่สู่บริการเฉพาะทาง
 
IT Trends eMagazine Vol 3. No.9
IT Trends eMagazine  Vol 3. No.9 IT Trends eMagazine  Vol 3. No.9
IT Trends eMagazine Vol 3. No.9
 
Thailand software & software market survey 2016
Thailand software & software market survey 2016Thailand software & software market survey 2016
Thailand software & software market survey 2016
 
Developing Business Blockchain Applications on Hyperledger
Developing Business  Blockchain Applications on Hyperledger Developing Business  Blockchain Applications on Hyperledger
Developing Business Blockchain Applications on Hyperledger
 
Digital transformation @thanachart.org
Digital transformation @thanachart.orgDigital transformation @thanachart.org
Digital transformation @thanachart.org
 
บทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.orgบทความ Big Data จากบล็อก thanachart.org
บทความ Big Data จากบล็อก thanachart.org
 
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformationกลยุทธ์ 5 ด้านกับการทำ Digital Transformation
กลยุทธ์ 5 ด้านกับการทำ Digital Transformation
 

Recently uploaded

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Apache Spark in Action