A Decision Forest machine learning algorithm is adopted to identify the features that affect the temperature of the fueling valve and controller, and to predict that temperature.
Big Data Analysis in Hydrogen Station using Spark and Azure ML
1. High Performance Information Computing Center
Jongwook Woo
CSULA
Hydrogen Gas Power Plant Data
Analysis and Prediction Using
Spark
Manvi Chandra, mchandr2@calstatela.edu
Jongwook Woo, PhD, jwoo5@calstatela.edu
High-Performance Information Computing Center (HiPIC)
California State University Los Angeles
Contents
Myself
Introduction To Big Data
Machine Learning
Spark Cores
RDD
Spark SQL, Streaming, ML
Hydrogen Gas Power Plant Prediction
Model
Myself
Name: Manvi Chandra
Experience:
2012-2014: Programmer Analyst at Cognizant Technology Solutions
2015-2016 - Present: Master’s in Information Systems
Exposed to Big Data analytics
Pursuing research in Big Data analytics and machine learning
2007-2011: Bachelor of Technology in Electronics and Communication Engineering
Data Issues
Large-Scale data
Terabyte (10^12 bytes), Petabyte (10^15 bytes)
– Because of the web
– Sensor data (IoT), bioinformatics, social computing, streaming data, smartphones, online games…
Cannot be handled with the legacy approach
Too big
Un-/Semi-structured data
Too expensive
Need new systems
Inexpensive
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS
– On inexpensive commodity computers
How to compute Big Data
– MapReduce
– Parallel computing with multiple inexpensive computers
• Own supercomputers
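The MapReduce idea on this slide can be sketched on a single machine. This is a minimal illustration in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the input lines are assumptions made for the example. In a real cluster, the map and reduce calls would run in parallel on many inexpensive machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially (hypothetical driver)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input, not taken from the slides.
counts = word_count(["big data on spark", "big data on hadoop"])
```

Each mapper only sees its own line and each reducer only sees one key, which is what lets the framework spread the work across machines.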
What is Hadoop?
Hadoop Founder:
Doug Cutting
Chief Architect at Cloudera
Definition: Big Data
Inexpensive frameworks that can
store large-scale data and process
it faster in parallel
Hadoop
–Inexpensive supercomputer
–You can build and run your applications
Alternative to Hadoop MapReduce
Limitations of MapReduce
Hard to program in Java
Batch processing
– Not interactive
Disk storage for intermediate data
– Performance issue
Spark by UC Berkeley AMPLab
In-memory storage for intermediate data
10 to 100x faster than network and disk
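The difference between the two approaches to intermediate data can be sketched as follows. This is a single-machine analogy in plain Python, assuming a hypothetical two-stage job (`stage_one`, `stage_two`). It shows where the intermediate result lives in each style, not actual Hadoop or Spark behavior or timings.

```python
import json
import os
import tempfile


def stage_one(numbers):
    """First stage: square every input value."""
    return [n * n for n in numbers]


def stage_two(values):
    """Second stage: sum the intermediate values."""
    return sum(values)


def run_with_disk(numbers):
    """MapReduce style: write the intermediate result to disk between stages."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(stage_one(numbers), handle)
        with open(path) as handle:
            intermediate = json.load(handle)
        return stage_two(intermediate)
    finally:
        os.remove(path)


def run_in_memory(numbers):
    """Spark style: pass the intermediate result to the next stage in memory."""
    return stage_two(stage_one(numbers))


data = list(range(1, 6))  # illustrative input
disk_result = run_with_disk(data)
memory_result = run_in_memory(data)
```

Both functions return the same answer. The in-memory version simply skips the write and read between stages, which is where the slide locates the performance issue.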
Machine Learning
Subfield of computer science that evolved from
the study of pattern recognition and
computational learning theory in artificial
intelligence.
Explores pattern recognition during data analysis
through computer science and statistics.
Machine learning is a method of data analysis
that automates analytical model building. Using
algorithms that iteratively learn from data,
machine learning allows computers to find
hidden insights without being explicitly
programmed where to look.
Machine Learning Studio
Microsoft Azure Machine Learning Studio is a
collaborative, drag-and-drop tool you can use
to build, test, and deploy predictive analytics
solutions on your data.
Spark
In-Memory Data Computing
Faster than Hadoop MapReduce
Can integrate with Hadoop and its ecosystem
HDFS
HBase, Hive, Sequence files
New programming model with faster data sharing
Good in complex multi-stage applications
– Iterative graph algorithms, Machine Learning
Interactive query
Spark
Spark Core: RDDs, transformations, and actions
Spark Streaming (real-time): DStreams, streams of RDDs
Spark SQL: SchemaRDDs, DataFrames
MLlib (machine learning): RDD-based matrices
GraphX (graph): RDD-based matrices
SparkR: RDD-based matrices
Spark Drivers and Workers
Drivers
Client
–with SparkContext
• Create RDDs
Workers
Spark Executor
Run on cluster nodes
–Production
Run in local threads
–Development
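The driver and worker roles in the "local threads" development mode can be approximated with a thread pool. This is a sketch using Python's standard library rather than Spark. The names `split_into_partitions`, `worker_task`, and `driver` are hypothetical: the driver splits the data and collects results, and each thread plays the part of an executor.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: slice the data set into roughly equal partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker_task(partition):
    """Worker side: compute a partial sum over one partition."""
    return sum(partition)


def driver(data, num_workers=4):
    """Driver: distribute partitions to worker threads and merge the results."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(worker_task, partitions))
    return sum(partial_sums)


total = driver(list(range(1, 101)))  # illustrative input: 1..100
```

On a production cluster the same division of labor applies, with executors on separate nodes instead of threads in one process.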
RDD
Resilient Distributed Dataset (RDD)
Distributed collections of objects
–that can be cached in memory
RDD, DStream, SchemaRDD, PairRDD
Immutable
Lineage
–History of how the objects were derived
–Lost data is recomputed automatically and efficiently
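Immutability, caching, and lineage-based recovery can be illustrated with a small stand-in class. `LineageDataset` below is a hypothetical teaching class, not part of Spark: each dataset remembers its parent and the function that produced it, so a lost cache can be rebuilt by replaying those steps.

```python
class LineageDataset:
    """Hypothetical stand-in for an RDD: immutable and aware of its lineage."""

    def __init__(self, source=None, parent=None, func=None):
        self._source = source  # original data (root dataset only)
        self._parent = parent  # dataset this one was derived from
        self._func = func      # function applied to the parent's elements
        self._cache = None

    def map(self, func):
        """Return a new dataset; the current one is never modified."""
        return LineageDataset(parent=self, func=func)

    def cache(self):
        """Materialize the data in memory."""
        self._cache = self.compute()
        return self

    def lose_cache(self):
        """Simulate losing the in-memory copy, for example after a node failure."""
        self._cache = None

    def compute(self):
        """Use the cache if present, otherwise recompute from the lineage."""
        if self._cache is not None:
            return list(self._cache)
        if self._parent is None:
            return list(self._source)
        return [self._func(x) for x in self._parent.compute()]

    def lineage(self):
        """List the steps from the root data set to this one."""
        if self._parent is None:
            return ["source"]
        return self._parent.lineage() + [self._func.__name__]


def double(x):
    return x * 2


def add_one(x):
    return x + 1


base = LineageDataset(source=[1, 2, 3])
derived = base.map(double).map(add_one).cache()
before_loss = derived.compute()
derived.lose_cache()
after_loss = derived.compute()  # rebuilt by replaying double, then add_one
```

Because every step is recorded and the inputs never change, recomputation gives the same result as the lost copy.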
RDD Operations
Transformations
Define new RDDs from the current one
–Lazy: not computed immediately
map(), filter(), join()
Actions
Return values
count(), collect(), take(), saveAsTextFile()
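The split between lazy transformations and actions can be sketched without Spark. `MiniRDD` is a hypothetical class written for this illustration: `map()` and `filter()` only record what should be done, and nothing runs until an action such as `collect()`, `count()`, or `take()` is called.

```python
import itertools


class MiniRDD:
    """Hypothetical miniature of an RDD with lazy transformations."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = tuple(ops)  # recorded transformations, not yet applied

    def map(self, func):
        """Transformation: record a map step and return a new MiniRDD."""
        return MiniRDD(self._data, self._ops + (("map", func),))

    def filter(self, predicate):
        """Transformation: record a filter step and return a new MiniRDD."""
        return MiniRDD(self._data, self._ops + (("filter", predicate),))

    def _run(self):
        """Apply the recorded steps as a chain of lazy iterators."""
        items = iter(self._data)
        for kind, func in self._ops:
            items = map(func, items) if kind == "map" else filter(func, items)
        return items

    def collect(self):
        """Action: return all elements."""
        return list(self._run())

    def count(self):
        """Action: return the number of elements."""
        return sum(1 for _ in self._run())

    def take(self, n):
        """Action: return the first n elements."""
        return list(itertools.islice(self._run(), n))


calls = []


def traced_square(x):
    """Square x and record the call so laziness can be observed."""
    calls.append(x)
    return x * x


rdd = MiniRDD([1, 2, 3, 4, 5]).map(traced_square).filter(lambda v: v % 2 == 1)
calls_before_action = len(calls)  # still 0: no transformation has run yet
result = rdd.collect()            # the action triggers the computation
```

Deferring the work lets the engine see the whole chain before running it, which is what makes pipelining and lineage-based recovery possible.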
Programming in Spark
Scala
Functional Programming
–The fundamental unit of programming is the function
• Inputs and outputs can be functions
No side effects
–No state
Python
Legacy, large libraries
Java
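The functional style described above (functions as inputs, no side effects, no state) looks like this in Python. The function `to_celsius` and the readings are made up for the example. The point is that the input tuple is never modified and the result depends only on the arguments.

```python
from functools import reduce


def to_celsius(fahrenheit):
    """Pure function: the output depends only on the input, no state is touched."""
    return (fahrenheit - 32) * 5 / 9


# Illustrative readings, not station data.
readings_f = (32.0, 212.0, 50.0)

# map() takes a function as input and leaves readings_f unchanged.
readings_c = tuple(map(to_celsius, readings_f))

# reduce() folds the values with another function instead of a mutable counter.
total_c = reduce(lambda acc, value: acc + value, readings_c, 0.0)
```

Code written this way is easy to run in parallel, because no call can interfere with another through shared state.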
Spark
Spark SQL
Turning an RDD into a relation
Querying using SQL
Spark Streaming
DStream
– RDDs in streaming
– Windows
• To select a slice of the DStream from streaming data
MLlib
Sparse vector support, decision trees, linear/logistic regression, SVD and PCA
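The windowing idea for streams can be sketched in plain Python. Here a stream is modeled as a list of micro-batches, and `windowed_sums` (a hypothetical helper, not a Spark API) sums the most recent `window_length` batches every `slide_interval` batches. Those two parameter names are assumptions chosen to mirror the usual window length and slide interval.

```python
from collections import deque


def windowed_sums(batches, window_length, slide_interval):
    """Sum the values of the last `window_length` batches every `slide_interval` batches."""
    window = deque(maxlen=window_length)  # old batches fall out automatically
    results = []
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            results.append(sum(value for b in window for value in b))
    return results


# Illustrative micro-batches, one list per batch interval.
batches = [[1, 2], [3], [4, 5], [6]]
every_batch = windowed_sums(batches, window_length=2, slide_interval=1)
every_other = windowed_sums(batches, window_length=2, slide_interval=2)
```

With a slide interval of 1 the windows overlap. With a slide interval equal to the window length they tile the stream without overlap.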
Spark
Hydrogen gas power plant Spark model
o Separating the label column.
o Creating the RDD.
o Splitting the data into training and test sets.
o Training the model on the training set using the decision forest regression algorithm.
o Evaluating the result.
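The steps above can be sketched end to end in plain Python. This is not the authors' Spark or Azure ML code. It uses synthetic data (`make_rows`), plain lists instead of an RDD, and a deliberately simplified forest: a bagged ensemble of one-split regression trees that stands in for decision forest regression. All function names, the 80/20 split, and the tree count are assumptions for illustration.

```python
import random
from statistics import mean


def make_rows(n, rng):
    """Synthetic rows of (feature1, feature2, label); hypothetical, not station data."""
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 10)
        x2 = rng.uniform(0, 10)
        rows.append((x1, x2, 3.0 * x1 + rng.gauss(0, 0.5)))
    return rows


def fit_stump(features, labels):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            left = [y for row, y in zip(features, labels) if row[j] <= threshold]
            right = [y for row, y in zip(features, labels) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # no valid split: always predict the overall mean
        overall = mean(labels)
        return (0, float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_forest(features, labels, n_trees, rng):
    """Train each tree on a bootstrap sample of the training data."""
    n = len(features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([features[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Regression output: the mean prediction of the individual trees."""
    return mean(predict_stump(stump, row) for stump in forest)


def rmse(predictions, actual):
    return (sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)) ** 0.5


rng = random.Random(42)
rows = make_rows(100, rng)

# Step 1: separate the label column from the features.
features = [row[:-1] for row in rows]
labels = [row[-1] for row in rows]

# Step 2: split into training (80%) and test (20%) sets.
indices = list(range(len(rows)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_x = [features[i] for i in indices[:cut]]
train_y = [labels[i] for i in indices[:cut]]
test_x = [features[i] for i in indices[cut:]]
test_y = [labels[i] for i in indices[cut:]]

# Step 3: train the forest. Step 4: evaluate against a predict-the-mean baseline.
forest = fit_forest(train_x, train_y, 15, rng)
forest_rmse = rmse([predict_forest(forest, row) for row in test_x], test_y)
baseline_rmse = rmse([mean(train_y)] * len(test_y), test_y)
```

Comparing against a predict-the-mean baseline is a simple sanity check for the evaluation step. A real decision forest would grow deeper trees and typically sample features as well as rows.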
Hydrogen Gas Power Plant
Prediction Model
The Cal State L.A. Hydrogen Research
and Fueling Facility (H2 Station) was
formally opened on May 7, 2014.
Hydrogen Gas Power Plant
Prediction Model
The station is capable of producing hydrogen
onsite from renewable energy sources, using the
process known as electrolysis.
Cal State L.A. Hydrogen Research and Fueling
Facility became the first station in the nation
to sell hydrogen fuel by the kilogram to the
public.
Hydrogen Gas Power Plant
Prediction Model
Workflow
Hydrogen Gas Power Plant
Prediction Model
Model
Hydrogen Gas Power Plant
Prediction Model
Results and observations
According to our research, our model is able to predict
Vehicle Pressure (the pressure of hydrogen gas within the
vehicle Hydrogen Storage System).
The algorithm used is decision forest regression.
Decision forests are an ensemble learning method for
classification, regression and other tasks. They operate
by constructing a multitude of decision trees at
training time and outputting the class that is
the mode of the classes (classification) or the mean
prediction (regression) of the individual trees.
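The two ways a forest combines its trees can be shown in a few lines. The helper names and the per-tree outputs below are invented for illustration: classification takes the mode of the trees' votes, and regression takes the mean of their predictions.

```python
from statistics import mean, mode


def forest_classify(tree_votes):
    """Classification: the forest outputs the most common class among its trees."""
    return mode(tree_votes)


def forest_regress(tree_predictions):
    """Regression: the forest outputs the mean of the trees' predictions."""
    return mean(tree_predictions)


# Hypothetical outputs from three trees, not results from the slides.
predicted_class = forest_classify(["high", "low", "high"])
predicted_value = forest_regress([340.0, 350.0, 360.0])
```

Since the model here predicts a continuous quantity (vehicle pressure), it is the mean-of-trees rule that applies.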
Hydrogen Gas Power Plant
Prediction Model
Results and observations
STATE OF CHARGE (SOC):
– Ratio of the hydrogen density within the vehicle
storage system to the full-fill density. SOC is
expressed as a percentage and is computed
from the gas density:
SOC (%) = (gas density in vehicle storage system / full-fill density) x 100
Our model predicts vehicle pressure, which in
turn could be used to determine the state of
charge.
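The SOC definition translates directly into a small function. This sketch takes both densities as inputs because the slides do not say how density is derived from the predicted pressure. The function name and the example densities are assumptions, and the numbers are placeholders rather than measured values.

```python
def state_of_charge(gas_density, full_fill_density):
    """Return SOC as a percentage: gas density relative to the full-fill density.

    Both densities must use the same unit (for example g/L).
    """
    if full_fill_density <= 0:
        raise ValueError("full_fill_density must be positive")
    return 100.0 * gas_density / full_fill_density


# Placeholder densities in the same unit; not values from the station data.
soc_percent = state_of_charge(gas_density=30.0, full_fill_density=40.0)
```

A predicted vehicle pressure would first have to be converted to a gas density, together with the gas temperature, before it could be passed to this function.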