Seungchul Lee, sclee@bistel.com
BISTel Inc.
Daeyoung Kim, dykim3@bistel.com
BISTel Inc.
Analyzing 2TB of Raw Trace Data
from a Manufacturing Process:
A First Use Case of Apache Spark for
Semiconductor Wafers from Real Industry
#UnifiedAnalytics #SparkAISummit
Contents
• Introduction to BISTel
– BISTel’s business and solutions
– Big Data for BISTel’s smart manufacturing
• Use cases of Apache Spark in the manufacturing industry
– Trace Analyzer (TA)
– Map Analyzer (MA)
Introduction to BISTel
BISTel’s business areas
• Providing analytic solutions based on Artificial Intelligence (AI)
and Big Data to customers for the Smart Factory
BISTel’s solution areas
• World-Class Manufacturing Intelligence through innovation
BISTel’s analytic solution: eDataLyzer
BISTel’s analytic solutions (MA)
• Map Pattern Clustering
– Automatically detect and classify map patterns with/without libraries
– Processes thousands of wafers and gives results in a few minutes
[Figure: defective wafers, clustered]
BISTel’s analytic solutions (TA)
• Specialized Application for Trace Raw Data
– Extracts the vital signs out of equipment trace data
– Provides in-depth analysis that traditional methods cannot reach
[Figure: abnormal vs. normal traces]
BISTel’s big data experiences
[Figures: YMA test using Spark; big data platforms comparison]
Trace Analyzer (TA)
Trace Data
• Trace Data is sensor data collected from processing equipment
within a semiconductor fab during a process run.
[Figures: semiconductor industry; wafer]
Logical Hierarchy of the trace data
• Hierarchy (top to bottom): Process > Recipe > Recipe step > Lot > Wafer
[Figure: visualization of the whole process, one process, Recipe 1 and one wafer]
An example of the trace data
Process   Recipe    Recipe step   Lot       Wafer   Param1   Param2   Time
021_LIT   RecipeA   1             1501001   1       32.5     45.4     2015-01-20 09:00:00
Data attributes
• Base unit: one process and one parameter
• 1000 wafers
• Each wafer has 1000~2000 data points in a recipe step
• Factors that make the volume of trace data huge
– # of parameters
– # of processes
– # of wafers
– # of recipe steps
– duration of the recipe step
An example of the trace data – (2)
No.  Fab    # of processes  # of recipe steps  Avg. recipe process time  Data frequency  # of units  Parameters per unit (max)
1    Array  109             10                 16 mins                   1Hz             288         185
2    CF     25              5                  1min                      1Hz             154         340
3    CELL   12              7                  1min                      1Hz             213         326
4    MDL    5               12                 2mins                     1Hz             32          154
• Some calculations (using the Array fab row)
– For one process, one parameter and one wafer
  16 min * 10 recipe steps * 60 sec * 1Hz = 9,600 points
– For multiple parameters, processes and wafers
  9,600 * 288 units * 185 parameters * 109 processes * (# of wafers)
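The arithmetic above can be checked with a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, and because 185 is the maximum number of parameters per unit, the total should be read as an upper bound.

```java
final class TraceVolume {
    /** Data points for one process, one parameter and one wafer. */
    static long pointsPerWafer(int avgRecipeMinutes, int recipeSteps, int frequencyHz) {
        return (long) avgRecipeMinutes * recipeSteps * 60L * frequencyHz;
    }

    /** Data points across all units, parameters, processes and wafers. */
    static long totalPoints(long pointsPerWafer, int units, int paramsPerUnit,
                            int processes, long wafers) {
        return pointsPerWafer * units * paramsPerUnit * processes * wafers;
    }

    public static void main(String[] args) {
        long perWafer = pointsPerWafer(16, 10, 1);   // Array fab: 16 min, 10 steps, 1Hz
        System.out.println("points per wafer: " + perWafer);
        // The wafer count (1000) is an illustrative value, not a figure from the table.
        System.out.println("total points: " + totalPoints(perWafer, 288, 185, 109, 1000));
    }
}
```

With these inputs a single wafer already contributes about 55.8 billion points across the whole fab, which is why the per-parameter distribution discussed next matters.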
Spark : Smart manufacturing
• Spark is well suited to batch analytics on big data
• Trace data can be distributed by parameter, which suits
Apache Spark
• Easy deployment and scalability when providing the
solutions to our customers
Naïve way: applying Spark to TA
How to apply Spark to TA?
// Key every trace record by its recipe step, then group the records per key.
traceDataSet = config.getTraceRDDs().mapToPair(t -> {
    String recipeStepKey = TAUtil.getRecipeStepKey(t); // use recipe step as key
    return new Tuple2<String, String>(recipeStepKey, t);
}).groupByKey();
// Run the TA core once per recipe-step group.
traceDataSet.flatMap(t -> {
    Map<String, TraceDataSet> alltraceData = TAUtil.getTraceDataSet(t);
    ...
    TAUtil.seperateFocusNonFocus(alltraceData, focus, nonFocus); // separate data
    ta.runTraceAnalytic(focus, nonFocus, config); // calling the TA core
    ...
});
Most cases in the manufacturing industry
• In practice, most parameters have a small number of data points
(in most cases sampled at 1Hz).
• In addition, the number of wafers to be analyzed is not massive
(up to 1,000 wafers).
• Therefore, the total number of data points in a process can easily be
processed on a single core.
Issues in manufacturing industry
• Last year, I received an email indicating that..
Big parameter
• Tools with a high sampling frequency or a long recipe time can produce
a huge volume of data for a single parameter
• Requirements in industry
– For one parameter
– 400,000 wafers
– 20,000 data points
Limitations of the Naïve TA
// One Java object (ftds) holds too many data points.
for (Tuple2<String, Iterable<String>> recipeTrace : allTraceData) {
    TraceDataSet ftds = new TraceDataSet();
    Iterable<String> oneRecipe = recipeTrace._2();
    for (String tr : oneRecipe) {
        TraceData td = TAUtil.convertToTraceData(tr);
        ftds.add(td);
    }
}
// All the data points for a key are pushed into one core by shuffling.
traceDataSet = config.getTraceRDDs().mapToPair(t -> {
    String recipeStepKey = TAUtil.getRecipeStepKey(t); // use recipe step as key
    return new Tuple2<String, String>(recipeStepKey, t);
}).groupByKey();
Need for a new TA Spark version
• The naïve TA Spark version cannot process massive numbers of data points.
• New technology enhancements now enable data capture at
much higher frequencies.
• A “big parameter” version of TA is necessary.
Our idea is that..
• Extracting the TA core logic
– Batch mode
– Key-based processing
– Using .collect() to broadcast variables
– Caching the object
• Preprocessing trace data
– Key-based processing
– Base unit : process key or recipe step key
Batch
JavaPairRDD<String, List<String>> traceDataRDD
    = TAImpl.generateBatch(traceData);
• First element : process, recipe step, parameter and batch ID
• Second element : lot, wafer and trace values
[Figure: batches of Param A and their summary statistics]
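The slides do not show how TAImpl.generateBatch builds its batches, so the following plain-Java sketch is only one plausible reading of the key design described above. The class name, the "|" separator, the fixed batch size and the record layout ("lot,wafer,values...") are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchSketch {
    /** Composite key: process, recipe step, parameter and batch ID (separator is assumed). */
    static String batchKey(String process, String recipeStep, String parameter, int batchId) {
        return String.join("|", process, recipeStep, parameter, Integer.toString(batchId));
    }

    /**
     * Splits the records of one parameter into batches of at most batchSize records.
     * Each record is assumed to be a "lot,wafer,values..." string.
     */
    static Map<String, List<String>> generateBatches(
            String process, String recipeStep, String parameter,
            List<String> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            String key = batchKey(process, recipeStep, parameter, i / batchSize);
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(records.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = List.of(
                "1501001,1,32.5,32.7", "1501001,2,31.9,32.1", "1501001,3,33.0,32.8");
        generateBatches("021_LIT", "1", "Param1", records, 2)
                .forEach((key, batch) -> System.out.println(key + " -> " + batch));
    }
}
```

Because the batch ID is part of the key, one very large parameter is spread over many keys instead of being shuffled into a single group.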
Collect() : TA Cleaner
• Filters out traces that have an unusual process-time duration.
• Uses three main Spark APIs
– mapToPair : extracts the relevant information
– reduceByKey : aggregates values based on the key
– collect : sends the data to the driver
Collect() : TA Cleaner – (2)
Values held by each worker
Worker     wafer 1   wafer 2   …
Worker 1   65        54        …
Worker 2   83        54        …
Worker 3   34        77        …
Worker 4   71        80        …
• traceData.mapToPair()
– Returns
  key : process
  value : wafer and its length
Collect() : TA Cleaner – (3)
• reduceByKey()
– Aggregating contexts into one based on the process key
[Figure: shuffling combines the partial per-wafer values, e.g. wafer 1: 65 + 88 = 153; wafer 2: 54 + 92 = 146]
Collect() : TA Cleaner – (4)
• Applying the filtering method in each worker
mapToPair(t -> {
    String pk = t._1();                  // process key
    Double[] values = toArray(t._2());   // aggregated trace lengths
    FilterThresdholds ft = CleanerFilter.filterByLength(values);
    return new Tuple2<>(pk, ft);
}).collect();
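The cleaner steps can be mimicked without Spark to see what each stage does. In this sketch the merge function plays the role of the reduceByKey step and reproduces the numbers from the slide (65 + 88 and 54 + 92). The threshold rule is purely an assumption, since the slides do not show what CleanerFilter.filterByLength computes: here a trace length is accepted if it lies within a tolerance around the median.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class CleanerSketch {
    /** reduceByKey-style merge: sums the per-wafer lengths of two partial results. */
    static Map<Integer, Integer> mergeLengths(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        Map<Integer, Integer> merged = new HashMap<>(a);
        b.forEach((wafer, length) -> merged.merge(wafer, length, Integer::sum));
        return merged;
    }

    /** Assumed rule: returns {median * (1 - tolerance), median * (1 + tolerance)}. */
    static double[] lengthThresholds(int[] lengths, double tolerance) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no lengths");
        }
        int[] sorted = lengths.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        return new double[] {median * (1 - tolerance), median * (1 + tolerance)};
    }

    public static void main(String[] args) {
        Map<Integer, Integer> merged =
                mergeLengths(Map.of(1, 65, 2, 54), Map.of(1, 88, 2, 92));
        System.out.println(merged);
        double[] t = lengthThresholds(new int[] {100, 100, 100, 40}, 0.25);
        System.out.println(t[0] + " .. " + t[1]);
    }
}
```

Only the small threshold pair per process key needs to reach the driver, which is what makes the final collect() cheap.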
Example 2 : Computing outliers
• To detect outliers in a process, median statistics are required.
• To compute the median value, the values need to be sorted.
• Sort(values)
Example 2 : Computing outliers – (2)
• Computed an approximate median value for big data processing.
• Applied a histogram to estimate the median
• Collected the histogram
[Figure: mapToPair → reduceByKey → collect]
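A histogram-based median can be sketched in plain Java as follows. The slides do not give the binning scheme, so equal-width bins over a known [min, max] range, and linear interpolation inside the median bin, are assumptions. Each partition would build its own histogram, the partial histograms would be summed (the reduceByKey step), and the driver would read the median from the merged counts.

```java
final class HistogramMedian {
    /** Counts values into equal-width bins over [min, max]; requires max > min and bins > 0. */
    static long[] histogram(double[] values, double min, double max, int bins) {
        long[] counts = new long[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            int bin = (int) ((v - min) / width);
            if (bin < 0) bin = 0;
            if (bin >= bins) bin = bins - 1;   // clamps v == max into the last bin
            counts[bin]++;
        }
        return counts;
    }

    /** Merges two partial histograms that share the same bins. */
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Approximate median: interpolates inside the bin that contains the middle rank. */
    static double approxMedian(long[] counts, double min, double max) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        double width = (max - min) / counts.length;
        double target = total / 2.0;
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0 && cumulative + counts[i] >= target) {
                double fraction = (target - cumulative) / counts[i];
                return min + (i + fraction) * width;
            }
            cumulative += counts[i];
        }
        return max;
    }

    public static void main(String[] args) {
        long[] left = histogram(new double[] {1, 2, 3, 4, 5}, 0, 10, 10);
        long[] right = histogram(new double[] {6, 7, 8, 9, 10}, 0, 10, 10);
        System.out.println(approxMedian(merge(left, right), 0, 10));
    }
}
```

The estimate is off by at most one bin width, so accuracy is traded against the size of the histogram that has to be collected; no global sort is needed.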
Caching the trace data
• Persist the trace data before applying the TA algorithm
• Prevents the data from being reloaded each time an action is performed
focus = focus.persist(StorageLevel.MEMORY_AND_DISK());
nonFocus = nonFocus.persist(StorageLevel.MEMORY_AND_DISK());
RDD vs. DataSet (DataFrame)
• RDD
– All the data points in a process should be scanned
• Advantage of the DataSet is weakened.
– Hard to manipulate trace data using SQL
– Basic statistics (e.g. Min, Max, Avg, Count…)
– Advanced algorithms (Fast Fourier Transform and Segmentation)
Demo : Running the TA algorithm
• Analyzed 2TB trace data using TA
TA results in eDataLyzer
Results of the Naïve TA
Results of the big parameter TA Spark
• Two different TA Spark versions
Two different TA Spark versions
               Data size   # of parameters   # of wafers   # of data points   Running time
Naïve TA       2TB         270,000           250           1000               1.1h
Big Param TA   1TB         4                 400,000       20,000             54min
Map Analyzer (MA)
Map Analytics (MA)
• Hierarchical clustering is used to find a defect pattern
S.-C. Hsu, C.-F. Chien / Int. J. Production Economics 107 (2007) 88–103
MA datasets
Process   Process step   Parameter   Lot       Wafer   Defective chips
FPP       Fall_bin       P01         8152767   23      -02,04|-01,22|+00,25|+08,33|+04,05
waferDataSetRDD.mapToPair(...).groupBy().mapToPair(...);
• Generating a key-value pair
• Calling hierarchical clustering
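Before wafers can be compared, the "Defective chips" field has to be turned into coordinates. The sketch below assumes, from the sample row only, that entries are separated by "|" and that each entry is a signed "x,y" pair; the class name is hypothetical and the real parser may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class DefectParser {
    /** Parses "x,y|x,y|..." into a list of {x, y} integer pairs. */
    static List<int[]> parse(String field) {
        List<int[]> chips = new ArrayList<>();
        for (String entry : field.split("\\|")) {
            String[] xy = entry.trim().split(",");
            if (xy.length != 2) {
                throw new IllegalArgumentException("bad entry: " + entry);
            }
            // Integer.parseInt accepts a leading '+' or '-' sign and leading zeros.
            chips.add(new int[] {
                    Integer.parseInt(xy[0].trim()),
                    Integer.parseInt(xy[1].trim())});
        }
        return chips;
    }

    public static void main(String[] args) {
        for (int[] chip : parse("-02,04|-01,22|+00,25|+08,33|+04,05")) {
            System.out.println(chip[0] + ", " + chip[1]);
        }
    }
}
```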
BISTel’s first approach for MA
• Uses batch mode to cluster massive numbers of wafers.
Demo : Running the MA algorithm
• Dataset consists of 26 parameters containing 120,000 wafers
Problems in batch for clustering
• In the manufacturing industry, some issues exist
           # of wafers   Time                       Detecting a pattern
DataSet1   15            2017-02-01:09:00 ~ 09:30   Yes
DataSet2   7,000         2017-02-01~2017-02-08      No
Spark summit: SHCA algorithm
• At Spark Summit 2017, Chen Jin presented a scalable hierarchical
clustering algorithm using Spark.
An SHCA algorithm using Spark
Jin, Chen, et al. "A scalable hierarchical clustering algorithm using
spark." 2015 IEEE First International Conference on Big Data Computing
Service and Applications. IEEE, 2015.
Applying SHCA to wafer datasets
Wafer map ID   Coordinates of defective chips
A              (13,22), (13,23), (13,24), (13,25)…
B              (5,15), (6,12), (6,17), (8,25)…
C              (9,29), (16,33), (19,39), (22,25)…
D              (19,9), (20,2), (23,21), (25,4)…
E              (5,5), (5,8), (5,15), (5,25)…
• Designed the key-value pairs
• Minimum spanning tree (MST)
– Vertex : Wafer
– Edge : distance between wafers
• distance w1, w2
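The MST view of clustering can be illustrated on a single machine. This is a sketch under stated assumptions, not the SHCA implementation: the distance formula is not recoverable from the slide, so the Jaccard distance between the sets of defective-chip coordinates is swapped in, and Prim's algorithm on the complete wafer graph replaces the distributed construction. Cutting MST edges above a threshold then yields single-linkage clusters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WaferMst {
    /** Assumed distance: Jaccard distance between two sets of "x,y" chip coordinates. */
    static double distance(Set<String> w1, Set<String> w2) {
        Set<String> union = new HashSet<>(w1);
        union.addAll(w2);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(w1);
        common.retainAll(w2);
        return 1.0 - (double) common.size() / union.size();
    }

    /** Prim's algorithm on the complete wafer graph; edges are {from, to, weight}. */
    static List<double[]> minimumSpanningTree(List<Set<String>> wafers) {
        int n = wafers.size();
        List<double[]> edges = new ArrayList<>();
        if (n == 0) {
            return edges;
        }
        boolean[] inTree = new boolean[n];
        double[] best = new double[n];   // cheapest known edge into the tree
        int[] parent = new int[n];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        best[0] = 0.0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            }
            inTree[u] = true;
            if (parent[u] >= 0) {
                edges.add(new double[] {parent[u], u, best[u]});
            }
            for (int v = 0; v < n; v++) {
                if (inTree[v]) continue;
                double d = distance(wafers.get(u), wafers.get(v));
                if (d < best[v]) {
                    best[v] = d;
                    parent[v] = u;
                }
            }
        }
        return edges;
    }

    /** Number of clusters left after cutting MST edges longer than the threshold. */
    static int clusterCount(List<double[]> mstEdges, int wafers, double threshold) {
        int kept = 0;
        for (double[] e : mstEdges) {
            if (e[2] <= threshold) kept++;
        }
        return wafers - kept;   // a forest with n vertices and k edges has n - k trees
    }

    public static void main(String[] args) {
        List<Set<String>> wafers = List.of(
                Set.of("13,22", "13,23", "13,24"),
                Set.of("13,22", "13,23", "13,25"),
                Set.of("5,5", "5,8"));
        List<double[]> mst = minimumSpanningTree(wafers);
        System.out.println("clusters: " + clusterCount(mst, wafers.size(), 0.6));
    }
}
```

The quadratic pairwise distance step is the part a distributed version has to partition; the sketch only shows how the vertices, edges and cut relate to each other.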
Comparison between two versions
Comparison between two versions - (2)
Spark stage results of MA
• Approximately 100,000 wafers are analyzed for clustering
Comparison of the results
[Figure: chart comparing Batch and New MA at 5,000, 50,000, 100k, 160k and 320k; vertical axis from 0 to 2500]
Summary
• MA using SHCA is more accurate than the batch MA.
• However, the batch MA runs faster than the new MA.
• For the manufacturing industry, we suggest using both MAs.
Conclusions
• A first use case of Apache Spark in the semiconductor industry
– Terabytes of trace data are processed
– Achieved hierarchical clustering on distributed machines for
semiconductor wafers
Acknowledgements
• BISTel Korea (BK)
– Andrew An
• BISTel America (BA)
– James Na
– WeiDong Wang
– Rachel Choi
– Taeseok Choi
– Mingyu Lu
* This work was supported by the World Class 300 Project (R&D) (S2641209, "Development of next generation intelligent Smart
manufacturing solution based on AI & Big data to improve manufacturing yield and productivity") of the MOTIE, MSS(Korea).
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

Mais conteúdo relacionado

Mais procurados

Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenDatabricks
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
Data Audit Approach To Developing An Enterprise Data Strategy
Data Audit Approach To Developing An Enterprise Data StrategyData Audit Approach To Developing An Enterprise Data Strategy
Data Audit Approach To Developing An Enterprise Data StrategyAlan McSweeney
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksSarah Dutkiewicz
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
data management and analysis
 data management and analysis data management and analysis
data management and analysisabdullahi mohamed
 
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleBucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleDatabricks
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Databricks
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Databricks
 

Mais procurados (20)

Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Data Audit Approach To Developing An Enterprise Data Strategy
Data Audit Approach To Developing An Enterprise Data StrategyData Audit Approach To Developing An Enterprise Data Strategy
Data Audit Approach To Developing An Enterprise Data Strategy
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Reliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at AirbnbReliable and Scalable Data Ingestion at Airbnb
Reliable and Scalable Data Ingestion at Airbnb
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
data management and analysis
 data management and analysis data management and analysis
data management and analysis
 
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing ShuffleBucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 

Semelhante a Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry

Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Spark Summit
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Databricks
 
Teradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_DataTeradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_Datapepeborja
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...DATAVERSITY
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analyticsAnirudh
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Etu Solution
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineDatabricks
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0MapR Technologies
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...DataWorks Summit/Hadoop Summit
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Summit
 
Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Alpine Data
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkDatabricks
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Landon Robinson
 
All (that i know) about exadata external
All (that i know) about exadata externalAll (that i know) about exadata external
All (that i know) about exadata externalPrasad Chitta
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 

Semelhante a Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry (20)

Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Teradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_DataTeradata Partner 2016 Gas_Turbine_Sensor_Data
Teradata Partner 2016 Gas_Turbine_Sensor_Data
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applic...
 
Real time streaming analytics
Real time streaming analyticsReal time streaming analytics
Real time streaming analytics
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL Engine
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 
All (that i know) about exadata external
All (that i know) about exadata externalAll (that i know) about exadata external
All (that i know) about exadata external
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 

Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry

  • 1. Seungchul Lee, sclee@bistel.com BISTel Inc. Daeyoung Kim, dykim3@bistel.com BISTel Inc. Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry #UnifiedAnalytics #SparkAISummit
  • 2. Contents • Introduction to BISTel – BISTel’s business and solutions – Big Data for BISTel’s smart manufacturing • Use cases of Apache Spark in manufacturing industry – Trace Analyzer (TA) – Map Analyzer (MA)
  • 4. BISTel’s business areas • Providing analytic solutions based on Artificial Intelligence (AI) and Big Data to the customers for Smart Factory
  • 5. BISTel’s solution areas • World-Class Manufacturing Intelligence through innovation
  • 6. BISTel’s analytic solution: eDataLyzer
  • 7. BISTel’s analytic solutions (MA) • Map Pattern Clustering – Automatically detects and classifies map patterns with or without libraries – Processes thousands of wafers and gives results in a few minutes (Figure: clustered defective wafers)
  • 8. BISTel’s analytic solutions (TA) • Specialized application for raw trace data – Extracts the vital signs from equipment trace data – Provides in-depth analysis that traditional methods cannot reach (Figure: abnormal vs. normal traces)
  • 9. BISTel’s big data experiences
  • 10. BISTel’s big data experiences (Figures: YMA test using Spark; big data platforms comparison)
  • 12. Trace Data • Trace data is sensor data collected from processing equipment within a semiconductor fab during a process run. (Figures: semiconductor industry; wafer)
  • 13. Logical hierarchy of the trace data (Figure: hierarchy levels Process, Recipe, Recipe Step, Lot, Wafer; visualization labels: whole process, process, recipe, 1 wafer)
  • 14. An example of the trace data
      Process | Recipe  | Recipe step | Lot     | Wafer | Param1 | Param2 | Time
      021_LIT | RecipeA | 1           | 1501001 | 1     | 32.5   | 45.4   | 2015-01-20 09:00:00
  • 15. Data attributes • Base unit: one process and one parameter • 1000 wafers • Each wafer has 1000~2000 data points in a recipe step • Factors that make the volume of trace data huge: • # of parameters • # of processes • # of wafers • # of recipe steps • duration of the recipe step
  • 16. An example of the trace data – (2)
      No. | Fab   | # of processes | # of recipe steps | Avg. recipe process time | Data frequency | # of units | Parameters per unit (max)
      1   | Array | 109            | 10                | 16 min                   | 1 Hz           | 288        | 185
      2   | CF    | 25             | 5                 | 1 min                    | 1 Hz           | 154        | 340
      3   | CELL  | 12             | 7                 | 1 min                    | 1 Hz           | 213        | 326
      4   | MDL   | 5              | 12                | 2 min                    | 1 Hz           | 32         | 154
      • Some calculations (using the Array fab, row 1)
      • For one process, one parameter and one wafer: 16 * 10 * 60 sec * 1Hz = 9600 points
      • For multiple parameters, processes and wafers: 9600 * 288 * 185 * 109 * (# of wafers)
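The arithmetic on slide 16 can be checked with a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, the inputs are the Array fab figures from the table, and the 1,000-wafer case is taken from the wafer count quoted on slide 15.

```java
public class TracePointCount {
    /** Points for one process, one parameter and one wafer (minutes * steps * 60 s * Hz). */
    static long pointsPerParameter(long minutes, long recipeSteps, long hz) {
        return minutes * recipeSteps * 60L * hz;
    }

    /** Scales the single-parameter count by units, parameters per unit, processes and wafers. */
    static long totalPoints(long perParameter, long units, long paramsPerUnit,
                            long processes, long wafers) {
        return perParameter * units * paramsPerUnit * processes * wafers;
    }

    public static void main(String[] args) {
        long perParameter = pointsPerParameter(16, 10, 1);
        System.out.println(perParameter);                                    // 9600
        System.out.println(totalPoints(perParameter, 288, 185, 109, 1));     // 55752192000
        System.out.println(totalPoints(perParameter, 288, 185, 109, 1000));  // 55752192000000
    }
}
```

Under these figures a single wafer already accounts for roughly 5.6 × 10^10 points, which is why the data volume grows so quickly with the number of wafers.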
  • 17. Spark : Smart manufacturing • Spark is well suited to batch analytics on big data • Distributing the data by parameter fits Apache Spark well • Easy deployment and scalability when providing the solutions to our customers
  • 19. How to apply Spark to TA?
      traceDataSet = config.getTraceRDDs().mapToPair(t -> {
          String recipeStepKey = TAUtil.getRecipeStepKey(t); // use the recipe step as the key
          return new Tuple2<String, String>(recipeStepKey, t);
      }).groupByKey();

      traceDataSet.flatMap(t -> {
          Map<String, TraceDataSet> alltraceData = TAUtil.getTraceDataSet(t);
          ...
          TAUtil.seperateFocusNonFocus(alltraceData, focus, nonFocus); // separate the data
          ta.runTraceAnalytic(focus, nonFocus, config); // call the TA core
          ...
      });
  • 20. Most cases in manufacturing industry • In real industry, most parameters have a small number of data points (in most cases sampled at 1Hz). • In addition, the number of wafers to be analyzed is not massive (up to 1,000 wafers). • Therefore, the total number of data points in a process can easily be processed on a single core.
  • 21. Issues in manufacturing industry • Last year, I received an email indicating that...
  • 22. Big parameter • Tools with a high sampling frequency or a long recipe time can produce a huge volume of data for a single parameter • Requirements in industry • For one parameter • 400,000 wafers • 20,000 data points
  • 23. Limitations of the Naïve TA
      for (Tuple2<String, Iterable<String>> recipeTrace : allTraceData) {
          TraceDataSet ftds = new TraceDataSet(); // this Java object holds too many data points
          Iterable<String> oneRecipe = recipeTrace._2();
          for (String tr : oneRecipe) {
              TraceData td = TAUtil.convertToTraceData(tr);
              ftds.add(td);
          }
      }

      traceDataSet = config.getTraceRDDs().mapToPair(t -> {
          String recipeStepKey = TAUtil.getRecipeStepKey(t); // use the recipe step as the key
          return new Tuple2<String, String>(recipeStepKey, t);
      }).groupByKey(); // all data points for a key are pushed onto one core by shuffling
  • 24. Need for a new TA Spark version • The naïve TA Spark version cannot process a massive number of data points. • Nowadays, new technology enhancements enable data capture at much higher frequencies. • A “big parameter” version of TA is necessary.
  • 25. Our idea • Extracting the TA core logic – Batch mode – Key-based processing – Using .collect() to broadcast variables – Caching the object
  • 26. Batch • Preprocessing trace data • Key-based processing • Base unit: process key or recipe step key
      JavaPairRDD<String, List<String>> traceDataRDD = TAImpl.generateBatch(traceData);
      First element: process, recipe step, parameter and batch ID
      Second element: lot, wafer and trace values
      (Figure: Param A, summary statistics)
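The batch keying described on slide 26 can be illustrated without Spark. The sketch below is not the authors' `TAImpl.generateBatch`; the class name, the `toBatches` helper, the `|` key separator, the record layout and the batch size are all assumptions made for illustration. It only shows how appending a batch ID to the (process, recipe step, parameter) key keeps any single group small.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TraceBatcher {
    /**
     * Splits the wafer records of one (process, recipe step, parameter) key into batches.
     * Each record is assumed to be "lot,wafer,values..."; the batch ID is appended to the key.
     */
    static Map<String, List<String>> toBatches(String baseKey, List<String> waferRecords,
                                               int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (int i = 0; i < waferRecords.size(); i++) {
            String key = baseKey + "|" + (i / batchSize); // batch ID = record index / batch size
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(waferRecords.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = List.of("L1,1,32.5,45.4", "L1,2,31.9,44.8", "L1,3,33.0,46.1");
        toBatches("021_LIT|1|Param1", records, 2)
                .forEach((key, batch) -> System.out.println(key + " -> " + batch.size() + " wafers"));
    }
}
```

With a bounded batch size, no single key can accumulate all 400,000 wafers of a big parameter, which is the problem slide 23 attributes to grouping by recipe step alone.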
  • 27. Collect() : TA Cleaner • Filters out traces whose process time has an unusual duration. • Uses three main Spark APIs – mapToPair: extracts the relevant information – reduceByKey: aggregates values by key – collect: sends the data to the driver
  • 28. Collect() : TA Cleaner – (2) • traceData.mapToPair() • Returns • key: process • value: wafer and its length (Figure: four workers, each holding a wafer/value table, e.g. wafer 1: 65, wafer 2: 54)
  • 29. Collect() : TA Cleaner – (3) • reduceByKey() • Aggregating contexts into one based on the process key (Figure: after shuffling, the tables wafer 1: 65, wafer 2: 54 and wafer 1: 88, wafer 2: 92 are combined into wafer 1: 153, wafer 2: 146)
  • 30. Collect() : TA Cleaner – (4) • Applying the filtering method in each worker
      mapToPair(t -> {
          String pk = t._1();
          Double[] values = toArray(t._2());
          FilterThresdholds ft = CleanerFilter.filterByLength(values);
          return new Tuple2<>(pk, ft); // process key and its filter thresholds
      }).collect();
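The cleaner steps on slides 27 to 30 can be sketched in plain Java. The merge step mirrors what `reduceByKey` does for one process key, using the figures from slide 29. The filtering rule is an assumption: the slides do not say how `CleanerFilter.filterByLength` derives its thresholds, so this sketch flags wafers whose trace length deviates from the median length by more than a chosen fraction. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TraceLengthCleaner {
    /** Sums per-wafer partial lengths, as reduceByKey would for one process key. */
    static Map<Integer, Integer> mergeLengths(List<Map<Integer, Integer>> partials) {
        Map<Integer, Integer> total = new HashMap<>();
        for (Map<Integer, Integer> partial : partials) {
            partial.forEach((wafer, length) -> total.merge(wafer, length, Integer::sum));
        }
        return total;
    }

    /** Assumed rule: a wafer is unusual if its length is outside median * (1 +/- tolerance). */
    static List<Integer> unusualWafers(Map<Integer, Integer> lengths, double tolerance) {
        List<Integer> unusual = new ArrayList<>();
        if (lengths.isEmpty()) {
            return unusual;
        }
        List<Integer> sorted = new ArrayList<>(lengths.values());
        Collections.sort(sorted);
        int n = sorted.size();
        double median = (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
        double low = median * (1 - tolerance);
        double high = median * (1 + tolerance);
        for (Map.Entry<Integer, Integer> e : lengths.entrySet()) {
            if (e.getValue() < low || e.getValue() > high) {
                unusual.add(e.getKey());
            }
        }
        Collections.sort(unusual);
        return unusual;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> merged = mergeLengths(List.of(Map.of(1, 65, 2, 54), Map.of(1, 88, 2, 92)));
        System.out.println(merged.get(1) + " " + merged.get(2)); // 153 146
    }
}
```

Only the small per-wafer length table needs to reach the driver, which is what makes the `collect()` call affordable here.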
  • 31. Example 2 : Computing outliers • Detecting outliers in a process requires the median. • Computing the exact median requires sorting the values. • Sort(values)
  • 32. Example 2 : Computing outliers – (2) • Computed an approximate median value for big data processing. • Applied a histogram to estimate the median • Collected the histogram (Figure: mapToPair, reduceByKey, collect)
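One way to realise the histogram-based median from slide 32 is sketched below in plain Java. It assumes fixed-width bins over a known value range and linear interpolation inside the bin that contains the middle rank; the slides do not specify the binning scheme, so these choices and all names are illustrative only. Each partition would build its own histogram, the histograms are summed (the `reduceByKey` step), and only the small merged array is collected.

```java
public class HistogramMedian {
    /** Builds a fixed-width histogram over [min, max]; out-of-range values are clamped. */
    static long[] histogram(double[] values, double min, double max, int bins) {
        long[] counts = new long[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            int bin = (int) ((v - min) / width);
            if (bin < 0) bin = 0;
            if (bin >= bins) bin = bins - 1;
            counts[bin]++;
        }
        return counts;
    }

    /** Element-wise sum of two histograms with the same binning. */
    static long[] merge(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("histograms must have the same number of bins");
        }
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Approximate median: interpolate inside the bin that contains the middle rank. */
    static double approxMedian(long[] counts, double min, double max) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        double width = (max - min) / counts.length;
        double target = total / 2.0;
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            if (seen + counts[i] >= target) {
                double fraction = (target - seen) / counts[i];
                return min + (i + fraction) * width;
            }
            seen += counts[i];
        }
        return max; // unreachable for a non-empty histogram
    }
}
```

The error of the estimate is bounded by one bin width, so the number of bins trades accuracy against the size of the array sent to the driver.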
  • 33. Caching the trace data • Persist the trace data before applying the TA algorithm • Avoids reloading the data each time an action is performed
      focus = focus.persist(StorageLevel.MEMORY_AND_DISK());
      nonFocus = nonFocus.persist(StorageLevel.MEMORY_AND_DISK());
  • 34. RDD vs. DataSet (DataFrame) • RDD – All the data points in a process should be scanned • The advantage of the DataSet is weakened. – Hard to manipulate trace data using SQL – Basic statistics (e.g. Min, Max, Avg, Count…) – Advanced algorithms (Fast Fourier Transform and Segmentation)
  • 35. Demo : Running the TA algorithm • Analyzed 2TB of trace data using TA
  • 36. TA results in eDataLyzer
  • 37. Results of the Naïve TA
  • 38. Results of the big parameter TA Spark
  • 39. Two different TA Spark versions
      Version      | Data size | # of parameters | # of wafers | # of data points | Running time
      Naïve TA     | 2TB       | 270,000         | 250         | 1000             | 1.1h
      Big Param TA | 1TB       | 4               | 400,000     | 20,000           | 54min
  • 41. Map Analytics (MA) • Hierarchical clustering is used to find a defect pattern (Source: S.-C. Hsu, C.-F. Chien / Int. J. Production Economics 107 (2007) 88–103)
  • 42. MA datasets
      Process | Process step | Parameter | Lot     | Wafer | Defective chips
      FPP     | Fall_bin     | P01       | 8152767 | 23    | -02,04|-01,22|+00,25|+08,33|+04,05
      waferDataSetRDD.mapToPair(...).groupBy().mapToPair(...);
      First mapToPair: generating a key-value pair
      Second mapToPair: calling hierarchical clustering
  • 43. BISTel’s first approach for MA • Using batch mode to cluster a massive number of wafers.
  • 44. Demo : Running the MA algorithm • The dataset consists of 26 parameters covering 120,000 wafers
  • 45. Problems in batch for clustering • In the manufacturing industry, some issues exist
      Dataset  | # of wafers | Time                     | Detecting a pattern
      DataSet1 | 15          | 2017-02-01:09:00 ~ 09:30 | Yes
      DataSet2 | 7,000       | 2017-02-01~2017-02-08    | No
  • 46. Spark Summit: SHCA algorithm • At Spark Summit 2017, Chen Jin presented a scalable hierarchical clustering algorithm (SHCA) using Spark.
  • 47. A SHCA algorithm using Spark • Jin, Chen, et al. "A scalable hierarchical clustering algorithm using spark." 2015 IEEE First International Conference on Big Data Computing Service and Applications. IEEE, 2015.
  • 48. Applying SHCA to wafer datasets
      Wafer map ID | Coordinates of defective chips
      A            | (13,22), (13,23), (13,24), (13,25)…
      B            | (5,15), (6,12), (6,17), (8,25)…
      C            | (9,29), (16,33), (19,39), (22,25)…
      D            | (19,9), (20,2), (23,21), (25,4)…
      E            | (5,5), (5,8), (5,15), (5,25)…
      • Designed the key-value pairs
      • Minimum spanning tree (MST) – Vertex: wafer – Edge: distance between wafers
      • distance w1, w2
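The MST view of hierarchical clustering on slide 48 can be sketched on a single machine as follows. The slides do not give the wafer distance, so the Jaccard distance between sets of defective-chip coordinates used here is an assumption, as are the class name, the coordinate strings and the cutoff. The distributed partitioning that makes SHCA scalable is deliberately left out; the sketch only shows that dropping MST edges longer than a cutoff yields single-linkage clusters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WaferMstClustering {
    /** Jaccard distance between two sets of defective-chip coordinates (assumed metric). */
    static double distance(Set<String> w1, Set<String> w2) {
        Set<String> union = new HashSet<>(w1);
        union.addAll(w2);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(w1);
        common.retainAll(w2);
        return 1.0 - (double) common.size() / union.size();
    }

    /** Union-find lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Kruskal's algorithm over all wafer pairs, stopping at edges longer than cutoff.
     * Wafers that end up with the same label belong to the same single-linkage cluster.
     */
    static int[] cluster(List<Set<String>> wafers, double cutoff) {
        int n = wafers.size();
        List<double[]> edges = new ArrayList<>(); // each edge is {distance, i, j}
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                edges.add(new double[] {distance(wafers.get(i), wafers.get(j)), i, j});
            }
        }
        edges.sort(Comparator.comparingDouble(e -> e[0]));
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (double[] e : edges) {
            if (e[0] > cutoff) {
                break; // remaining edges are longer and would join separate clusters
            }
            int a = find(parent, (int) e[1]);
            int b = find(parent, (int) e[2]);
            if (a != b) {
                parent[a] = b; // this edge is part of the MST
            }
        }
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) {
            labels[i] = find(parent, i);
        }
        return labels;
    }
}
```

Comparing all pairs costs O(n²) distance evaluations, which is why a scalable variant has to split the work across machines.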
  • 49. Comparison between two versions
  • 50. Comparison between two versions - (2)
  • 51. Spark stage results of MA • Approximately 100,000 wafers are analyzed for clustering
  • 52. Comparison of the results (Chart: Batch vs. New MA at 5,000, 50,000, 100k, 160k and 320k; vertical axis from 0 to 2500)
  • 53. Summary • MA using SHCA is more accurate than the batch MA. • However, the batch MA runs faster than the new MA. • For the manufacturing industry, we suggest using both MAs.
  • 54. Conclusions • A first use case of Apache Spark in the semiconductor industry – Terabytes of trace data were processed – Achieved hierarchical clustering on distributed machines for semiconductor wafers
  • 55. Acknowledgements • BISTel Korea (BK) – Andrew An • BISTel America (BA) – James Na – WeiDong Wang – Rachel Choi – Taeseok Choi – Mingyu Lu * This work was supported by the World Class 300 Project (R&D) (S2641209, "Development of next generation intelligent Smart manufacturing solution based on AI & Big data to improve manufacturing yield and productivity") of the MOTIE, MSS(Korea).