SlideShare uma empresa Scribd logo
1 de 22
Data Munging and Analysis for
Scientific Applications
Raminder Singh
Science Gateways Group
Indiana University, Bloomington
raminder@apache.org
• Evaluate the Apache Big data tools
• Understand the execution patterns of Analysis
applications
• Solutions using Airavata
• Build a gateways solution with HPC and Big Data
requirements
Overview
http://hortonworks.com/hadoop/yarn/
Hadoop 2 Ecosystem
Motivation to explore
• Heterogeneous data
• Data Munging (parsing, scraping, formatting data)
• Visualization or Analyze
• Preservation of data
Analysis Applications
• Behavior Tracking - medical
• Situational Awareness - weather
• Time Series Data -Patient monitoring, weather data to help farmers
• Resource consumption Monitoring - Smart grid
• Process optimization
What is Science Gateway?
• Community portal or desktop tools
• Common science theme
• Collaborative environment
The Ultrascan science gateway supports
high performance computing analysis of
biophysics experiments using XSEDE,
Juelich, and campus clusters.
Desktop analysis tools
Launch analysis
and monitor
through a
browser
We help build gateways
for labs or facilities.
Airavata
Value of using Airavata
• Enable collection of resources
• Application centric not compute centric
• Meta workflow to enable set of applications
Use-case for Data Analysis
• TextRWeb: Large Scale Text Analytics with
R on the web
Collaborator: Hui Zhang, Data Scientist at Indiana University
Goals for R on the web project
• Run large scale text analysis using parallel
R.
• Hide computational complexity with user
interfaces
• Support interactive text analysis
• Support iterative text mining
TextR Solution Diagram
Future Work
• Integrate TextRWeb with Apache Spark
• Explore SparkR [1]
• Develop Apache Thrift interfaces for TextRWeb server
• Integrate with Apache Airavata for HPC job.
• Explore workflow DAGs for Text Analysis
• Keep updated with product offering like Stratosphere
1. https://github.com/amplab-extras/SparkR-pkg
Conclusion
• Value added for the scientific communities
• Value for Apache Big Data Suite
airavata.apache.org
Subscribe: users-subscribe@airavata.apache.org
Subscribe: dev-subscribe@airavata.apache.org
Subscribe: architecture-subscribe@airavata.apache.org
Thanks You!
Q & A
Apache Spark
• In Memory computations
• Machine learning library (MLLib)
• graph engine (GraphX)
• Streaming analytics engine (Spark Streaming)
• Fast interactive query tool (Shark).
• Use Lineage data for fault tolerance
• Tracking the data path
Current Hadoop Integration
Scientific applications Data Types
Observational Data – uncontrolled events happen and we record data about them.
Examples include astronomy, earth observation, geophysics, medicine, commerce, social data, the
internet of things.
Experimental Data – we design controlled events for the purpose of recording data about them.
Examples include particle physics, photon sources, neutron sources, bioinformatics, product
development.
Simulation Data – we create a model, simulate something, and record the resulting data.
Examples include weather & climate, nuclear & fusion energy, high-energy physics, materials,
chemistry, biology, fluid dynamics.
BioVLAB

Mais conteúdo relacionado

Mais procurados

grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsAlbert Meroño-Peñuela
 
Intake at AnacondaCon
Intake at AnacondaConIntake at AnacondaCon
Intake at AnacondaConMartin Durant
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Databricks
 
BDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingBDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingDavid Lauzon
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners Jen Stirrup
 
Analytics graph databases
Analytics graph databasesAnalytics graph databases
Analytics graph databasesSteve Sarsfield
 
Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021Mobcoder
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Databricks
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus
 
2013 11-07 lsr-dublin_m_hausenblas_when solr is best
2013 11-07 lsr-dublin_m_hausenblas_when solr is best2013 11-07 lsr-dublin_m_hausenblas_when solr is best
2013 11-07 lsr-dublin_m_hausenblas_when solr is bestlucenerevolution
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteDatabricks
 
Signals from outer space
Signals from outer spaceSignals from outer space
Signals from outer spaceGraphAware
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKAbhi Jit
 
Hunk - Unlocking the Power of Big Data
Hunk - Unlocking the Power of Big DataHunk - Unlocking the Power of Big Data
Hunk - Unlocking the Power of Big DataSplunk
 
Squirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSquirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSudipta Mukherjee
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaSpark Summit
 
An Ontology-Driven Integration Framework for Smart Communities
An Ontology-Driven Integration Framework for Smart CommunitiesAn Ontology-Driven Integration Framework for Smart Communities
An Ontology-Driven Integration Framework for Smart CommunitiesSteve Ray
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
 

Mais procurados (20)

grlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIsgrlc Makes GitHub Taste Like Linked Data APIs
grlc Makes GitHub Taste Like Linked Data APIs
 
Intake at AnacondaCon
Intake at AnacondaConIntake at AnacondaCon
Intake at AnacondaCon
 
Research Plan 2014
Research Plan 2014Research Plan 2014
Research Plan 2014
 
Elastic search
Elastic searchElastic search
Elastic search
 
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predict...
 
BDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingBDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 Debriefing
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners
 
Analytics graph databases
Analytics graph databasesAnalytics graph databases
Analytics graph databases
 
Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021Top 10 Data analytics tools to look for in 2021
Top 10 Data analytics tools to look for in 2021
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
 
2013 11-07 lsr-dublin_m_hausenblas_when solr is best
2013 11-07 lsr-dublin_m_hausenblas_when solr is best2013 11-07 lsr-dublin_m_hausenblas_when solr is best
2013 11-07 lsr-dublin_m_hausenblas_when solr is best
 
Spark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynoteSpark Summit EU 2015: Matei Zaharia keynote
Spark Summit EU 2015: Matei Zaharia keynote
 
Signals from outer space
Signals from outer spaceSignals from outer space
Signals from outer space
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Hunk - Unlocking the Power of Big Data
Hunk - Unlocking the Power of Big DataHunk - Unlocking the Power of Big Data
Hunk - Unlocking the Power of Big Data
 
Squirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSquirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for All
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
 
An Ontology-Driven Integration Framework for Smart Communities
An Ontology-Driven Integration Framework for Smart CommunitiesAn Ontology-Driven Integration Framework for Smart Communities
An Ontology-Driven Integration Framework for Smart Communities
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 

Destaque

Interoperability of a Social Media Observatory
Interoperability of a Social Media ObservatoryInteroperability of a Social Media Observatory
Interoperability of a Social Media ObservatoryKarissa Rae McKelvey
 
Europeana cloud wp6 breakout seession - athens 2014
Europeana cloud   wp6 breakout seession - athens 2014Europeana cloud   wp6 breakout seession - athens 2014
Europeana cloud wp6 breakout seession - athens 2014Europeana
 
Présentation EPMI Management Participatif David Boisseleau
Présentation EPMI Management Participatif David BoisseleauPrésentation EPMI Management Participatif David Boisseleau
Présentation EPMI Management Participatif David Boisseleaudboisseleau
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionIn a Rocket
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 

Destaque (6)

Interoperability of a Social Media Observatory
Interoperability of a Social Media ObservatoryInteroperability of a Social Media Observatory
Interoperability of a Social Media Observatory
 
Europeana cloud wp6 breakout seession - athens 2014
Europeana cloud   wp6 breakout seession - athens 2014Europeana cloud   wp6 breakout seession - athens 2014
Europeana cloud wp6 breakout seession - athens 2014
 
Infovis & Music
Infovis & MusicInfovis & Music
Infovis & Music
 
Présentation EPMI Management Participatif David Boisseleau
Présentation EPMI Management Participatif David BoisseleauPrésentation EPMI Management Participatif David Boisseleau
Présentation EPMI Management Participatif David Boisseleau
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 

Semelhante a Data munging and analysis

Scientific
Scientific Scientific
Scientific marpierc
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsGeoffrey Fox
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsJohn Evans
 
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...Shadab Ali Khan
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationGeorge Long
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxVanshGupta597842
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSpraveen bhat
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesIsuru Suriarachchi
 

Semelhante a Data munging and analysis (20)

Scientific
Scientific Scientific
Scientific
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data Analytics
 
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...
Conceptualizing And Prototyping A Scalable Genomic Data Analysis Pipeline: Us...
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & Manipulation
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptx
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
IOT.ppt
IOT.pptIOT.ppt
IOT.ppt
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Hadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFSHadoop/MapReduce/HDFS
Hadoop/MapReduce/HDFS
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
 
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data LakesCrossing Analytics Systems: Case for Integrated Provenance in Data Lakes
Crossing Analytics Systems: Case for Integrated Provenance in Data Lakes
 

Último

Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 

Último (20)

Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 

Data munging and analysis

  • 1. Data Munging and Analysis for Scientific Applications Raminder Singh Science Gateways Group Indiana University, Bloomington raminder@apache.org
  • 2. • Evaluate the Apache Big data tools • Understand the execution patterns of Analysis applications • Solutions using Airavata • Build a gateways solution with HPC and Big Data requirements Overview
  • 4. Motivation to explore • Heterogeneous data • Data Munging (parsing, scraping, formatting data) • Visualization or Analyze • Preservation of data
  • 5. Analysis Applications • Behavior Tracking - medical • Situational Awareness - weather • Time Series Data -Patient monitoring, weather data to help farmers • Resource consumption Monitoring - Smart grid • Process optimization
  • 6. What is Science Gateway? • Community portal or desktop tools • Common science theme • Collaborative environment
  • 7. The Ultrascan science gateway supports high performance computing analysis of biophysics experiments using XSEDE, Juelich, and campus clusters. Desktop analysis tools Launch analysis and monitor through a browser We help build gateways for labs or facilities.
  • 9. Value of using Airavata • Enable collection of resources • Application centric not compute centric • Meta workflow to enable set of applications
  • 10. Use-case for Data Analysis • TextRWeb: Large Scale Text Analytics with R on the web Collaborator: Hui Zhang, Data Scientist at Indiana University
  • 11.
  • 12. Goals for R on the web project • Run large scale text analysis using parallel R. • Hide computational complexity with user interfaces • Support interactive text analysis • Support iterative text mining
  • 14. Future Work • Integrate TextRWeb with Apache Spark • Explore SparkR [1] • Develop Apache Thrift interfaces for TextRWeb server • Integrate with Apache Airavata for HPC job. • Explore workflow DAGs for Text Analysis • Keep updated with product offering like Stratosphere 1. https://github.com/amplab-extras/SparkR-pkg
  • 15.
  • 16. Conclusion • Value added for the scientific communities • Value for Apache Big Data Suite
  • 18. Apache Spark • In Memory computations • Machine learning library (MLLib) • graph engine (GraphX) • Streaming analytics engine (Spark Streaming) • Fast interactive query tool (Shark). • Use Lineage data for fault tolerance • Tracking the data path
  • 20. Scientific applications Data Types Observational Data – uncontrolled events happen and we record data about them. Examples include astronomy, earth observation, geophysics, medicine, commerce, social data, the internet of things. Experimental Data – we design controlled events for the purpose of recording data about them. Examples include particle physics, photon sources, neutron sources, bioinformatics, product development. Simulation Data – we create a model, simulate something, and record the resulting data. Examples include weather & climate, nuclear & fusion energy, high-energy physics, materials, chemistry, biology, fluid dynamics.
  • 21.