SlideShare uma empresa Scribd logo
1 de 17
Analyzing and Linking Big Data
      with Stratosphere
            Kostas Tzoumas




    kostas.tzoumas@tu-berlin.de
 www.user.tu-berlin.de/kostas.tzoumas/
Big Data
■ Driven by cheaper storage and computation
      □ Cloud computing further enabling economies of scale
      □ Open-source software lowers barrier of entry
■ Societal and economical impact
      □ Scientific breakthroughs will come from data exploration
      □ Success in business dictated by the ability to quickly draw
        insights from data signals
■ Big Data Analytics
      □ Analytical workloads scale inversely with cost
■ Major challenge: Information marketplaces
      □ Data as a resource, analytics as a product
      □ An “AppStore” for data
  2                                      Linking and Analyzing Big Data with Stratosphere
The DOPA Vision
■ Data Pools as collections of diverse data sets
■ Example data pools
      □ Evolving history of the Web
      □ Financial and statistical data
■ Data identification
      □ By assigning unique IDs to objects
      □ Enables data linkage
■ Data Analytics
      □ Integration and analytics on diverse data sources
      □ Using a common framework and language



  3                                      Linking and Analyzing Big Data with Stratosphere
Some Scenarios
■ Sentiment and market analysis
   □ An SME producing consumer goods can analyze blogs and
     social network streams, and link them with customer
     demographics to perform sentiment analysis and market
     research
■ Green houses
   □ A home buyer can find out the energy consumption and
     distribution over time in houses in a particular area by linking
     and analyzing energy with demographic data.
■ Traffic analysis, transportation and construction planning
   □ Linking weather, traffic, road data



                                      Linking and Analyzing Big Data with Stratosphere
Big Data Analytics
■ “Big Data” refers to different applications as well as
  more data
      □ Beyond traditional DW queries
■ Open-source in data management
      □ Enabled primarily by Hadoop popularity
      □ Changed research landscape
■ New systems are rethinking the complete data
  management stack at a massively parallel scale
      □ As systems mature, need to tackle hard and novel problems




  5                                     Linking and Analyzing Big Data with Stratosphere
StratoSphere
Above the Clouds




    STRATOSPHERE



6                  Linking and Analyzing Big Data with Stratosphere
Stratosphere
■ Collaborative Research Project
      □ 3 Universities, 5 research groups in the Berlin area
■ Infrastructure for Big Data Analytics
■ Bridge relational DBMSs and MapReduce worlds
      □ Intersection of functional languages and data parallelism
      □ Re-architecting data management systems for massive
        parallelism
■ Open-source research platform (Apache)
■ Used by a variety of Universities and research institutes



  7                                      Linking and Analyzing Big Data with Stratosphere
Stratosphere Architecture
                                                 $res =
                                                   filter $e in $emp
                                                   where
    Data   Data   Data                             $e.income > 30000;

    pool   pool   pool
                                                          Compiler



                    Query
                    Processor                         PACT Optimizer




                                                          Nephele
                    ...

8                               Linking and Analyzing Big Data with Stratosphere
2km resolution
                 Stratosphere Use Cases




                     10TB
1100km,




                     950km,
                     2km resolution




                 9                        Linking and Analyzing Big Data with Stratosphere
The Meteor Query Language
■ Stratosphere declarative front-end inspired by the IBM
  Jaql language
■ Extensible and flexible
      □ Easy to add libraries, e.g., for data linkage, cleansing, mining
      □ Easy to integrate in language syntax
■ Provides operators for Information Extraction and Data
  Linkage
■ Time as a first-class concept




 10                                       Linking and Analyzing Big Data with Stratosphere
The Nephele Execution Engine
■ Executes Nephele schedules
      □ DAGs of already parallelized operator instances
      □ Parallelization already done by PACT optimizer
■ Design decisions
      □ Designed to run on top of an IaaS cloud
      □ Predictable performance
      □ Scalability to 1000+ nodes with flexible fault-tolerance
■ Permits network, in-memory (both pipelined), file
  (materialization) channels




 11                                      Linking and Analyzing Big Data with Stratosphere
The PACT Programming Model
■ Internal Stratosphere programming model
      □ Also exposed to the programmer for advanced functionality
■ Dataflow, side-effect free programming model enabling
  massive parallelism
■ Centered around the concept of second-order functions
      □ Generalization of MapReduce

  Map PACT           Reduce PACT       Match PACT                  CoGroup PACT




 12                                   Linking and Analyzing Big Data with Stratosphere
Optimization
■ Knowledge of PACT signature permits automatic
  optimization ala Relational DBMSs
■ Emulates different hand-crafted MapReduce
  implementations

■ Enables orders of             Reduce (on tid)
                               ↑(pid=tid, r=∑ k)↑
                                                               Sum up
                                                             partial ranks
                                                                              Reduce (on tid)
                                                                             ↑(pid=tid, r=∑ k)↑

  magnitude faster                        fifo                                    part./sort (tid)


  programs                       Match (on pid)
                                 ↑(tid, k=r*p)↑             Join P and A
                                                                                Match (on pid)
                                                                                ↑(tid, k=r*p)↑

■ Frees programmer from   buildHT (pid)
                                             probeHT (pid)
                                                                        probeHT (pid)
                                                                                             buildHT (pid)
  thinking about             broadcast           part./sort (tid)        partition (pid)      partition (pid)

  execution                      p                    A                          p                   A
                              (pid, r )          (pid, tid, p)                (pid, r )         (pid, tid, p)



 13                                  Linking and Analyzing Big Data with Stratosphere
Ongoing Research

            Sink
                                                 O
                                        Reduce (on tid)  Sum up
                                      ↑(pid=tid, r=∑ k)↑ partial ranks
            UDF                              fifo
           Match
                                        Match (on pid)
                                        ↑(tid, k=r*p)↑        Join P and A

                    UDF                           probeHT (pid)
                   Reduce        buildHT (pid)
                                                     CACHE
                                    broadcast       part./sort (tid)
     UDF            UDF
                                                          A    (pid, tid, p)
     Map            Map
                                         I
                                       fifo

     Src             Src
      1               2                 p     (pid, r )


14                          Linking and Analyzing Big Data with Stratosphere
Summary
■ DOPA
      □ Bootstrapping the information economy by providing
        information marketplaces and related business models
      □ Brings together heterogeneous data pools
      □ Enables easy linkage and analytics across data pools via a
        flexible programming language
■ Stratosphere
      □ Technical infrastructure for scalable analytics
      □ Pushes the MapReduce paradigm forward
      □ Focal point of several research initiatives across Europe and
        the world


 15                                      Linking and Analyzing Big Data with Stratosphere
Acknowledgments
■ FP7 STREP (DOPA), DFG FOR, EIT (Stratosphere)
■ DOPA partners
      □ TU Berlin, IMR, DataMarket, OKKAM, Vico, ami
■ Stratosphere partners
      □ TU Berlin, HU Berlin, HPI
■ EIT partners
      □ TU Berlin, SICS, TU Delft, Inria, U. Trento, Aalto U., STACZKI




 16                                       Linking and Analyzing Big Data with Stratosphere
Thank you!
         www.stratosphere.eu
          @stratosphere_eu
     kostas.tzoumas@tu-berlin.de


17                  Linking and Analyzing Big Data with Stratosphere

Mais conteúdo relacionado

Mais procurados

Spatial Data Science with R
Spatial Data Science with RSpatial Data Science with R
Spatial Data Science with Ramsantac
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftAmazon Web Services
 
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...Sumeet Singh
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence GeneratorRim Moussa
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce Fabio Fumarola
 
TGS GPS- Russian well database
TGS GPS- Russian well database TGS GPS- Russian well database
TGS GPS- Russian well database TGS
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiersRim Moussa
 
Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)PyData
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analyticsAvinash Pandu
 
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons LearnedHadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons LearnedDataWorks Summit
 
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchRim Moussa
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduceHassan A-j
 

Mais procurados (20)

Spatial Data Science with R
Spatial Data Science with RSpatial Data Science with R
Spatial Data Science with R
 
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon RedshiftBest Practices for Migrating Legacy Data Warehouses into Amazon Redshift
Best Practices for Migrating Legacy Data Warehouses into Amazon Redshift
 
parallel OLAP
parallel OLAPparallel OLAP
parallel OLAP
 
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
Strata Conference + Hadoop World NY 2013: Running On-premise Hadoop as a Busi...
 
Unit3 slides
Unit3 slidesUnit3 slides
Unit3 slides
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence Generator
 
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
 
TGS GPS- Russian well database
TGS GPS- Russian well database TGS GPS- Russian well database
TGS GPS- Russian well database
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiers
 
Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)Python in an Evolving Enterprise System (PyData SV 2013)
Python in an Evolving Enterprise System (PyData SV 2013)
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
 
Stratosphere with big_data_analytics
Stratosphere with big_data_analyticsStratosphere with big_data_analytics
Stratosphere with big_data_analytics
 
Pig Experience
Pig ExperiencePig Experience
Pig Experience
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 
Reading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDMReading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDM
 
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons LearnedHadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
Hadoop for High-Performance Climate Analytics - Use Cases and Lessons Learned
 
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 

Destaque

EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012  Simon Riggs - Open Data, Open Database: PostgreSQLEDF2012  Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQLEuropean Data Forum
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...European Data Forum
 
EDF2012 Simon Redfern - Open Bank Project
EDF2012   Simon Redfern - Open Bank ProjectEDF2012   Simon Redfern - Open Bank Project
EDF2012 Simon Redfern - Open Bank ProjectEuropean Data Forum
 
EDF2012 Mariana Damova - Factforge
EDF2012   Mariana Damova - FactforgeEDF2012   Mariana Damova - Factforge
EDF2012 Mariana Damova - FactforgeEuropean Data Forum
 
EDF2012 Andrew Farrow - (Copy)right information in the digital age
EDF2012   Andrew Farrow - (Copy)right information in the digital ageEDF2012   Andrew Farrow - (Copy)right information in the digital age
EDF2012 Andrew Farrow - (Copy)right information in the digital ageEuropean Data Forum
 
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012   Florian Bauer - Using LOD to share clean energy data and knowledgeEDF2012   Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledgeEuropean Data Forum
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was builtEuropean Data Forum
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbenchEuropean Data Forum
 
EDF2012 Phil Archer - Enabling Open Data Interoperability
EDF2012   Phil Archer - Enabling Open Data InteroperabilityEDF2012   Phil Archer - Enabling Open Data Interoperability
EDF2012 Phil Archer - Enabling Open Data InteroperabilityEuropean Data Forum
 

Destaque (9)

EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012  Simon Riggs - Open Data, Open Database: PostgreSQLEDF2012  Simon Riggs - Open Data, Open Database: PostgreSQL
EDF2012 Simon Riggs - Open Data, Open Database: PostgreSQL
 
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...EDF2012   Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
EDF2012 Irini Fundulaki - Abstract Access Control Models for Dynamic RDF Da...
 
EDF2012 Simon Redfern - Open Bank Project
EDF2012   Simon Redfern - Open Bank ProjectEDF2012   Simon Redfern - Open Bank Project
EDF2012 Simon Redfern - Open Bank Project
 
EDF2012 Mariana Damova - Factforge
EDF2012   Mariana Damova - FactforgeEDF2012   Mariana Damova - Factforge
EDF2012 Mariana Damova - Factforge
 
EDF2012 Andrew Farrow - (Copy)right information in the digital age
EDF2012   Andrew Farrow - (Copy)right information in the digital ageEDF2012   Andrew Farrow - (Copy)right information in the digital age
EDF2012 Andrew Farrow - (Copy)right information in the digital age
 
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012   Florian Bauer - Using LOD to share clean energy data and knowledgeEDF2012   Florian Bauer - Using LOD to share clean energy data and knowledge
EDF2012 Florian Bauer - Using LOD to share clean energy data and knowledge
 
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
EDF2012   Chris Taggart - How the biggest Open Database of Companies was builtEDF2012   Chris Taggart - How the biggest Open Database of Companies was built
EDF2012 Chris Taggart - How the biggest Open Database of Companies was built
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
 
EDF2012 Phil Archer - Enabling Open Data Interoperability
EDF2012   Phil Archer - Enabling Open Data InteroperabilityEDF2012   Phil Archer - Enabling Open Data Interoperability
EDF2012 Phil Archer - Enabling Open Data Interoperability
 

Semelhante a EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere

TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...Chetan Khatri
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introductionHektor Jacynycz García
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Jonathan Seidman
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and RRadek Maciaszek
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with ScalaChetan Khatri
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseMapR Technologies
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Jonathan Seidman
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormDataStax
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriChetan Khatri
 

Semelhante a EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere (20)

Apache Nemo
Apache NemoApache Nemo
Apache Nemo
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Big data distributed processing: Spark introduction
Big data distributed processing: Spark introductionBig data distributed processing: Spark introduction
Big data distributed processing: Spark introduction
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Data Science
Data ScienceData Science
Data Science
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
Extending lifespan with Hadoop and R
Extending lifespan with Hadoop and RExtending lifespan with Hadoop and R
Extending lifespan with Hadoop and R
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBase
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with StormC*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 

Mais de European Data Forum

EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...European Data Forum
 
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...European Data Forum
 
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...European Data Forum
 
EDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro PresentationEDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro PresentationEuropean Data Forum
 
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...European Data Forum
 
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...European Data Forum
 
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...European Data Forum
 
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...European Data Forum
 
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...European Data Forum
 
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...European Data Forum
 
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...European Data Forum
 
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...European Data Forum
 
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...European Data Forum
 
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...European Data Forum
 
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...European Data Forum
 
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...European Data Forum
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...European Data Forum
 
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...European Data Forum
 
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...European Data Forum
 

Mais de European Data Forum (20)

EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
EDF2014: Ralf-Peter Schaefer, Head of Traffic Product Unit, TomTom, Germany: ...
 
Barbato leit ict 15-16-17
Barbato leit ict 15-16-17Barbato leit ict 15-16-17
Barbato leit ict 15-16-17
 
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
EDF2014: BIG - NESSI Networking Session: Edward Curry, National University of...
 
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
EDF2014: BIG - NESSI Networking Session: Nuria de Lama, Representative to the...
 
EDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro PresentationEDF2014: BIG - NESSI Networking Session: Intro Presentation
EDF2014: BIG - NESSI Networking Session: Intro Presentation
 
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
EDF2014: Kush Wadhwa, Senior Partner, Trilateral Research & Consulting: Addre...
 
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
EDF2014: Adrian Cristal, Barcelona Supercomputing Center, RETHINK big Project...
 
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
EDF2014: Dimitris Vassiliadis, Head of Unit, EXUS Innovation Attractor: From ...
 
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
EDF2014: Rüdiger Eichin, Research Manager at SAP AG, Germany: Deriving Value ...
 
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
 
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
EDF2014: Christian Lindemann, Wolters Kluwer Germany & Christian Dirschl, Wol...
 
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
EDF2014: Marta Nagy-Rothengass, Head of Unit Data Value Chain, Directorate Ge...
 
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
EDF2014: Stefan Wrobel, Institute Director, Fraunhofer IAIS / Member of the b...
 
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
EDF2014: Michele Vescovi, Researcher, Semantic & Knowledge Innovation Lab, It...
 
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
EDF2014: Allan Hanbury, Senior Researcher, Vienna University of Technology, A...
 
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
EDF2014: Nikolaos Loutas, Manager at PwC Belgium, Business Models for Linked ...
 
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
EDF2014: Vedran Sabol, Head of the Knowledge Visualisation Area, Know-Center,...
 
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
EDF2014: Daniel Vila-Suero, Researcher, Ontology Engineering Group, Universid...
 
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
 
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
EDF2014: Taru Rastas, Senior Advisor, Ministry of Communications of Finland: ...
 

Último

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere

  • 1. Analyzing and Linking Big Data with Stratosphere Kostas Tzoumas kostas.tzoumas@tu-berlin.de www.user.tu-berlin.de/kostas.tzoumas/
  • 2. Big Data ■ Driven by cheaper storage and computation □ Cloud computing further enabling economies of scale □ Open-source software lowers barrier of entry ■ Societal and economical impact □ Scientific breakthroughs will come from data exploration □ Success in business dictated by the ability to quickly draw insights from data signals ■ Big Data Analytics □ Analytical workloads scale inversely with cost ■ Major challenge: Information marketplaces □ Data as a resource, analytics as a product □ An “AppStore” for data 2 Linking and Analyzing Big Data with Stratosphere
  • 3. The DOPA Vision ■ Data Pools as collections of diverse data sets ■ Example data pools □ Evolving history of the Web □ Financial and statistical data ■ Data identification □ By assigning unique IDs to objects □ Enables data linkage ■ Data Analytics □ Integration and analytics on diverse data sources □ Using a common framework and language 3 Linking and Analyzing Big Data with Stratosphere
  • 4. Some Scenarios ■ Sentiment and market analysis □ An SME producing consumer goods can analyze blogs and social network streams, and link them with customer demographics to perform sentiment analysis and market research ■ Green houses □ A home buyer can find out the energy consumption and distribution over time in houses in a particular area by linking and analyzing energy with demographic data. ■ Traffic analysis, transportation and construction planning □ Linking weather, traffic, road data Linking and Analyzing Big Data with Stratosphere
  • 5. Big Data Analytics ■ “Big Data” refers to different applications as well as more data □ Beyond traditional DW queries ■ Open-source in data management □ Enabled primarily by Hadoop popularity □ Changed research landscape ■ New systems are rethinking the complete data management stack at a massively parallel scale □ As systems mature, need to tackle hard and novel problems 5 Linking and Analyzing Big Data with Stratosphere
  • 6. StratoSphere Above the Clouds STRATOSPHERE 6 Linking and Analyzing Big Data with Stratosphere
  • 7. Stratosphere ■ Collaborative Research Project □ 3 Universities, 5 research groups in the Berlin area ■ Infrastructure for Big Data Analytics ■ Bridge relational DBMSs and MapReduce worlds □ Intersection of functional languages and data parallelism □ Re-architecting data management systems for massive parallelism ■ Open-source research platform (Apache) ■ Used by a variety of Universities and research institutes 7 Linking and Analyzing Big Data with Stratosphere
  • 8. Stratosphere Architecture $res = filter $e in $emp where Data Data Data $e.income > 30000; pool pool pool Compiler Query Processor PACT Optimizer Nephele ... 8 Linking and Analyzing Big Data with Stratosphere
  • 9. 2km resolution Stratosphere Use Cases 10TB 1100km, 950km, 2km resolution 9 Linking and Analyzing Big Data with Stratosphere
  • 10. The Meteor Query Language ■ Stratosphere declarative front-end inspired by the IBM Jaql language ■ Extensible and flexible □ Easy to add libraries, e.g., for data linkage, cleansing, mining □ Easy to integrate in language syntax ■ Provides operators for Information Extraction and Data Linkage ■ Time as a first-class concept 10 Linking and Analyzing Big Data with Stratosphere
  • 11. The Nephele Execution Engine ■ Executes Nephele schedules □ DAGs of already parallelized operator instances □ Parallelization already done by PACT optimizer ■ Design decisions □ Designed to run on top of an IaaS cloud □ Predictable performance □ Scalability to 1000+ nodes with flexible fault-tolerance ■ Permits network, in-memory (both pipelined), file (materialization) channels 11 Linking and Analyzing Big Data with Stratosphere
  • 12. The PACT Programming Model ■ Internal Stratosphere programming model □ Also exposed to the programmer for advanced functionality ■ Dataflow, side-effect free programming model enabling massive parallelism ■ Centered around the concept of second-order functions □ Generalization of MapReduce Map PACT Reduce PACT Match PACT CoGroup PACT 12 Linking and Analyzing Big Data with Stratosphere
  • 13. Optimization ■ Knowledge of PACT signature permits automatic optimization ala Relational DBMSs ■ Emulates different hand-crafted MapReduce implementations ■ Enables orders of Reduce (on tid) ↑(pid=tid, r=∑ k)↑ Sum up partial ranks Reduce (on tid) ↑(pid=tid, r=∑ k)↑ magnitude faster fifo part./sort (tid) programs Match (on pid) ↑(tid, k=r*p)↑ Join P and A Match (on pid) ↑(tid, k=r*p)↑ ■ Frees programmer from buildHT (pid) probeHT (pid) probeHT (pid) buildHT (pid) thinking about broadcast part./sort (tid) partition (pid) partition (pid) execution p A p A (pid, r ) (pid, tid, p) (pid, r ) (pid, tid, p) 13 Linking and Analyzing Big Data with Stratosphere
  • 14. Ongoing Research Sink O Reduce (on tid) Sum up ↑(pid=tid, r=∑ k)↑ partial ranks UDF fifo Match Match (on pid) ↑(tid, k=r*p)↑ Join P and A UDF probeHT (pid) Reduce buildHT (pid) CACHE broadcast part./sort (tid) UDF UDF A (pid, tid, p) Map Map I fifo Src Src 1 2 p (pid, r ) 14 Linking and Analyzing Big Data with Stratosphere
  • 15. Summary ■ DOPA □ Bootstrapping the information economy by providing information marketplaces and related business models □ Brings together heterogeneous data pools □ Enables easy linkage and analytics across data pools via a flexible programming language ■ Stratosphere □ Technical infrastructure for scalable analytics □ Pushes the MapReduce paradigm forward □ Focal point of several research initiatives across Europe and the world 15 Linking and Analyzing Big Data with Stratosphere
  • 16. Acknowledgments ■ FP7 STREP (DOPA), DFG FOR, EIT (Stratosphere) ■ DOPA partners □ TU Berlin, IMR, DataMarket, OKKAM, Vico, ami ■ Stratosphere partners □ TU Berlin, HU Berlin, HPI ■ EIT partners □ TU Berlin, SICS, TU Delft, Inria, U. Trento, Aalto U., STACZKI 16 Linking and Analyzing Big Data with Stratosphere
  • 17. Thank you! www.stratosphere.eu @stratosphere_eu kostas.tzoumas@tu-berlin.de 17 Linking and Analyzing Big Data with Stratosphere