Visual Data Analytics in the Cloud
for Exploratory Science
Bill Howe, UW
Huy Vo, Utah
Claudio Silva, Utah
Juliana Freire, Utah
YingYi Bu, UW
3/12/09 Bill Howe, UW 2VisTrails + GridFields
Data acquisition is no longer the bottleneck
Old model: “Query the world” (Data acquisition coupled to a specific hypothesis)
New model: “Download the world” (Data acquired en masse, in support of many hypotheses)
 Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)
 Oceanography: high-resolution models, cheap sensors, satellites
 Biology: lab automation, high-throughput sequencing
Two dimensions: # of bytes, # of apps
[Figure: Biology, Oceanography, and Astronomy projects placed by # of bytes and # of apps: LSST, SDSS, Galaxy, BioMart, GEO, IOOS, OOI, LANL HIV, Pathway Commons, PanSTARRS]
This Talk
 # of Bytes: MapReduce for Scientific Viz
 # of Apps: Other VDA Projects
Converging Requirements
Vis DB
Why Vis Needs DB
“Transferring the whole data generated … to a storage device or a visualization
machine could become a serious bottleneck, because I/O would take most of the …
time. A more feasible approach is to reduce and prepare the data in situ for
subsequent visualization and data analysis tasks.”
-- SciDAC Review
Current Research Topics in Vis:
• “Query-driven Visualization”
• “In Situ Visualization”
• “Remote Visualization”
Why DB Needs Vis
Why DB Needs Vis (2)
“What does the salt wedge look like?”
Thesis
 We can no longer afford to build separate
visualization and data management systems
 Data is increasingly destined for the cloud
 First Attack: Implement Vis primitives in an
existing “cloud” DM system
Core Vis Algorithms in MapReduce
 Scalar/Volume Rendering
 Isosurface Extraction
 Mesh Simplification
Some distributed algorithm…
Map
(Shuffle)
Reduce
CluE Cluster
 410 nodes
 Dual Intel Xeon 2.8GHz, hyperthreading
 8GB main memory each
 Hadoop, no access to OS
 Google provided, IBM maintained, NSF funded
CluE Cluster Scaling
Isosurface Example
Isosurface Extraction
Isosurface Results
O(N^2), O(N)
Scalable Rendering
 Left: Atlas
 18GB
 500M triangles
 Right: St. Matthew
 13GB
 372M triangles
 Laser scans, Digital Michelangelo Project
source: Digital Michelangelo Project
Rendering Results
Roadmap
 # of Bytes: MapReduce for Scientific Viz
 # of Apps: Other VDA projects
 Azure Ocean
 SQLShare
 Automating Mashups
[John Delaney, University of Washington]
Azure Ocean
COVE for Visualization + Trident for Processing + Azure for Data
SQLShare: Query Services
for Ad Hoc Research Data
Ad Hoc Research Data
5/18/10 Garret Cole, eScience Institute
FASTA format
Spreadsheets
Tabular data
Problem
“I spend 90% of my time handling
data rather than doing science”
-- Robin Kodner, Postdoc, Armbrust Lab
An observation about “handling data”
 How often does each RNA hit appear inside my
annotated surface group?
 SELECT hit, COUNT(*) as cnt FROM tigrfamannotation_surface
GROUP BY hit ORDER BY cnt DESC
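A minimal sketch of running the query above, assuming Python's built-in sqlite3 module and a few invented rows. Only the table name tigrfamannotation_surface and the hit column come from the slide; the TIGRFAM-style identifiers and the single-column schema are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the slide.
rows = [("TIGR00001",), ("TIGR00002",), ("TIGR00001",),
        ("TIGR00003",), ("TIGR00001",), ("TIGR00002",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tigrfamannotation_surface (hit TEXT)")
conn.executemany("INSERT INTO tigrfamannotation_surface VALUES (?)", rows)

# The query from the slide, unchanged apart from layout.
query = """
    SELECT hit, COUNT(*) AS cnt
    FROM tigrfamannotation_surface
    GROUP BY hit
    ORDER BY cnt DESC
"""
hit_counts = conn.execute(query).fetchall()
for hit, cnt in hit_counts:
    print(hit, cnt)
```

The point of the sketch is that the whole analysis is one declarative statement; no loop over the annotation file is written by hand.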
Discovery: SQL Does not Terrify Scientists
Technology used in 1st Gen
Component Stack
SQLShare Redux
 Conventional wisdom says “Scientists won’t write SQL”
 We don’t believe it!
 Instead, we implicate difficulty in
 installation
 configuration
 schema design
 performance tuning
 data ingest
 over-reliance on GUIs
 Critical need for visualization
 Clear role for Tableau!
We are asking “What kind of platform will
make SQL useful for scientific inquiry?”
Automating Mashups
Why Mashups?
 Jim Gray: # of datasets scales as N^2
 Each pairwise comparison generates a new dataset
 Corollary: # of apps scales as N^2
 Every pairwise comparison motivates a new mashup
 To keep up, we need to
 entrain new programmers,
 make existing programmers more productive,
 or both
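A small sketch of the scaling claim: counting every unordered pair among N datasets gives N(N-1)/2, which grows quadratically in N. The dataset names are hypothetical placeholders.

```python
from itertools import combinations

def pairwise_comparisons(datasets):
    """Return every unordered pair of distinct datasets."""
    return list(combinations(datasets, 2))

# Hypothetical dataset names, for illustration only.
datasets = ["salinity", "temperature", "zooplankton", "satellite"]
pairs = pairwise_comparisons(datasets)
print(len(pairs))  # number of pairs among 4 datasets

# The closed form N * (N - 1) / 2 for a few larger N.
for n in (10, 100, 1000):
    print(n, n * (n - 1) // 2)
```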
Satellite Images + Crime Incidence Reports
Twitter Feed + Flickr Stream
Why Mashups?
 The era when one’s data fit into a 15-page research paper is past.
 Datasets are too large and complex to be conveyed with a handful
of static images
 Prediction: succinct, targeted, interactive web apps will become the
currency of scientific communication
 with the public
 with policy makers
 with colleagues in other disciplines
 with peers
 with students (K12 - grad)
Tableau
Mashups
Conclusions
 Converging requirements for DB and Vis
 At high scale:
 A Vis library in MapReduce
 At high complexity:
 Azure Ocean
Data + Workflow + Vis
“Client + Cloud”, “Computational mobility”
 SQLShare
Ad Hoc data -- “anything goes”
Visualization critical
 (semi-)automated mashups
“Show me what’s interesting”
Acknowledgments
http://escience.washington.edu
BACKUP SLIDES
[John Delaney, University of Washington]
Azure Ocean
COVE for Visualization + Trident for Processing + Azure for Data
COVE
 Research into new interfaces for cross-disciplinary ocean science
 Extensive instrument and cable layout for creating experiments
 Flexible terrain and image engine for visualizing site
 True 3D/4D science dataset visualization
 Field tested in RSN observatory layout and on ocean expeditions
 Cross platform and extensible with Python and workflow systems
Trident
 Microsoft Research scientific workflow system
 Visual programming environment for connecting tasks
 Science-specific task libraries including one for ocean sciences
 Automated provenance capture, monitoring, and fault tolerance
 Runs on local system, Windows server, or HPC Cluster
 Cross platform with Silverlight and web service interface
Azure
 Microsoft’s cloud computing platform
 Provides storage and computing as pay-as-you-go services
 From a development standpoint, the system looks like provisioned VMs
 SQL, table, and blob (file system) storage models are included
 Access to storage via RESTful HTTP interface
Azure Ocean
 COVE + Trident + Azure provides visual analytics to scientists
 Any component (Visualization, Computing, or Data) can be provisioned locally, on a server, or in the cloud
 When on same machine, system APIs are leveraged for speed
 When distributed, communication is through HTTP and RESTful APIs
 Flexible platform for diverse ocean science needs
MapReduce Programming Model
 Input & Output: each a set of key/value pairs
 Programmer specifies two functions:
map (in_key, in_value) -> list(out_key, intermediate_value)
 Processes input key/value pair
 Produces set of intermediate pairs
reduce (out_key, list(intermediate_value)) -> list(out_value)
 Combines all intermediate values for a particular key
 Produces a set of merged output values (usually just one)
slide source: Google, Inc.
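The model above can be sketched in a single Python process. This is an illustration only, not Hadoop or Google's implementation: run_mapreduce is a hypothetical helper, and word count is assumed as the example job.

```python
from collections import defaultdict

def map_fn(in_key, in_value):
    """Map: emit (word, 1) for each word in one line of text."""
    return [(word, 1) for word in in_value.split()]

def reduce_fn(out_key, intermediate_values):
    """Reduce: combine all counts for one word into a single total."""
    return [sum(intermediate_values)]

def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for in_key, in_value in records:                     # map phase
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)                # shuffle: group by key
    return {key: reduce_fn(key, values)                  # reduce phase
            for key, values in sorted(groups.items())}

# Hypothetical input: (line number, line text) pairs.
records = [(0, "map shuffle reduce"), (1, "map reduce")]
word_counts = run_mapreduce(records, map_fn, reduce_fn)
print(word_counts)
```

In a real deployment the grouping step is what the framework distributes across machines; the programmer still supplies only the two functions.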
Isosurface Example
<Vis movie>
Key idea: Zooplankton correlated with temperature
Example Query Results
Example Query: Climatology
[Figure: Average Surface Salinity by Month, Columbia River Plume 1999-2006. Panels: Feb, May. Labels: Columbia River, Washington, Oregon. Color scale in psu. Animation.]
UW + Utah CluE Program
 Goals
 10+-year “climatologies” at interactive speeds
 …with provenance, reproducibility, collaboration …on a
shared-nothing, commodity platform
 In general: Explore the intersection of scientific
databases and scientific visualization, at scale
 Methods
 “Cloud-Enable” two projects
GridFields: Query algebra for mesh data
VisTrails: Scientific workflow and provenance
Converging Requirements
Vis: “Query-driven Visualization”
Vis: “In Situ Visualization”
Vis: “Remote Visualization”
DB: Millions of tuples per result
Vis DB
Preliminary results
 Managing Hadoop jobs with VisTrails
 GridField queries in Hadoop
 Core Visualization algorithms in Hadoop
Core Vis Algorithms in MapReduce
 Scalar/Volume Rendering
 Map: Rasterization
 Reduce: Compositing, blending
 Isosurface Extraction
 Map: Isosurface Extraction
 Reduce: Combine like isovalues
 Mesh Simplification
 Map: Bin vertices
 Reduce: Collapse binned triangles
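As one hedged illustration of the first pairing (Map: rasterization, Reduce: compositing), the sketch below renders opaque points rather than triangles and composites by keeping the nearest fragment per pixel. The image size, the point format, and all function names are assumptions, not the authors' implementation; blending of semi-transparent fragments is omitted.

```python
from collections import defaultdict

WIDTH, HEIGHT = 4, 4  # assumed tiny image size

def map_rasterize(point):
    """Map: project one point (x, y, depth, color) to a pixel fragment.

    x and y are assumed to lie in [0, 1).
    """
    x, y, depth, color = point
    pixel = (int(x * WIDTH), int(y * HEIGHT))
    return [(pixel, (depth, color))]

def reduce_composite(pixel, fragments):
    """Reduce: keep the fragment nearest the viewer (smallest depth)."""
    depth, color = min(fragments)
    return color

def render(points):
    """Run map, group fragments by pixel (the shuffle), then reduce."""
    by_pixel = defaultdict(list)
    for point in points:
        for pixel, fragment in map_rasterize(point):
            by_pixel[pixel].append(fragment)
    return {pixel: reduce_composite(pixel, frags)
            for pixel, frags in by_pixel.items()}

# Hypothetical scene: two points land on pixel (0, 0); the nearer one wins.
points = [(0.1, 0.1, 5.0, "red"), (0.2, 0.2, 2.0, "blue"), (0.9, 0.6, 1.0, "green")]
image = render(points)
print(image)
```

Keying the intermediate pairs by pixel is the design choice that matters here: it lets each reducer composite its pixels independently of all others.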
ATLAS dataset
Rendering (not CluE)
[Plot: x-axis is # of mappers; 57-node Nehalem cluster]
Isosurface Extraction (Preliminary)
[Plot labels: 32, 48, 64, 96, 128]
“Query-Driven Visualization”
 Vis perspective:
 query = subsetting
 DB perspective:
 query = manipulation, preparation, restructuring, index-building,
aggregation, regridding, downsampling, simplification,
reformatting, etc.
Database Maxims:
1. Push the computation to the data.
2. Declarative programming is a good thing.
Why Cloud?
 “Cloud”?
 Software as a Service (SaaS)
 Infrastructure as a Service (IaaS)
 Platform as a Service (PaaS)
 Working definition:
General, elastic, data-intensive, scalable computing
This work: Vis techniques + DB techniques in the Cloud
Shared Nothing Parallel Databases
 Teradata
 Greenplum
 Netezza
 Aster Data Systems
 DATAllegro (Microsoft)
 Vertica
 MonetDB (recently commercialized as “Vectorwise”)
Taxonomy of Parallel Architectures
Easiest to program, but $$$$
Scales to 1000s of nodes
screenshot: VisTrails, Claudio Silva, Juliana Freire, et al., University of Utah
VisTrails
screenshot: VisTrails, Claudio Silva, Juliana Freire, et al., University of Utah
Version Tree
Collaboration
Bill Howe @ UW computes salt flux using GridFields
Erik Anderson @ Utah adds vector streamlines and adjusts opacity
Bill Howe @ UW adds an isosurface of salinity
Peter Lawson adds discussion of the scientific interpretation
Howe et al., eScience 2008
Preliminary results
 Managing Hadoop jobs with VisTrails
 GridField queries in Hadoop
 Core Visualization algorithms in Hadoop
Hadoop in VisTrails
 Wrap Hadoop Streaming/HDFS Operations
 Plug “PreProcess” into the actual Vis pipeline
 Provenance and Monitoring
All Science is reducing to a database problem
Old model: “Query the world” (Data acquisition coupled to a specific hypothesis)
New model: “Download the world” (Data acquired en masse, independent of hypotheses)
 Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)
 Medicine: ubiquitous digital records, MRI, ultrasound
 Oceanography: high-resolution models, cheap sensors, satellites
 Biology: lab automation, high-throughput sequencing
“Increase Data Collection Exponentially in Less Time, with FlowCAM”
Empirical X -> Analytical X -> Computational X -> X-informatics
Key Idea: Declarative Languages
SELECT *
FROM Order o, Item i
WHERE o.item = i.item
AND o.date = today()
Plan:
join (o.item = i.item)
  select (date = today())
    scan Order o
  scan Item i
Find all orders from today, along with the items ordered
Example System: Teradata
AMP = unit of parallelism
Example System: Teradata
AMP 1, AMP 2, AMP 3 (each): scan Order o -> select date=today() -> hash h(item)
AMP 4, AMP 5, AMP 6: receive the hashed rows
Example System: Teradata
AMP 1, AMP 2, AMP 3 (each): scan Item i -> hash h(item)
AMP 4, AMP 5, AMP 6: receive the hashed rows
Example System: Teradata
AMP 4, AMP 5, AMP 6 (each): join o.item = i.item
AMP 4 contains all orders and all lines where hash(item) = 1
AMP 5 contains all orders and all lines where hash(item) = 2
AMP 6 contains all orders and all lines where hash(item) = 3
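The three Teradata steps above can be sketched in one Python process. This is not Teradata code: the AMPs are simulated with dictionaries, and the sample rows, the order_id and name columns, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

NUM_JOIN_AMPS = 3  # stands in for AMP 4, AMP 5, AMP 6

def partition(rows, key):
    """Route each row to a join worker by hashing its join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % NUM_JOIN_AMPS].append(row)
    return buckets

def local_join(orders, items):
    """Hash join on one worker: build a table on items, probe with orders."""
    by_item = defaultdict(list)
    for i in items:
        by_item[i["item"]].append(i)
    return [{**o, **i} for o in orders for i in by_item.get(o["item"], [])]

today = date(2009, 3, 12)  # fixed stand-in for today()
orders = [
    {"order_id": 1, "item": 10, "date": today},
    {"order_id": 2, "item": 11, "date": date(2009, 3, 11)},
    {"order_id": 3, "item": 12, "date": today},
]
items = [{"item": 10, "name": "sensor"},
         {"item": 11, "name": "cable"},
         {"item": 12, "name": "buoy"}]

# Step 1: scan Order, select date = today(), hash on item.
order_parts = partition([o for o in orders if o["date"] == today], "item")
# Step 2: scan Item, hash on item.
item_parts = partition(items, "item")
# Step 3: each join worker joins only the rows that hashed to it.
result = []
for amp in range(NUM_JOIN_AMPS):
    result.extend(local_join(order_parts.get(amp, []), item_parts.get(amp, [])))
result.sort(key=lambda r: r["order_id"])
print([(r["order_id"], r["name"]) for r in result])
```

Because both tables are partitioned with the same hash function, matching rows always meet on the same worker, so no worker needs to see the whole of either table.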
Workflow Execution Plans
Need execution plans spanning client/server/cloud
Example: Isosurface Browsing
 Plan A
Subset Subset Subset Subset
tstep 0 tstep 1 tstep 2 tstep 3
 Plan B: Build an index
Build Index, e.g., an Interval Tree (Cignoni 97)
tstep 0, tstep 1, tstep 2, tstep 3 (each): Subset -> Isosurface -> Render
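A rough sketch of the idea behind Plan B: index each cell by the [min, max] range of its scalar values so that the Subset step touches only cells that can contain the isovalue. For brevity this uses a list sorted by minimum plus binary search, a simpler stand-in for the interval tree the slide names; the class name and sample ranges are hypothetical.

```python
import bisect

class IntervalIndex:
    """Cells sorted by minimum value; a simplified stand-in for an interval tree."""

    def __init__(self, cell_ranges):
        # cell_ranges: {cell_id: (min_value, max_value)} of the scalar field per cell
        self._cells = sorted((lo, hi, cid) for cid, (lo, hi) in cell_ranges.items())
        self._mins = [lo for lo, _, _ in self._cells]

    def active_cells(self, isovalue):
        """Return ids of cells whose value range contains the isovalue."""
        # Only cells with min <= isovalue can be active; binary search finds that prefix.
        end = bisect.bisect_right(self._mins, isovalue)
        return sorted(cid for lo, hi, cid in self._cells[:end] if hi >= isovalue)

# Hypothetical salinity ranges (psu) for four mesh cells.
cell_ranges = {0: (28.0, 31.0), 1: (30.5, 33.0), 2: (10.0, 15.0), 3: (14.0, 29.0)}
index = IntervalIndex(cell_ranges)
active = index.active_cells(28.5)
print(active)
```

The index is built once and reused for every isovalue the user browses to, which is where Plan B saves work over Plan A. A true interval tree would also prune on the maximum instead of filtering the prefix linearly.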
 Plan C: Build a spatial index to support panning
 Plan D: Build a multi-resolution index to support zoom
 …and so on
 Why not precompute all appropriate indexes?
 Some will (partially) reside on client
 Storage is not as cheap as we pretend
 Need a flexible system where
 a “query result” can be explored interactively, and
 we prepare for similar queries
 similarity defined by natural “browsing patterns” in visualization
systems
Why MapReduce/Hadoop?
 Popular
AWS Elastic MapReduce
100s of startups
# of downloads
# of blog posts
 Free as in Speech
 Free as in Beer
 Flexible, Lightweight
 Scalable
 Fault-tolerant
Reducing Latency
 Online processing/progressive refinement
 Deliver approximate/partial results
 Standing Queries/Prepared plans
 Exploit indexes
Changes to Hadoop and/or other tools required (e.g., HBase)
Masking Latency
 Caching/materialized views
 Reuse old results
 Pre-fetching
 Stage and prepare new results
 Speculative processing
 Anticipate future results
No change to Hadoop required
source: Antonio Baptista, NSF CMOP STC
Why Visualization? (2)
[Figure labels: north channel, south channel]
MapReduce?
 Hadoop simplifies parallel data processing
 ++ scalability
 ++ fault tolerance
 ++ less programming
 -- latency is an issue
Climatology Queries
[Figure (b): panels numbered 1 through 31; color scale in psu]
As a GridField Expression
⊗
H0 : (x,y,b) V0 : (σ )
apply(0, z=(surf − b) * σ )
bind(0, surf)
C
H = Scan(contxt, "H")
rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H)
T = Scan(contxt, "T")
V = Scan(contxt, "V")
HxV = Cross(H, V)
HxVxT = Cross(HxV, T)
salt = Bind(contxt, HxVxT, "salt")
onemonth = Regrid(salt, HxV, equijoin("hpos,vpos"), avg())
As a SQL Query
Select hpos, vpos, avg(salt)
from ocean
group by hpos, vpos
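A minimal sketch of this query using Python's built-in sqlite3 module. The tstep column, the sample values, and the ORDER BY clause (added only to make the output order deterministic) are assumptions beyond the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one salinity sample per mesh position and timestep.
conn.execute("CREATE TABLE ocean (hpos INTEGER, vpos INTEGER, tstep INTEGER, salt REAL)")
# Hypothetical salinity samples (psu) at two positions over two timesteps.
conn.executemany("INSERT INTO ocean VALUES (?, ?, ?, ?)", [
    (0, 0, 0, 30.0), (0, 0, 1, 32.0),
    (1, 0, 0, 10.0), (1, 0, 1, 14.0),
])

# Average over time at each (hpos, vpos), as in the slide's query.
climatology = conn.execute(
    "SELECT hpos, vpos, AVG(salt) FROM ocean GROUP BY hpos, vpos ORDER BY hpos, vpos"
).fetchall()
print(climatology)
```

The GROUP BY plays the role of the Regrid with avg() in the GridField expression: the time dimension is collapsed while the spatial positions are kept.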
Scientific Workflow Systems
 Value proposition: More time on science, less time on code
 How: By providing language features emphasizing sharing,
reuse, reproducibility, rapid prototyping, efficiency
 Provenance
 Visual programming
 Caching
 Integration with domain-specific tools
 Scheduling
Related Vis Work
 Parallel visualization systems
 ParaView, VisIt
 Query-Driven Visualization
 [Bethel et al 2006,2008,2009]
 FastBit Index
 [Shoshani et al 2007]
 DB Vis systems
 Tableau
Feeding the Pipeline
source: Ken Moreland
missing step?
Cannot Ignore “Preprocessing”
Hadoop
Role 2: Move Computation to the Data
“Transferring the whole data generated … to a storage device or a
visualization machine could become a serious bottleneck, because I/O
would take most of the … time. A more feasible approach is to reduce
and prepare the data in situ for subsequent visualization and data
analysis tasks.”
-- SciDAC Review
Remote Visualization
 Reduce and render remotely, transfer images
 ++ transfers less data
 -- specialized hardware, high load
 Reduce remotely, transfer data/geometry, render locally
 ++ uses local graphics pipeline
 -- transfers more data
Scientific Vis System Roundup
 General
 ParaView [Kitware, Los Alamos, Sandia]
 VisIt [LLNL]
 Specialized
 SALSA, particles, Quinn, UW
 VISUS, streaming/progressive, Jones, LLNL
 SAGE,
 Hyperwall, tiled display, NASA

 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneIan Foster
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 

Semelhante a Visual Data Analytics in the Cloud for Exploratory Science (20)

Query-Driven Visualization in the Cloud with MapReduce
Query-Driven Visualization in the Cloud with MapReduce Query-Driven Visualization in the Cloud with MapReduce
Query-Driven Visualization in the Cloud with MapReduce
 
Research Dataspaces: Pay-as-you-go Integration and Analysis
Research Dataspaces: Pay-as-you-go Integration and AnalysisResearch Dataspaces: Pay-as-you-go Integration and Analysis
Research Dataspaces: Pay-as-you-go Integration and Analysis
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
Myria: Analytics-as-a-Service for (Data) Scientists
Myria: Analytics-as-a-Service for (Data) ScientistsMyria: Analytics-as-a-Service for (Data) Scientists
Myria: Analytics-as-a-Service for (Data) Scientists
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
 
Data science curricula at UW
Data science curricula at UWData science curricula at UW
Data science curricula at UW
 
Getting the most out of your containerized database
Getting the most out of your containerized databaseGetting the most out of your containerized database
Getting the most out of your containerized database
 
Windows Azure: Lessons From The Field
Windows Azure: Lessons From The FieldWindows Azure: Lessons From The Field
Windows Azure: Lessons From The Field
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AI
 
Geo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDXGeo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDX
 
Progress in semantic mapping - NKOS
Progress in semantic mapping - NKOSProgress in semantic mapping - NKOS
Progress in semantic mapping - NKOS
 
Democratizing Data Science in the Cloud
Democratizing Data Science in the CloudDemocratizing Data Science in the Cloud
Democratizing Data Science in the Cloud
 
OSDC 2017 | An Open Machine Data Analysis Stack with Docker, CrateDB, and Gr...
OSDC 2017 |  An Open Machine Data Analysis Stack with Docker, CrateDB, and Gr...OSDC 2017 |  An Open Machine Data Analysis Stack with Docker, CrateDB, and Gr...
OSDC 2017 | An Open Machine Data Analysis Stack with Docker, CrateDB, and Gr...
 
OSDC 2017 - Claus Matzinger - An Open Machine Data Analysis Srack with Docker...
OSDC 2017 - Claus Matzinger - An Open Machine Data Analysis Srack with Docker...OSDC 2017 - Claus Matzinger - An Open Machine Data Analysis Srack with Docker...
OSDC 2017 - Claus Matzinger - An Open Machine Data Analysis Srack with Docker...
 
Azure: Lessons From The Field
Azure: Lessons From The FieldAzure: Lessons From The Field
Azure: Lessons From The Field
 
Finding the Achilles Heel of the Web of Data
Finding the Achilles Heel of the Web of DataFinding the Achilles Heel of the Web of Data
Finding the Achilles Heel of the Web of Data
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Dash UCCSC 2016
Dash UCCSC 2016Dash UCCSC 2016
Dash UCCSC 2016
 

Mais de University of Washington

Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)University of Washington
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceUniversity of Washington
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureUniversity of Washington
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsUniversity of Washington
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceUniversity of Washington
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionUniversity of Washington
 
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe University of Washington
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
Enabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareEnabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareUniversity of Washington
 
HaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale ClustersHaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale ClustersUniversity of Washington
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceUniversity of Washington
 

Mais de University of Washington (16)

Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data science
 
Thoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State LegislatureThoughts on Big Data and more for the WA State Legislature
Thoughts on Big Data and more for the WA State Legislature
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore Environments
 
Data, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data ScienceData, Responsibly: The Next Decade of Data Science
Data, Responsibly: The Next Decade of Data Science
 
Science Data, Responsibly
Science Data, ResponsiblyScience Data, Responsibly
Science Data, Responsibly
 
Data Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data InteractionData Science, Data Curation, and Human-Data Interaction
Data Science, Data Curation, and Human-Data Interaction
 
Urban Data Science at UW
Urban Data Science at UWUrban Data Science at UW
Urban Data Science at UW
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
Big Data Middleware: CIDR 2015 Gong Show Talk, David Maier, Bill Howe
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
Enabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShareEnabling Collaborative Research Data Management with SQLShare
Enabling Collaborative Research Data Management with SQLShare
 
HaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale ClustersHaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Processing on Large-Scale Clusters
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
 

Último

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Último (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Visual Data Analytics in the Cloud for Exploratory Science

  • 1. Visual Data Analytics in the Cloud for Exploratory Science. Bill Howe, UW; Huy Vo, Utah; Claudio Silva, Utah; Juliana Freire, Utah; YingYi Bu, UW
  • 2. 3/12/09, Bill Howe, UW, VisTrails + GridFields. Data acquisition is no longer the bottleneck. Old model: “Query the world” (data acquisition coupled to a specific hypothesis). New model: “Download the world” (data acquired en masse, in support of many hypotheses) • Astronomy: high-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS) • Oceanography: high-resolution models, cheap sensors, satellites • Biology: lab automation, high-throughput sequencing
  • 3. Two dimensions: # of bytes vs. # of apps. Domains: Biology, Oceanography, Astronomy. Projects plotted: LSST, SDSS, Galaxy, BioMart, GEO, IOOS, OOI, LANL HIV, Pathway Commons, PanSTARRS
  • 4. This Talk • # of Bytes: MapReduce for Scientific Viz • # of Apps: Other VDA Projects
  • 5. Converging Requirements: Vis, DB
  • 6. Why Vis Needs DB. “Transferring the whole data generated … to a storage device or a visualization machine could become a serious bottleneck, because I/O would take most of the … time. A more feasible approach is to reduce and prepare the data in situ for subsequent visualization and data analysis tasks.” -- SciDAC Review. Current Research Topics in Vis: • “Query-driven Visualization” • “In Situ Visualization” • “Remote Visualization”
  • 7. Why DB Needs Vis
  • 8. Why DB Needs Vis (2): “What does the salt wedge look like?”
  • 9. Thesis • We can no longer afford to build separate visualization and data management systems • Data is increasingly destined for the cloud • First Attack: Implement Vis primitives in an existing “cloud” DM system
  • 10. Core Vis Algorithms in MapReduce • Scalar/Volume Rendering • Isosurface Extraction • Mesh Simplification
  • 11. Some distributed algorithm… Map, (Shuffle), Reduce
  • 12. CluE Cluster • 410 nodes • Dual Intel Xeon 2.8GHz, hyperthreading • 8GB main memory each • Hadoop, no access to OS • Google provided, IBM maintained, NSF funded
  • 13. CluE Cluster Scaling
  • 14. Isosurface Example
  • 15. Isosurface Example
  • 16. Isosurface Example
  • 17. Isosurface Example
  • 18. Isosurface Extraction
  • 19. Isosurface Extraction
  • 20. Isosurface Results: O(N^2), O(N)
  • 21. Scalable Rendering
  • 22. Scalable Rendering • Left: Atlas, 18GB, 500M triangles • Right: St. Matthew, 13GB, 372M triangles • Laser scans, Digital Michelangelo Project (source: Digital Michelangelo Project)
  • 23. Rendering Results
  • 24. Roadmap • # of Bytes: MapReduce for Scientific Viz • # of Apps: Other VDA projects: Azure Ocean, SQLShare, Automating Mashups
  • 25. [John Delaney, University of Washington]
  • 26. Azure Ocean: COVE for Visualization + Trident for Processing + Azure for Data
  • 27. SQLShare: Query Services for Ad Hoc Research Data
  • 28. Ad Hoc Research Data (5/18/10, Garret Cole, eScience Institute): FASTA format, spreadsheets, tabular data
  • 29. Problem: “I spend 90% of my time handling data rather than doing science” -- Robin Kodner, Postdoc, Armbrust Lab
  • 30. An observation about “handling data” • How often does each RNA hit appear inside my annotated surface group? • SELECT hit, COUNT(*) AS cnt FROM tigrfamannotation_surface GROUP BY hit ORDER BY cnt DESC
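The query on slide 30 can be tried end to end with a small sketch. The version below uses Python's built-in `sqlite3` module rather than SQLShare itself, and it assumes a one-column schema for `tigrfamannotation_surface`; the real table layout is not given in the slides. The function name `count_hits`, the sample hit identifiers, and the extra `hit ASC` tiebreak (added only to make the ordering deterministic) are illustrative assumptions.

```python
import sqlite3


def count_hits(hits):
    """Count how often each hit appears, most frequent first.

    `hits` is an iterable of hit identifiers. The single-column schema
    below is an assumption; the slides do not show the real table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tigrfamannotation_surface (hit TEXT)")
    conn.executemany(
        "INSERT INTO tigrfamannotation_surface (hit) VALUES (?)",
        [(h,) for h in hits],
    )
    # Same query as on the slide, plus a tiebreak on `hit` (an addition)
    # so that equal counts come back in a stable order.
    query = (
        "SELECT hit, COUNT(*) AS cnt "
        "FROM tigrfamannotation_surface "
        "GROUP BY hit "
        "ORDER BY cnt DESC, hit ASC"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical hit identifiers, for illustration only.
    sample = ["hitA", "hitB", "hitA", "hitC", "hitA", "hitB"]
    for hit, cnt in count_hits(sample):
        print(hit, cnt)
```

The point of the slide carries over: the whole "handling data" task is one declarative statement, and everything else in the sketch is setup.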
  • 31. Discovery: SQL Does not Terrify Scientists
  • 33. Technology used in 1st Gen Component Stack
  • 34. SQLShare Redux • Conventional wisdom says “Scientists won’t write SQL” • We don’t believe it! • Instead, we implicate difficulty in: installation, configuration, schema design, performance tuning, data ingest, over-reliance on GUIs • Critical need for visualization • Clear role for Tableau! We are asking “What kind of platform will make SQL useful for scientific inquiry?”
  • 35. Automating Mashups
  • 36. Why Mashups? • Jim Gray: # of datasets scales as N^2 (each pairwise comparison generates a new dataset) • Corollary: # of apps scales as N^2 (every pairwise comparison motivates a new mashup) • To keep up, we need to entrain new programmers, make existing programmers more productive, or both
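The N^2 scaling on slide 36 follows from counting pairs: N datasets admit N(N-1)/2 distinct pairwise comparisons, which grows quadratically. A minimal sketch of that count follows; the function name `pairwise_comparisons` is illustrative, and the four dataset names are borrowed from slide 3 purely as sample input.

```python
from itertools import combinations


def pairwise_comparisons(datasets):
    """Return every unordered pair of datasets (N * (N - 1) / 2 pairs)."""
    return list(combinations(datasets, 2))


if __name__ == "__main__":
    # Sample input only; any list of dataset names works.
    names = ["SDSS", "LSST", "PanSTARRS", "GEO"]
    pairs = pairwise_comparisons(names)
    print(len(pairs))  # 4 * 3 / 2 pairs
    for n in (10, 100, 1000):
        # The pair count grows roughly as n^2 / 2.
        print(n, n * (n - 1) // 2)
```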
  • 37. Satellite Images + Crime Incidence Reports
  • 38. Twitter Feed + Flickr Stream
  • 39. Why Mashups? • The days when one’s data fit into a 15-page research paper are past • Datasets are too large and complex to be conveyed with a handful of static images • Prediction: succinct, targeted, interactive web apps will become the currency of scientific communication: with the public, with policy makers, with colleagues in other disciplines, with peers, with students (K12 - grad)
  • 40. Tableau Mashups
  • 41. Conclusions • Converging requirements for DB and Vis • At high scale: a Vis library in MapReduce • At high complexity: • Azure Ocean: Data + Workflow + Vis; “Client + Cloud”, “Computational mobility” • SQLShare: Ad Hoc data -- “anything goes”; visualization critical • (Semi-)automated mashups: “Show me what’s interesting”
  • 42. Acknowledgments: http://escience.washington.edu
  • 43. BACKUP SLIDES
  • 44. 3/12/09 Bill Howe, UW 44VisTrails + GridFields [John Delaney, University of Washington]
  • 45. 3/12/09 Bill Howe, UW 45VisTrails + GridFields
  • 46. 3/12/09 Bill Howe, UW 46VisTrails + GridFields John Delaney
  • 47. 3/12/09 Bill Howe, UW 47VisTrails + GridFields Azure OceanAzure Ocean COVE for Visualization Trident for Processing Azure for Data+ +
  • 48. COVECOVE  Research into new interfaces for cross-disciplinary ocean scienceResearch into new interfaces for cross-disciplinary ocean science  Extensive instrument and cable layout for creating experimentsExtensive instrument and cable layout for creating experiments  Flexible terrain and image engine for visualizing siteFlexible terrain and image engine for visualizing site  True 3D/4D science dataset visualizationTrue 3D/4D science dataset visualization  Field tested in RSN observatory layout and on ocean expeditionsField tested in RSN observatory layout and on ocean expeditions  Cross platform and extensible with python and workflow systemsCross platform and extensible with python and workflow systems
  • 49. Trident  Microsoft Research scientific workflow system  Visual programming environment for connecting tasks  Science-specific task libraries including one for ocean sciences  Automated provenance capture, monitoring, and fault tolerance  Runs on local system, Windows server, or HPC Cluster  Cross platform with Silverlight and web service interface
  • 50. Azure  Microsoft’s cloud computing platform  Provides storage and computing as pay-as-you-go services  From development standpoint, system looks like provisioned VMs  SQL, table, and blob (file system) storage models are included  Access to storage via RESTful HTTP interface
  • 51. Azure Ocean  COVE + Trident + Azure provides visual analytics to scientists  Any component – Visualization, Computing, or Data – can be provisioned locally, on a server, or in the cloud  When on same machine, system APIs are leveraged for speed  When distributed, communication is through HTTP and RESTful APIs  Flexible platform for the diverse ocean science needs
  • 52.
  • 53. MapReduce Programming Model  Input & Output: each a set of key/value pairs  Programmer specifies two functions:  map (in_key, in_value) -> list(out_key, intermediate_value): processes an input key/value pair and produces a set of intermediate pairs  reduce (out_key, list(intermediate_value)) -> list(out_value): combines all intermediate values for a particular key and produces a set of merged output values (usually just one) slide source: Google, Inc.
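The two signatures on this slide can be sketched as a small in-memory driver. This is only an illustration of the programming model, not Hadoop or Google's implementation; `map_reduce`, `band_map`, `count_reduce`, and the sample temperatures are hypothetical names and data introduced here.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Run map, then group by key (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        # map(in_key, in_value) -> list(out_key, intermediate_value)
        for out_key, intermediate in map_fn(in_key, in_value):
            groups[out_key].append(intermediate)
    output = {}
    for out_key in sorted(groups):
        # reduce(out_key, list(intermediate_value)) -> list(out_value)
        output[out_key] = list(reduce_fn(out_key, groups[out_key]))
    return output

# Hypothetical example: count grid cells per 5-degree temperature band.
def band_map(cell_id, temperature):
    yield int(temperature // 5) * 5, 1

def count_reduce(band, ones):
    yield sum(ones)

cells = [(0, 11.2), (1, 12.9), (2, 16.0), (3, 9.5), (4, 14.99)]
result = map_reduce(cells, band_map, count_reduce)
```

A real framework distributes the map calls, the grouping step, and the reduce calls across machines; the function contracts stay the same.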
  • 54. Isosurface Example
  • 55. Isosurface Example <Vis movie> Key idea: Zooplankton correlated with temperature
  • 56. Example Query Results
  • 57. Example Query: Climatology. Average Surface Salinity by Month, Columbia River Plume 1999-2006 (animation; map panels labeled Feb and May, scale in psu, with Columbia River, Washington, and Oregon labeled)
  • 58. UW + Utah CluE Program  Goals  10+-year “climatologies” at interactive speeds  …with provenance, reproducibility, collaboration  …on a shared-nothing, commodity platform  In general: Explore the intersection of scientific databases and scientific visualization, at scale  Methods  “Cloud-Enable” two projects  GridFields: Query algebra for mesh data  VisTrails: Scientific workflow and provenance
  • 59.
  • 60. Converging Requirements Vis: “Query-driven Visualization” Vis: “In Situ Visualization” Vis: “Remote Visualization” DB: Millions of tuples per result Vis DB
  • 61. Preliminary results  Managing Hadoop jobs with VisTrails  GridField queries in Hadoop  Core Visualization algorithms in Hadoop
  • 62. Core Vis Algorithms in MapReduce  Scalar/Volume Rendering  Map: Rasterization  Reduce: Compositing, blending  Isosurface Extraction  Map: Isosurface Extraction  Reduce: Combine like isovalues  Mesh Simplification  Map: Bin vertices  Reduce: Collapse binned triangles
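As one illustration of the map/reduce split listed above, the mesh simplification item can be sketched as vertex clustering on a uniform grid. The slide does not say which simplification algorithm was used, so this is an assumption; the function names and the tiny mesh are hypothetical.

```python
from collections import defaultdict

def bin_of(vertex, cell_size):
    """Grid cell (bin) that a vertex falls into."""
    x, y, z = vertex
    return (int(x // cell_size), int(y // cell_size), int(z // cell_size))

def map_bin_vertices(vertices, cell_size):
    """Map: emit (bin id, (vertex index, vertex)) for every vertex."""
    for idx, v in enumerate(vertices):
        yield bin_of(v, cell_size), (idx, v)

def reduce_collapse(binned):
    """Reduce: replace all vertices in a bin by their mean position."""
    new_vertices, remap = [], {}
    for bin_id in sorted(binned):
        members = binned[bin_id]
        n = len(members)
        mean = tuple(sum(v[i] for _, v in members) / n for i in range(3))
        for idx, _ in members:
            remap[idx] = len(new_vertices)
        new_vertices.append(mean)
    return new_vertices, remap

def simplify(vertices, triangles, cell_size):
    binned = defaultdict(list)
    for bin_id, item in map_bin_vertices(vertices, cell_size):
        binned[bin_id].append(item)  # stands in for the shuffle by key
    new_vertices, remap = reduce_collapse(binned)
    kept = set()
    for a, b, c in triangles:
        t = (remap[a], remap[b], remap[c])
        if len(set(t)) == 3:  # drop triangles that collapsed to an edge or point
            kept.add(t)
    return new_vertices, sorted(kept)

vertices = [(0, 0, 0), (0.1, 0, 0), (1.2, 0, 0), (1.3, 0.1, 0), (0, 1.5, 0)]
triangles = [(0, 1, 2), (0, 2, 4), (1, 3, 4)]
new_vertices, new_triangles = simplify(vertices, triangles, cell_size=1.0)
```

Here five vertices fall into three bins, one triangle degenerates, and the remaining two collapse onto the same simplified triangle.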
  • 63. ATLAS dataset
  • 64. Rendering (not CluE) # of mappers 57-node Nehalem
  • 65. Isosurface Extraction (Preliminary) 32 48 64 96 128
  • 66. “Query-Driven Visualization”  Vis perspective:  query = subsetting  DB perspective:  query = manipulation, preparation, restructuring, index-building, aggregation, regridding, downsampling, simplification, reformatting, etc. Database Maxims: 1. Push the computation to the data. 2. Declarative programming is a good thing.
  • 67. Why Cloud?  “Cloud”?  Software as a Service (SaaS)  Infrastructure as a Service (IaaS)  Platform as a Service (PaaS)  Working definition: General, elastic, data-intensive, scalable computing This work: Vis techniques + DB techniques in the Cloud
  • 68. Shared Nothing Parallel Databases  Teradata  Greenplum  Netezza  Aster Data Systems  Datallegro (Microsoft)  Vertica  MonetDB (recently commercialized as “Vectorwise”)
  • 69. Taxonomy of Parallel Architectures Easiest to program, but $$$$ Scales to 1000s of nodes
  • 70. VisTrails (screenshot: VisTrails, Claudio Silva, Juliana Freire, et al., University of Utah)
  • 71. Version Tree (screenshot: VisTrails, Claudio Silva, Juliana Freire, et al., University of Utah)
  • 72. Collaboration: Bill Howe @ UW computes salt flux using GridFields. Erik Anderson @ Utah adds vector streamlines and adjusts opacity. Bill Howe @ UW adds an isosurface of salinity. Peter Lawson adds discussion of the scientific interpretation. (Howe et al., eScience 2008)
  • 73. Preliminary results  Managing Hadoop jobs with VisTrails  GridField queries in Hadoop  Core Visualization algorithms in Hadoop
  • 74. Preliminary results  Managing Hadoop jobs with VisTrails  GridField queries in Hadoop  Core Visualization algorithms in Hadoop
  • 75. Hadoop in VisTrails  Wrap Hadoop Streaming/HDFS Operations  Plug “PreProcess” to actual Vis Pipeline
  • 76. Hadoop in VisTrails  Provenance and Monitoring
  • 77. Preliminary results  Managing Hadoop jobs with VisTrails  GridField queries in Hadoop  Core Visualization algorithms in Hadoop
  • 78. All Science is reducing to a database problem Old model: “Query the world” (Data acquisition coupled to a specific hypothesis) New model: “Download the world” (Data acquired en masse, independent of hypotheses)  Astronomy: High-resolution, high-frequency sky surveys (SDSS, LSST, PanSTARRS)  Medicine: ubiquitous digital records, MRI, ultrasound  Oceanography: high-resolution models, cheap sensors, satellites  Biology: lab automation, high-throughput sequencing “Increase Data Collection Exponentially in Less Time, with FlowCAM” Empirical X  Analytical X  Computational X  X-informatics
  • 79. Key Idea: Declarative Languages SELECT * FROM Order o, Item i WHERE o.item = i.item AND o.date = today() Find all orders from today, along with the items ordered. Query plan: join (o.item = i.item) of [select (date = today()) over scan Order o] with [scan Item i]
  • 80. Example System: Teradata AMP = unit of parallelism
  • 81. Example System: Teradata  AMP 1, AMP 2, AMP 3 each run: scan Order o, select date=today(), hash h(item)  Hashed rows go to AMP 4, AMP 5, AMP 6
  • 82. Example System: Teradata  AMP 1, AMP 2, AMP 3 each run: scan Item i, hash h(item)  Hashed rows go to AMP 4, AMP 5, AMP 6
  • 83. Example System: Teradata  AMP 4: join o.item = i.item; contains all orders and all lines where hash(item) = 1  AMP 5: join o.item = i.item; contains all orders and all lines where hash(item) = 2  AMP 6: join o.item = i.item; contains all orders and all lines where hash(item) = 3
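The three Teradata slides above describe a hash-partitioned parallel join. A minimal single-process sketch of the same plan follows, assuming each "AMP" is just a list of rows; this is not Teradata code, and `scan_select_hash`, `local_join`, `parallel_join`, and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

def scan_select_hash(fragment, predicate, key, n_join_workers):
    """Scanning worker: scan the local fragment, filter, route rows by hash(key)."""
    routed = defaultdict(list)
    for row in fragment:
        if predicate(row):
            routed[hash(row[key]) % n_join_workers].append(row)
    return routed

def local_join(orders, items, key):
    """Joining worker: equijoin of the rows that were routed to it."""
    by_key = defaultdict(list)
    for item in items:
        by_key[item[key]].append(item)
    return [{**o, **i} for o in orders for i in by_key.get(o[key], [])]

def parallel_join(order_fragments, item_fragments, today, n_join_workers=3):
    order_in, item_in = defaultdict(list), defaultdict(list)
    for frag in order_fragments:  # scan Order, select date = today, hash
        routed = scan_select_hash(frag, lambda r: r["date"] == today, "item", n_join_workers)
        for worker, rows in routed.items():
            order_in[worker].extend(rows)
    for frag in item_fragments:  # scan Item, hash
        routed = scan_select_hash(frag, lambda r: True, "item", n_join_workers)
        for worker, rows in routed.items():
            item_in[worker].extend(rows)
    result = []
    for worker in range(n_join_workers):  # each join worker sees one hash bucket
        result.extend(local_join(order_in[worker], item_in[worker], "item"))
    return result

today = date(2009, 3, 12)
order_fragments = [
    [{"order_id": 1, "item": 10, "date": today},
     {"order_id": 2, "item": 11, "date": date(2009, 3, 11)}],
    [{"order_id": 3, "item": 12, "date": today}],
    [{"order_id": 4, "item": 10, "date": today}],
]
item_fragments = [
    [{"item": 10, "name": "sensor"}],
    [{"item": 11, "name": "cable"}],
    [{"item": 12, "name": "buoy"}],
]
joined = parallel_join(order_fragments, item_fragments, today)
```

Because both inputs are routed with the same hash function, matching orders and items always land on the same join worker, so no worker needs to see the whole of either table.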
  • 84. Workflow Execution Plans Need execution plans spanning client/server/cloud
  • 85. Example: Isosurface Browsing
  • 86. Example: Isosurface Browsing  Plan A: Subset applied to each of tstep 0, tstep 1, tstep 2, tstep 3
  • 87. Example: Isosurface Browsing  Plan B: Build an index, e.g., an Interval Tree (Cignoni 97)  For each of tstep 0, tstep 1, tstep 2, tstep 3: Subset, Isosurface, Render
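The index in Plan B can be illustrated with a centered interval tree over per-cell scalar ranges: given an isovalue, it returns only the cells whose (min, max) range contains that value. This is a generic textbook structure written for illustration, not the implementation from the cited paper; the class name and sample cells are hypothetical.

```python
class IntervalTree:
    """Centered interval tree over (lo, hi, cell_id) triples."""

    def __init__(self, intervals):
        self.center = None
        self.by_lo, self.by_hi = [], []
        self.left = self.right = None
        if not intervals:
            return
        endpoints = sorted(p for lo, hi, _ in intervals for p in (lo, hi))
        self.center = endpoints[len(endpoints) // 2]
        left, right, here = [], [], []
        for iv in intervals:
            lo, hi, _ = iv
            if hi < self.center:
                left.append(iv)
            elif lo > self.center:
                right.append(iv)
            else:
                here.append(iv)  # interval spans the center value
        self.by_lo = sorted(here, key=lambda iv: iv[0])
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)
        if left:
            self.left = IntervalTree(left)
        if right:
            self.right = IntervalTree(right)

    def stab(self, value):
        """Return ids of all cells whose range contains value."""
        if self.center is None:
            return []
        hits = []
        if value < self.center:
            for lo, _, cell_id in self.by_lo:
                if lo > value:
                    break
                hits.append(cell_id)
            if self.left:
                hits += self.left.stab(value)
        elif value > self.center:
            for _, hi, cell_id in self.by_hi:
                if hi < value:
                    break
                hits.append(cell_id)
            if self.right:
                hits += self.right.stab(value)
        else:
            hits = [cell_id for _, _, cell_id in self.by_lo]
        return hits

# Hypothetical per-cell salinity ranges (min, max, cell id).
cells = [(28.0, 31.0, "c0"), (30.5, 33.0, "c1"), (25.0, 27.5, "c2"),
         (31.5, 34.0, "c3"), (29.0, 32.0, "c4")]
tree = IntervalTree(cells)
active = sorted(tree.stab(31.0))
```

Once the tree is built, changing the isovalue touches only the candidate cells instead of rescanning the whole timestep, which is the point of paying the index-building cost up front.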
  • 88. Example: Isosurface Browsing  Plan C: Build a spatial index to support panning  Plan D: Build a multi-resolution index to support zoom  …and so on  Why not precompute all appropriate indexes?  Some will (partially) reside on client  Storage is not as cheap as we pretend  Need a flexible system where  a “query result” can be explored interactively, and  we prepare for similar queries  similarity defined by natural “browsing patterns” in visualization systems
  • 89.
  • 90. Why MapReduce/Hadoop?  Popular  AWS Elastic MapReduce  100s of startups  # of downloads  # of blog posts  Free as in Speech  Free as in Beer  Flexible, Lightweight  Scalable  Fault-tolerant
  • 91. Reducing Latency  Online processing/progressive refinement  Deliver approximate/partial results  Standing Queries/Prepared plans  Exploit indexes Changes to Hadoop and/or other tools required (e.g., HBase)
  • 92. Masking Latency  Caching/materialized views  Reuse old results  Pre-fetching  Stage and prepare new results  Speculative processing  Anticipate future results No change to Hadoop required
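Caching and pre-fetching, as listed on the "Masking Latency" slide, can be sketched as a small client-side cache that also stages the next timestep. This assumes a browsing pattern of stepping forward in time, and it runs the pre-fetch synchronously for simplicity; a real system would do it in the background. `BrowsingCache` and `fake_isosurface` are hypothetical.

```python
from collections import OrderedDict

class BrowsingCache:
    """LRU cache of (tstep, isovalue) results with one-step-ahead pre-fetch."""

    def __init__(self, compute, capacity=8):
        self.compute = compute      # expensive call, e.g. a remote job
        self.capacity = capacity
        self.store = OrderedDict()
        self.computed = []          # log of keys that were actually computed

    def _materialize(self, key):
        if key in self.store:       # caching: reuse an old result
            self.store.move_to_end(key)
            return self.store[key]
        value = self.compute(*key)
        self.computed.append(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return value

    def get(self, tstep, isovalue):
        result = self._materialize((tstep, isovalue))
        # Pre-fetching: stage the result the user is likely to ask for next.
        self._materialize((tstep + 1, isovalue))
        return result

def fake_isosurface(tstep, isovalue):
    return f"isosurface(t={tstep}, v={isovalue})"

cache = BrowsingCache(fake_isosurface)
first = cache.get(0, 30.0)
second = cache.get(1, 30.0)   # already staged by the previous call
```

Neither technique changes how a job runs on the cluster; both only hide its latency from the user, which matches the slide's note that no change to Hadoop is required.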
  • 93. source: Antonio Baptista, NSF CMOP STC
  • 94. Why Visualization? (2) north channel south channel
  • 95. MapReduce?  Hadoop simplifies parallel data processing  ++ scalability  ++ fault tolerance  ++ less programming  -- latency is an issue
  • 96. Climatology Queries (figure residue: panels numbered 1 to 31, a label “(b)”, and a “psu” unit label)
  • 97.
  • 98. As a GridField Expression (diagram labels: ⊗; H0 : (x,y,b); V0 : (σ); bind(0, surf); apply(0, z=(surf − b) * σ); C) H = Scan(contxt, "H"); rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H); T = Scan(contxt, "T"); V = Scan(contxt, "V"); HxV = Cross(H, V); HxVxT = Cross(HxV, T); salt = Bind(contxt, HxVxT, "salt"); onemonth = Regrid(salt, HxV, equijoin("hpos,vpos"), avg())
  • 99. As a SQL Query: SELECT hpos, vpos, avg(salt) FROM ocean GROUP BY hpos, vpos
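The SQL form of the climatology query can be tried on a toy table with Python's built-in `sqlite3` module. The slide gives only the table name `ocean` and the columns `hpos`, `vpos`, and `salt`; the `tstep` column, the sample values, and the `ORDER BY` clause (added for a stable row order) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (horizontal position, vertical position, timestep).
conn.execute(
    "CREATE TABLE ocean (hpos INTEGER, vpos INTEGER, tstep INTEGER, salt REAL)"
)
rows = [
    (0, 0, 0, 30.0), (0, 0, 1, 32.0),
    (0, 1, 0, 33.0), (0, 1, 1, 34.0),
    (1, 0, 0, 10.0), (1, 0, 1, 14.0),
]
conn.executemany("INSERT INTO ocean VALUES (?, ?, ?, ?)", rows)

# Average salinity over time at every (hpos, vpos) position.
climatology = conn.execute(
    "SELECT hpos, vpos, avg(salt) FROM ocean "
    "GROUP BY hpos, vpos ORDER BY hpos, vpos"
).fetchall()
conn.close()
```

The group-by collapses the time dimension, which plays the same role as the `Regrid(..., avg())` step in the GridField expression on the previous slide.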
  • 100. Scientific Workflow Systems  Value proposition: More time on science, less time on code  How: By providing language features emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency  Provenance  Visual programming  Caching  Integration with domain-specific tools  Scheduling
  • 101. Related Vis Work  Parallel visualization systems  ParaView, VisIt  Query-Driven Visualization  [Bethel et al 2006,2008,2009]  FastBit Index  [Shoshani et al 2007]  DB Vis systems  Tableau
  • 102. Feeding the Pipeline source: Ken Moreland missing step?
  • 103. Cannot Ignore “Preprocessing” Hadoop
  • 104. Role 2: Move Computation to the Data “Transferring the whole data generated … to a storage device or a visualization machine could become a serious bottleneck, because I/O would take most of the … time. A more feasible approach is to reduce and prepare the data in situ for subsequent visualization and data analysis tasks.” -- SciDAC Review
  • 105. Remote Visualization  Reduce and render remotely, transfer images  ++ transfers less data  -- specialized hardware, high load  Reduce remotely, transfer data/geometry, render locally  ++ uses local graphics pipeline  -- transfers more data
  • 106.
  • 107. Scientific Vis System Roundup  General  ParaView [KitWare, Los Alamos, Sandia]  VisIt [LLNL]  Specialized  SALSA, particles, Quinn, UW  VISUS, streaming/progressive, Jones, LLNL  SAGE,  Hyperwall, tiled display, NASA

Editor's Notes

  1. Drowning in data; starving for information. We’re at war with these engineering companies. FlowCAM is bragging about the amount of data they can spray out of their device. How to use this enormous data stream to answer scientific questions is someone else’s problem. “Typical large pharmas today are generating 20 terabytes of data daily. That’s probably going up to 100 terabytes per day in the next year or so.” “tens of terabytes of data per day” -- genome center at Washington University. Increase data collection exponentially with FlowCAM.
  2. Vertical: Fewer, bigger apps Horizontal: Many, smaller apps Limiting Resource: Effort = Napps * Nbytes
  3. Analytics and Visualization are mutually dependent Scalability Fault-tolerance Exploit shared-nothing, commodity clusters In general: Move computation to the data Data is ending up in the cloud; we need to figure out how to use it.
  4. Visualization is a more efficient way to query data -- you can browse and explore. But you need to be able to switch back and forth between interactive browsing and symbolic querying
  5. What exactly is ad hoc research data? It is data that can come in any size, shape, or form, and that is heterogeneous in its structure, format, quality, and more.
  6. (granted we had a minute for Bill (clearly Bill) to describe this new eScience movement) We want to give a little background of our project before we launch into it, so we will discuss the problem we are trying to solve. Essentially, we want to remove the speed-bump of data handling from the scientists.
  7. To begin, we ask: what kind of questions would you ask your data once you have it ready to be worked on? For just about EVERY question that we have heard a scientist ask, we have found an equivalent SQL statement. If we could just turn their questions into SQL our job would be done, but there are many other problems to solve before that becomes a reality. For example, their data may not reside in a relational database. This brings us to part of our next problem: how can we bring the power of SQL to the scientists to solve their questions without the overhead of everything that a database administrator would need to do?
  8. One claim we are trying to prove with this project is that scientists are not afraid to learn a bit of SQL
  9. In our first-generation deployment, we used an ASP.NET front end on the Windows Azure cloud to host our web service and Amazon’s EC2 cloud as the back end to host our Microsoft SQL Server database.
  10. Data products are the currency of scientific and statistical communication with the public. Ex: Obama map. Ex: Mars Rover pictures generate 218M hits in 24 hrs. But: datasets are growing too big and too complex to view through a few static images. Scientists want to create interactive visualizations that allow others to explore their results. Ex: NASA 3D with Photosynth. Ex: CAMERA. Ex:
  11. On the order of hundreds of points. Manual browsing.
  12. Ex: Nasa 3D with Photosynth Ex: CAMERA Ex:
  13. Data-intensive science
  14. This movie was rendered offline, but it’s increasingly important to be able to create visualizations on the fly to allow interactive exploration of large datasets.
  15. Need to consider private clouds Not just renting hardware: general-purpose data processing
  16. The goal here is to make Shared Nothing Architectures easier to program.
  17. We wrap only the Hadoop Streaming interface in VisTrails, with additional support for HDFS operations to upload and download data and libraries for the job. Hadoop Streaming is plugged into a local VTK rendering pipeline that grabs data from the cloud and generates an animation on the VisTrails Spreadsheet. Users can specify their own Python source as the mapper/reducer. In this case, a VTK script is specified in the mapper. The VTK libraries are also shipped along with the code to the computing node, using the underlying -cacheArchive option of Hadoop Streaming.
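A Python mapper/reducer for Hadoop Streaming is a program that reads lines on standard input and writes tab-separated key/value lines on standard output, with the reducer receiving its input sorted by key. The sketch below follows that convention but substitutes a plain salinity average for the VTK script mentioned in the note; the `hpos,vpos,tstep,salt` input format and all function names are assumptions.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit 'hpos,vpos<TAB>salt' for each 'hpos,vpos,tstep,salt' input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hpos, vpos, _tstep, salt = line.split(",")
        yield f"{hpos},{vpos}\t{salt}"

def reducer(sorted_lines):
    """Average the values of each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{key}\t{sum(values) / len(values)}"

def run_stage(stage, stream=sys.stdin, out=sys.stdout):
    """Entry point a streaming job would call: stage is 'map' or 'reduce'."""
    fn = mapper if stage == "map" else reducer
    for line in fn(stream):
        out.write(line + "\n")

# Local simulation of map -> sort (shuffle) -> reduce, without a cluster.
raw = ["0,0,0,30.0\n", "0,0,1,32.0\n", "1,0,0,10.0\n", "1,0,1,14.0\n"]
shuffled = sorted(mapper(raw))
averages = list(reducer(shuffled))
```

Keeping the logic in plain functions makes it possible to test the mapper and reducer locally before submitting them as a streaming job.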
  18. By default, Hadoop logs are written to the standard output of the VisTrails app. Jobs are killed by terminating the program and running an extra command returned by Hadoop. However, one can plug a HadoopTrackerCell into the end of the pipeline to have the log messages monitored on the VisTrails Spreadsheet. There are also buttons to kill the job or show the Job Tracker, which automatically connects through the CluE-specific proxy to show additional logs and error messages for jobs.
  19. Drowning in data; starving for information We’re at war with these engineering companies. FlowCAM is bragging about the amount of data they can spray out of their device. How to use this enormous data stream to answer scientific questions is someone else’s problem.
  20. Need to assign workflows to resources for execution in a heterogeneous compute environment. Parts of this workflow can be compiled into Hadoop jobs; parts should be run locally so that they exploit hardware acceleration. But this is not just computation placement -- there are different execution plans, similar to relational execution plans. GridFields expressions can be algebraically optimized, for example.
  21. Plan C: Build a spatial index to support panning Plan D: Build a multi-resolution index to support zoom …and so on Why not precompute all appropriate indexes? Some will (partially) reside on client Storage is not as cheap as we pretend Need a flexible system where a “query result” can be explored interactively, and we prepare for similar queries similarity defined by natural “browsing patterns” in visualization systems
  22. We can’t just precompute the indexes, since they may reside on
  23. Analytics and Visualization are mutually dependent Scalability Fault-tolerance Exploit shared-nothing, commodity clusters In general: Move computation to the data
  24. Upper left: Average
  25. Sweeping through the velocity fields quickly exposed the location of the “upstream” salt flux -- where salty water made its way back upstream.