Bill Howe
UW eScience Institute
Computer Science & Engineering
University of Washington
Big Ocean Data: Numerically
Conservative Parallel Query Processing
3/18/2022 Bill Howe, UW 1
Scott Moe
Applied Math
University of Washington
http://escience.washington.edu
eScience Scalable Data Analytics Group
Bill Howe, PhD (databases, cloud, data-intensive scalable computing, visualization)
Postdocs (past and present)
– Dan Halperin, postdoc (distributed systems, algorithms)
– Seung-Hee Bae, postdoc (scalable machine learning algorithms)
– (alumni) Marianne Shaw, PhD (Hadoop, graph query, health informatics)
Staff (past and present)
– Sagar Chitnis (cloud computing, databases, web services)
– (alumni) Alicia Key (web applications, visualization)
– (alumni) Garret Cole (databases, cloud computing, web services)
Students
– Scott Moe (2nd-year PhD, Applied Math)
Partners
– CSE DB Faculty: Magda Balazinska, Dan Suciu
– CSE students: Paris Koutris, Prasang Upadhyaya, Shengliang Xu, Jinjing Wang
– UW-IT (web applications, QA/support)
– Cecilia Aragon, PhD, Associate Professor, HCDE (visualization, scientific applications)
Summary of Big Data in the Ocean Sciences
• In transition from expedition-based to observatory-based science
• Enormous investments in infrastructure for linking sensors
• Ad hoc, on-demand integration of large, heterogeneous,
distributed datasets is the universal requirement in this regime
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Summary of our work
• We have the beginnings of a “universal regridding” operator with
nice algebraic properties
• We’re using it to implement efficient distributed data sharing
applications, parallel algorithms, and more
slide: John Delaney, UW
Regional Scale Nodes
John Delaney
10s of Gigabits/second from the ocean floor
Cyberinfrastructure
• “Integration of numerical ocean models and their output as derived data products”
• “Generation and distribution of qualified science data products in (near) real time”
• “Access to, syntactic transformation, semantic mediation, analysis, and visualization of
science data, data products, and derived data products”
Matthew Arrott
17 federal organizations named as partners
11 Regional Associations
“a strategy for incorporating observation systems from …
near shore waters as part of … a network of observatories.”
Center for Coastal Margin
Observation and Prediction (CMOP)
Antonio Baptista
Virtual Mekong Basin
img src: Mark Stoermer, UW Center for Environmental Visualization
Jeff Richey
[Figure: data sources plotted by # of bytes against # of data sources]
Astronomy: telescopes, spectra, n-body sims
– LSST (~100PB; images, spectra)
– PanSTARRS (~40PB; images, trajectories)
– SDSS (~400TB; images, spectra, catalogs)
Ocean Sciences: models, AUVs, stations, cruises, CTDs, flow cytometry, gliders, ADCP, satellites
– OOI (~50TB/year; sims, RSN)
– IOOS (~50TB/year; sims, satellite, gliders, AUVs, vessels, more)
– CMOP (~10TB/year; sims, stations, gliders, AUVs, vessels, more)
3 V’s of Big Data: Volume, Variety, Velocity
SQLShare: Ad Hoc Databases for Science
Problem
– Data is captured and manipulated in ad hoc files
– Made sense five years ago; the data volumes were
manageable
– But now: 50k rows, 100s of files, sharing
Why not put everything into a database?
– A huge amount of up-front effort
– Hard to design for a moving target
– Running a database system is a huge drain
Solution: SQLShare
– Upload data through your browser: no setup, no installation
– Log in to browse science questions in English
– Click a question, see the SQL that answers it
– Edit the SQL to answer an "adjacent" question, even if you
wouldn’t know how to write it from scratch
https://sqlshare.escience.washington.edu/
Summary of Big Data in the Ocean Sciences
• Transition from expedition-based to observatory-based science
• Enormous investments in integrating infrastructure for data
acquisition and interoperability
• Ad hoc, on-demand integration of large, heterogeneous,
distributed datasets is the universal requirement in this regime
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Summary of our work
• We have the beginnings of a “universal regridding” operator with
nice algebraic properties
• We’re using it to implement efficient distributed data sharing
applications, parallel algorithms, and more
Status Quo
• “FTP + MATLAB”
• “Nascent Databases”
– File-based, format-specific API
– UniData’s NetCDF, HDF5
– Some IO optimization, some indexing
• Data Servers (e.g., Hyrax)
– Same as file-based systems,
– but support RPC
None of this scales
- up with data volumes
- up with number of sources
- down with developer expertise
What do we really need here?
• Claim: Everything reduces to regridding
• Model-data comparison (skill assessment)?
Regrid observations onto model mesh
• Model-model comparison?
Regrid one model’s mesh onto the other’s
• Model coupling?
Regrid a meso-scale atmospheric model onto your regional ocean model
• Visualization?
Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
[Figure: Columbia River Estuary, between Washington and Oregon]
SciDB
Hyrax
GridFields
ESMF
VTK/Paraview
Structured grids are easy
• The data model
(Cartesian products of coordinate variables)
• immediately implies a representation,
(multidimensional arrays)
• an API,
(reading and writing subslabs)
• and an efficient implementation
(address calculation using array “shape”)
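A minimal sketch of the last two points, assuming a row-major (C-order) layout. The names `flat_address` and `read_subslab` are illustrative only and are not taken from NetCDF, HDF5, or any other library named in this talk.

```python
from itertools import product

def flat_address(index, shape):
    """Row-major offset of a multidimensional index, computed from the shape alone."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {shape}")
        offset = offset * n + i
    return offset

def read_subslab(flat, shape, start, count):
    """Read a rectangular subslab (start corner, extent per dimension) from a flat array."""
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    return [flat[flat_address(idx, shape)] for idx in product(*ranges)]

shape = (3, 4)                      # 3 rows, 4 columns
flat = list(range(12))              # values 0..11 stored row by row
print(flat_address((2, 1), shape))  # offset of row 2, column 1: 2*4 + 1
print(read_subslab(flat, shape, start=(1, 1), count=(2, 2)))
```

No per-cell lookup structure is needed: the shape alone determines where every value lives. An unstructured grid has no such formula, which is why it needs explicit topology.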
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
Naïve Method: Interpolation (Spatial Join)
For each vertex in the target grid,
Find containing cell in the source grid,
Evaluate the basis functions to interpolate
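The three steps above can be sketched in one dimension, assuming piecewise-linear basis functions and target vertices that lie inside the source domain. This is a simplified stand-in for the 2D/3D case, not the implementation used in the talk. The example also shows why the method is not conservative: a coarse target grid can sample around the peaks of the source field and lose its integral.

```python
from bisect import bisect_right

def interpolate(src_x, src_v, tgt_x):
    """Piecewise-linear interpolation of vertex values onto target vertices (1D)."""
    out = []
    for x in tgt_x:
        # Find the source cell [src_x[i], src_x[i+1]] that contains x.
        i = min(max(bisect_right(src_x, x) - 1, 0), len(src_x) - 2)
        x0, x1 = src_x[i], src_x[i + 1]
        # Evaluate the two linear basis functions of that cell at x.
        w1 = (x - x0) / (x1 - x0)
        out.append((1.0 - w1) * src_v[i] + w1 * src_v[i + 1])
    return out

def integral(xs, vs):
    """Trapezoid-rule integral of a piecewise-linear field."""
    return sum(0.5 * (vs[i] + vs[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

src_x = [0.0, 1.0, 2.0, 3.0, 4.0]
src_v = [0.0, 4.0, 0.0, 4.0, 0.0]      # a spiky source field
tgt_x = [0.0, 2.0, 4.0]                # coarser target grid, same domain
tgt_v = interpolate(src_x, src_v, tgt_x)
print(integral(src_x, src_v), integral(tgt_x, tgt_v))
```

Here the source field integrates to 8 while the interpolated field integrates to 0, because every target vertex happens to land on a trough.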
Supermeshing [Farrell 10]
For each cell in the target grid,
Find overlapping cells in the source grid,
Compute their intersections
Derive new coefficients to minimize L2 norm
* Guaranteed Conservative
* Minimizes Error
But:
Domains must match exactly
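A one-dimensional sketch of the same idea, assuming piecewise-constant (cell-centered) fields. In that special case the L2-minimizing coefficient for a target cell is simply the overlap-weighted average of the source cells it intersects. This is not the general supermeshing algorithm, and `conservative_regrid` is an illustrative name.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def conservative_regrid(src_edges, src_vals, tgt_edges):
    """Overlap-weighted average of piecewise-constant source cells onto target cells."""
    tgt_vals = []
    for j in range(len(tgt_edges) - 1):
        t0, t1 = tgt_edges[j], tgt_edges[j + 1]
        mass = 0.0
        for i in range(len(src_edges) - 1):
            # Intersection of target cell j with source cell i (a 1D "supermesh" cell).
            w = overlap(t0, t1, src_edges[i], src_edges[i + 1])
            mass += w * src_vals[i]
        tgt_vals.append(mass / (t1 - t0))
    return tgt_vals

def total(edges, vals):
    """Integral of a piecewise-constant field."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(vals))

src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_vals = [2.0, 6.0, 1.0, 3.0]
tgt_edges = [0.0, 1.5, 4.0]            # same domain, different cells
tgt_vals = conservative_regrid(src_edges, src_vals, tgt_edges)
print(tgt_vals, total(src_edges, src_vals), total(tgt_edges, tgt_vals))
```

The two totals agree up to floating-point rounding. If the target domain extended beyond the source domain, the uncovered part of a target cell would contribute no mass and dilute the average, which illustrates the "domains must match" restriction.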
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
Digression: Relational Database History
Pre-Relational: if your data changed, your application broke.
Early RDBMS were buggy and slow (and often reviled), but
required only 5% of the application code.
“Activities of users at terminals and most application programs should
remain unaffected when the internal representation of data is changed and
even when some aspects of the external representation are changed.”
-- Codd 1970
Key Idea: Programs that manipulate tabular data exhibit an algebraic
structure allowing reasoning and manipulation independently of physical
data representation
Key Idea: An Algebra of Tables
select
project
join
Other operators: aggregate, union, difference, cross product
Review: Algebraic Optimization
N = ((4*2)+((4*3)+0))/1
Algebraic Laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x
3. (*) distributes: (n*x+n*y) = n*(x+y)
4. (*) commutes: x*y = y*x
Apply rules 1, 3, 4, 2: N = (2+3)*4
two operations instead of five, no division operator
Same idea works with very large tables, but the payoff is much higher
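The rewrite above can be reproduced with a toy rule-based optimizer. This is a sketch only: expressions are nested tuples `(op, left, right)`, and `simplify` handles just the four laws listed on the slide, in the pattern they appear in N. The representation and function names are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

def simplify(e):
    """Apply the four laws bottom-up to an expression tree of nested tuples."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "+" and b == 0:                      # law 1: x + 0 = x
        return a
    if op == "/" and b == 1:                      # law 2: x / 1 = x
        return a
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] == "*" and a[1] == b[1]):
        # law 3: n*x + n*y = n*(x + y), then law 4: commute to (x + y)*n
        return ("*", ("+", a[2], b[2]), a[1])
    return (op, a, b)

def count_ops(e):
    """Number of operator nodes in the tree."""
    return 0 if not isinstance(e, tuple) else 1 + count_ops(e[1]) + count_ops(e[2])

def evaluate(e):
    """Evaluate an expression tree."""
    if not isinstance(e, tuple):
        return e
    return OPS[e[0]](evaluate(e[1]), evaluate(e[2]))

# N = ((4*2) + ((4*3) + 0)) / 1
N = ("/", ("+", ("*", 4, 2), ("+", ("*", 4, 3), 0)), 1)
optimized = simplify(N)
print(optimized, count_ops(N), count_ops(optimized))
print(evaluate(N) == evaluate(optimized))
```

The optimizer never looks at the operand values, only at the shape of the expression. The same property lets a query optimizer rewrite plans over tables, or meshes, without touching the data.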
H0 : (x,y,b) ⊗ V0 : (z)
A
restrict(0, z > b)
B
color is depth
Algebraic Manipulation of Scientific Datasets,
B. Howe, D. Maier, VLDBJ 2005
H0 : (x,y,b) ⊗ V0 : (σ)
apply(0, z = (surf − b) * σ)
bind(0, surf)
C
color is salinity
GridFields: An Algebra of Meshes
Example (1)
H = Scan(context, "H")
rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H)
Restrict arguments: the string is the predicate; 0 is the dimension
[Figure: H and rH, color: bathymetry]
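To make the effect of restricting at dimension 0 concrete, here is a plain-Python sketch on a tiny hypothetical triangle mesh. It does not use the GridFields bindings. `restrict_vertices` is an illustrative helper, and the rule that a triangle survives only if all of its vertices survive is an assumption about the intended semantics.

```python
def restrict_vertices(predicate, vertices, triangles):
    """Keep vertices satisfying the predicate, and triangles whose corners all survive."""
    keep = {vid for vid, attrs in vertices.items() if predicate(attrs)}
    kept_vertices = {vid: vertices[vid] for vid in keep}
    kept_triangles = [t for t in triangles if all(v in keep for v in t)]
    return kept_vertices, kept_triangles

# A tiny hypothetical mesh: vertex id -> attributes (x, y, bathymetry b).
vertices = {
    0: {"x": 320, "y": 290, "b": 5.0},
    1: {"x": 330, "y": 290, "b": 7.5},
    2: {"x": 340, "y": 295, "b": 9.0},
    3: {"x": 335, "y": 300, "b": 8.0},
    4: {"x": 350, "y": 300, "b": 12.0},
}
triangles = [(0, 1, 3), (1, 2, 3), (2, 4, 3)]

def in_box(v):
    """Same bounding box as the Restrict call above, written as a Python predicate."""
    return 326 < v["x"] < 345 and 287 < v["y"] < 302

r_vertices, r_triangles = restrict_vertices(in_box, vertices, triangles)
print(sorted(r_vertices), r_triangles)
```

Vertices 0 and 4 fall outside the box, so only the middle triangle remains.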
3/18/2022 howeb@stccmop.org
Example: Transect
P
Transect: Bad Query Plan
[Diagram: operator tree: H(x,y,b) ⊗ V(z), r(z>b), b(s), regrid against P ⊗ V]
1) Construct full-size 3D grid
2) Construct 2D transect grid
3) Join 1) with 2)
Transect: Optimized Plan
[Diagram: operator tree over H(x,y,b), P, V(z), and P ⊗ V, using regrid, b(s), ⊗, and a final regrid]
1) Find 2D cells containing points
2) Create “stacks” of 2D cells carrying data
3) Create 2D transect grid
4) Join 2) with 3)
1) Find cells containing points in P
2) Construct “stacks” of cells
4) Join 2) with 3)
Transect: Results
[Figure: run time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; 800 MB (1 timestep)]
• Screenshot of OPeNDAP demo
http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
Commutative Law of Regrid and Restrict:
Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y))
G0 = Regrid(Restrict0(X), Restrict0(Y))
G1 = Regrid(Restrict1(X), Restrict1(Y))
:
GN = Regrid(RestrictN(X), RestrictN(Y))
R = Stitch(G0, G1, ..., GN)
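A one-dimensional sketch of this partition, regrid, and stitch pattern, under two stated assumptions: region boundaries coincide with target cell edges, and restricting the source keeps every source cell that overlaps the region. The functions below are illustrative stand-ins, not the GridFields operators.

```python
def regrid(src, tgt_edges):
    """Conservative 1D regrid. src is a list of (lo, hi, value) cells."""
    out = []
    for t0, t1 in zip(tgt_edges, tgt_edges[1:]):
        # Mass of the source field inside target cell [t0, t1].
        mass = sum(max(0.0, min(t1, hi) - max(t0, lo)) * v for lo, hi, v in src)
        out.append((t0, t1, mass / (t1 - t0)))
    return out

def restrict_source(src, lo, hi):
    """Keep source cells that overlap the region [lo, hi]."""
    return [c for c in src if c[0] < hi and c[1] > lo]

def restrict_target(tgt_edges, lo, hi):
    """Keep target cell edges that lie inside the region [lo, hi]."""
    return [e for e in tgt_edges if lo <= e <= hi]

def stitch(pieces):
    """Concatenate per-region results into one result."""
    return [cell for piece in pieces for cell in piece]

src = [(0.0, 1.0, 2.0), (1.0, 2.0, 6.0), (2.0, 3.0, 1.0), (3.0, 4.0, 3.0)]
tgt_edges = [0.0, 1.5, 2.5, 4.0]
regions = [(0.0, 1.5), (1.5, 4.0)]     # boundaries coincide with target edges

whole = regrid(src, tgt_edges)
pieces = [regrid(restrict_source(src, lo, hi), restrict_target(tgt_edges, lo, hi))
          for lo, hi in regions]
print(stitch(pieces) == whole)
```

Each per-region regrid depends only on its own restricted inputs, so the pieces could be computed on separate workers and stitched afterwards.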
• Globally conservative
• Parallelizable
• Commutes with user-selected restrictions
• Masking to handle mismatched domains
Todos:
• Characterize the error relative to plain supermeshing
• Universal Regridding-as-a-Service
Outreach and Usage
• Code released
– Search “gridfields” on Google Code
– http://code.google.com/p/gridfields/
– C++ with Python bindings
• Integrated into the Hyrax Data Server
– OPULS project funded by NOAA
– Server-side processing of unstructured grids
• Other users
– US Geological Survey
– NOAA

Conservative Regridding and Algebraic Manipulation of Meshes with GridFields

  • 1. Bill Howe UW eScience Institute Computer Science & Engineering University of Washington Big Ocean Data: Numerically Conservative Parallel Query Processing 3/18/2022 Bill Howe, UW 1 Scott Moe Applied Math University of Washington
  • 4. eScience Scalable Data Analytics Group Bill Howe, PhD (databases, cloud, data-intensive scalable computing, visualization) Postdocs (past and present) – Dan Halperin, postdoc (distributed systems, algorithms) – Seung-Hee Bae, postdoc (scalable machine learning algorithms) – (alumni) Marianne Shaw, PhD (hadoop, graph query, health informatics) Staff (past and present) – Sagar Chitnis (cloud computing, databases, web services) – (alumni) Alicia Key (web applications, visualization) – (alumni) Garret Cole (databases, cloud computing, web services) Students – Scott Moe (2nd yr PhD, Applied Math) Partners – CSE DB Faculty: Magda Balazinska, Dan Suciu – CSE students: Paris Koutris, Prasang Upadhyaya, Shengliang Xu, Jinjing Wang – UW-IT (web applications, QA/support) – Cecilia Aragon, PhD, Associate Professor, HCDE (visualization, scientific applications)
  • 5. Summary of Big Data in the Ocean Sciences • In transition from expedition-based to observatory-based science • Enormous investments in infrastructure for linking sensors • Ad hoc, on-demand integration of large, heterogeneous, distributed datasets is the universal requirement in this regime • “Integration” means “regridding” – mesh to pixels, mesh to mesh, trajectory to mesh – satellites to models, models to models, observations to models • Regridding is hard – Must be easy, tolerant of unusual grids, numerically conservative, efficient Summary of our work • We have the beginnings of a “universal regridding” operator with nice algebraic properties • We’re using it to implement efficient distributed data sharing applications, parallel algorithms, and more
  • 7. Regional Scale Nodes John Delaney 10s of Gigabits/second from the ocean floor
  • 8. Cyberinfrastructure • “Integration of numerical ocean models and their output as derived data products” • “Generation and distribution of qualified science data products in (near) real time” • “Access to, syntactic transformation, semantic mediation, analysis, and visualization of science data, data products, and derived data products” Matthew Arrot
  • 9. 17 federal organizations named as partners 11 Regional Associations “a strategy for incorporating observation systems from … near shore waters as part of … a network of observatories.”
  • 10. Center for Coastal Margin Observation and Prediction (CMOP) Antonio Baptista
  • 11. Virtual Mekong Basin img src: Mark Stoermer, UW Center for Environmental Visualization Jeff Richey
  • 12. [Figure: # of bytes vs. # of data sources for Astronomy and Ocean Sciences, framed by the 3 V’s of Big Data: Volume, Variety, Velocity.] Astronomy: LSST (~100PB; images, spectra), PanSTARRS (~40PB; images, trajectories), SDSS (~400TB; images, spectra, catalogs); sources: telescopes, spectra, n-body sims. Ocean Sciences: OOI (~50TB/year; sims, RSN), IOOS (~50TB/year; sims, satellite, gliders, AUVs, vessels, more), CMOP (~10TB/year; sims, stations, gliders, AUVs, vessels, more); sources: models, AUVs, stations, cruises, CTDs, flow cytometry, gliders, ADCP, satellites.
  • 13. SQLShare: Ad Hoc Databases for Science Problem – Data is captured and manipulated in ad hoc files – Made sense five years ago; the data volumes were manageable – But now: 50k rows, 100s of files, sharing Why not put everything into a database? – A huge amount of up-front effort – Hard to design for a moving target – Running a database system is a huge drain Solution: SQLShare – Upload data through your browser: no setup, no installation – Login to browse science questions in English – Click a question, see the SQL that answers it – Edit the SQL to answer an "adjacent" question, even if you wouldn’t know how to write it from scratch https://sqlshare.escience.washington.edu/
  • 14. Summary of Big Data in the Ocean Sciences • Transition from expedition-based to observatory-based science • Enormous investments in integrating infrastructure for data acquisition and interoperability • Ad hoc, on-demand integration of large, heterogeneous, distributed datasets is the universal requirement in this regime • “Integration” means “regridding” – mesh to pixels, mesh to mesh, trajectory to mesh – satellites to models, models to models, observations to models • Regridding is hard – Must be easy, tolerant of unusual grids, numerically conservative, efficient Summary of our work • We have the beginnings of a “universal regridding” operator with nice algebraic properties • We’re using it to implement efficient distributed data sharing applications, parallel algorithms, and more
  • 15. Status Quo • “FTP + MATLAB” • “Nascent Databases” – File-based, format-specific API – UniData’s NetCDF, HDF5 – Some IO optimization, some indexing • Data Servers (Hyrax) – Same as file-based systems, but supports RPC. None of this scales: – up with data volumes – up with number of sources – down with developer expertise
  • 16. What do we really need here? • Claim: Everything reduces to regridding • Model-data comparison or skill assessment? Regrid observations onto the model mesh • Model-model comparison? Regrid one model’s mesh onto the other’s • Model coupling? Regrid a meso-scale atmospheric model onto your regional ocean model • Visualization? Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
  • 17. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient
  • 18. [Map: Washington, Oregon, Columbia River Estuary]
  • 21. Structured grids are easy • The data model (Cartesian products of coordinate variables) • immediately implies a representation (multidimensional arrays), • an API (reading and writing subslabs), • and an efficient implementation (address calculation using array “shape”)
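The "address calculation using array shape" point on slide 21 can be sketched in a few lines of Python. This is only an illustration, assuming a row-major (C-order) layout; the names `strides_for` and `flat_offset` and the example shape are hypothetical and do not come from the talk or from any library it mentions.

```python
def strides_for(shape):
    """Row-major strides (counted in elements) for an array of this shape."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def flat_offset(index, shape):
    """Map a multidimensional index to its position in the flat buffer."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
    return sum(i * s for i, s in zip(index, strides_for(shape)))


# Hypothetical (time, depth, y, x) grid stored as one flat buffer.
shape = (2, 3, 4, 5)
offset = flat_offset((1, 2, 3, 4), shape)
```

No per-cell connectivity is stored: the shape alone determines where every value lives. An unstructured grid has no such formula, which is part of why the following slides call it hard.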
  • 22. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient
  • 23. Naïve Method: Interpolation (Spatial Join). For each vertex in the target grid: find the containing cell in the source grid, then evaluate the basis functions to interpolate.
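The loop on slide 23 can be sketched for a 2D triangular mesh with linear basis functions, where evaluating the basis functions amounts to computing barycentric weights. This is a minimal sketch under those assumptions: the function names, the brute-force cell search, and the sample mesh are illustrative and are not taken from the GridFields code.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p in a triangle of three (x, y) vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def interpolate(targets, vertices, cells, values, eps=1e-12):
    """For each target vertex, find the containing source triangle and
    evaluate the linear basis functions there (None if no cell contains it)."""
    result = []
    for p in targets:
        value = None
        for cell in cells:  # brute-force search; a real system would use a spatial index
            weights = barycentric(p, [vertices[i] for i in cell])
            if min(weights) >= -eps:
                value = sum(w * values[i] for w, i in zip(weights, cell))
                break
        result.append(value)
    return result


# Unit square split into two triangles, carrying f(x, y) = 2x + 3y at the vertices.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0, 1, 2), (0, 2, 3)]
values = [0.0, 2.0, 5.0, 3.0]
out = interpolate([(0.5, 0.25), (0.25, 0.75), (2.0, 2.0)], vertices, cells, values)
```

A linear field is reproduced exactly, and a point outside the mesh gets no value. Nothing in this procedure preserves the integral of the field, which is the shortcoming the editor's notes point out.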
  • 25. Supermeshing [Farrell 10]. For each cell in the target grid: find overlapping cells in the source grid, compute their intersections, and derive new coefficients to minimize the L2 norm. * Guaranteed Conservative * Minimizes Error. But: Domains must match exactly
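The intersection step on slide 25 is easiest to see in one dimension. The sketch below assumes piecewise-constant fields on 1D cells, where the overlap-weighted average is the conservative choice; it is a deliberately simplified stand-in, not Farrell's algorithm, and all names and sample values are hypothetical.

```python
def conservative_regrid_1d(src_edges, src_values, tgt_edges):
    """Overlap-weighted averaging of a piecewise-constant field.

    src_edges and tgt_edges are increasing lists of cell boundaries that
    cover the same interval; src_values holds one value per source cell.
    """
    if src_edges[0] != tgt_edges[0] or src_edges[-1] != tgt_edges[-1]:
        raise ValueError("domains must match exactly")
    tgt_values = []
    for t_lo, t_hi in zip(tgt_edges, tgt_edges[1:]):
        mass = 0.0
        for s_lo, s_hi, v in zip(src_edges, src_edges[1:], src_values):
            overlap = min(t_hi, s_hi) - max(t_lo, s_lo)  # length of the intersection
            if overlap > 0:
                mass += overlap * v
        tgt_values.append(mass / (t_hi - t_lo))
    return tgt_values


def integral(edges, values):
    """Integral of a piecewise-constant field over its whole domain."""
    return sum((hi - lo) * v for lo, hi, v in zip(edges, edges[1:], values))


src_edges, src_values = [0.0, 1.0, 2.0, 4.0], [3.0, 1.0, 2.0]
tgt_edges = [0.0, 1.5, 4.0]
tgt_values = conservative_regrid_1d(src_edges, src_values, tgt_edges)
```

The integral of the field is the same before and after regridding, up to floating-point rounding. The explicit domain check mirrors the slide's caveat that the domains must match exactly.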
  • 27. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient
  • 28. Digression: Relational Database History. Pre-Relational: if your data changed, your application broke. Early RDBMS were buggy and slow (and often reviled), but required only 5% of the application code. “Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.” -- Codd 1979. Key Idea: Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation
  • 29. Key Idea: An Algebra of Tables. Operators shown: select, project, join. Other operators: aggregate, union, difference, cross product
  • 30. Review: Algebraic Optimization. N = ((4*2)+((4*3)+0))/1. Algebraic Laws: 1. (+) identity: x+0 = x 2. (/) identity: x/1 = x 3. (*) distributes: (n*x+n*y) = n*(x+y) 4. (*) commutes: x*y = y*x. Apply rules 1, 3, 4, 2: N = (2+3)*4. Two operations instead of five, and no division operator. The same idea works with very large tables, but the payoff is much higher
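The rewrite on slide 30 can be reproduced mechanically. The sketch below represents expressions as nested tuples and applies the four laws in a single bottom-up pass. The representation and function names are illustrative assumptions, and real query optimizers are far more elaborate.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}


def simplify(e):
    """Rewrite an expression tree (op, left, right) using the slide's laws."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if op == "+" and b == 0:  # law 1: x + 0 = x
        return a
    if op == "/" and b == 1:  # law 2: x / 1 = x
        return a
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "*" and b[0] == "*" and a[1] == b[1]):
        # law 3 factors out the shared multiplier; law 4 moves it to the right
        return ("*", ("+", a[2], b[2]), a[1])
    return (op, a, b)


def count_ops(e):
    """Number of operator nodes in the tree."""
    return 0 if not isinstance(e, tuple) else 1 + count_ops(e[1]) + count_ops(e[2])


def evaluate(e):
    """Evaluate the tree numerically."""
    return e if not isinstance(e, tuple) else OPS[e[0]](evaluate(e[1]), evaluate(e[2]))


original = ("/", ("+", ("*", 4, 2), ("+", ("*", 4, 3), 0)), 1)
optimized = simplify(original)
```

Both trees evaluate to the same number, but the optimized one has two operator nodes instead of five. The analogy on the slide is that relational operators obey similar laws, so a plan over large tables can be rewritten the same way.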
  • 31. GridFields: An Algebra of Meshes (3/12/09 Bill Howe, eScience Institute). [Figure, panels A and B: H0 : (x,y,b) ⊗ V0 : (z) with restrict(0, z > b); color is depth. Panel C: H0 : (x,y,b) ⊗ V0 : (σ) with operators apply(0, z = (surf − b) * σ) and bind(0, surf); color is salinity.] Source: Algebraic Manipulation of Scientific Datasets, B. Howe, D. Maier, VLDBJ 2005
  • 32. Example (1): H = Scan(context, "H"); rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H). [Figure: H and rH side by side; the arguments to Restrict are labeled predicate and dimension; color: bathymetry]
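What a restriction at dimension 0 might do can be sketched in plain Python. This is not the GridFields API: `restrict_nodes` and the data layout are hypothetical, and the rule that a triangle survives only if all three of its nodes survive is an assumption made for the sketch. Only the bounding-box predicate is taken from the slide.

```python
def restrict_nodes(predicate, nodes, triangles):
    """Keep the nodes whose attributes satisfy the predicate, plus the
    triangles whose three nodes all survive (assumed rule for incident cells)."""
    kept = {nid for nid, attrs in nodes.items() if predicate(attrs)}
    kept_nodes = {nid: nodes[nid] for nid in kept}
    kept_triangles = [t for t in triangles if all(n in kept for n in t)]
    return kept_nodes, kept_triangles


# Hypothetical horizontal grid: node id -> attributes (x, y, bathymetry b).
nodes = {
    0: {"x": 320.0, "y": 290.0, "b": 5.0},
    1: {"x": 330.0, "y": 290.0, "b": 7.5},
    2: {"x": 340.0, "y": 295.0, "b": 9.0},
    3: {"x": 335.0, "y": 300.0, "b": 4.0},
}
triangles = [(0, 1, 2), (1, 2, 3)]


def in_box(a):
    """The slide's predicate: (326<x) & (x<345) & (287<y) & (y<302)."""
    return 326 < a["x"] < 345 and 287 < a["y"] < 302


r_nodes, r_triangles = restrict_nodes(in_box, nodes, triangles)
```

Node 0 lies outside the box, so it is dropped along with the one triangle that touches it.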
  • 34. Transect: Bad Query Plan (howeb@stccmop.org). [Plan diagram with nodes H(x,y,b), V(z), r(z>b), b(s), regrid, P.] 1) Construct full-size 3D grid 2) Construct 2D transect grid 3) Join 1) with 2)
  • 35. Transect: Optimized Plan. [Plan diagram with nodes P, V(z), H(x,y,b), b(s), and two regrid operators.] 1) Find 2D cells containing points 2) Create “stacks” of 2D cells carrying data 3) Create 2D transect grid 4) Join 2) with 3)
  • 36. 1) Find cells containing points in P
  • 37. 1) Find cells containing points in P 2) Construct “stacks” of cells 4) Join 2) with 3)
  • 38. Transect: Results. [Bar chart: running time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; 800 MB (1 timestep)]
  • 39. Screenshot of OPeNDAP demo: http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
  • 41. Commutative Law of Regrid and Restrict: Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y)). G0 = Regrid(Restrict0(X), Restrict0(Y)); G1 = Regrid(Restrict1(X), Restrict1(Y)); ...; GN = Regrid(RestrictN(X), RestrictN(Y)); R = Stitch(G0, G1, ..., GN)
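The restrict, regrid, stitch pattern on slide 41 can be illustrated in 1D with piecewise-constant cells. This sketch makes several simplifying assumptions that are not the authors' masking-and-lumping method: only the source grid is partitioned, each partition reports the integrated quantity it contributes to every target cell, and stitching is a plain sum. All function names and sample values are hypothetical.

```python
def overlap_mass(src_cells, tgt_edges):
    """Integrated quantity each target cell receives from the given source
    cells; src_cells is a list of (lo, hi, value) triples."""
    masses = [0.0] * (len(tgt_edges) - 1)
    for j, (t_lo, t_hi) in enumerate(zip(tgt_edges, tgt_edges[1:])):
        for s_lo, s_hi, v in src_cells:
            overlap = min(t_hi, s_hi) - max(t_lo, s_lo)
            if overlap > 0:
                masses[j] += overlap * v
    return masses


def restrict(src_cells, lo, hi):
    """Keep the source cells whose midpoint lies in [lo, hi)."""
    return [c for c in src_cells if lo <= (c[0] + c[1]) / 2 < hi]


def stitch(parts):
    """Combine per-partition results by summing target-cell contributions."""
    return [sum(column) for column in zip(*parts)]


src_cells = [(0.0, 1.0, 3.0), (1.0, 2.0, 1.0), (2.0, 4.0, 2.0), (4.0, 5.0, 6.0)]
tgt_edges = [0.0, 1.5, 3.0, 5.0]

whole = overlap_mass(src_cells, tgt_edges)
# Each partition is independent, so it could run on a separate worker.
parts = [overlap_mass(restrict(src_cells, lo, hi), tgt_edges)
         for lo, hi in [(0.0, 2.0), (2.0, 5.0)]]
stitched = stitch(parts)
```

Because every source cell lands in exactly one partition and the contributions add linearly, the stitched result matches the single-pass result and the total integrated quantity is preserved.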
  • 46. Globally conservative. Parallelizable. Commutes with user-selected restrictions. Masking to handle mismatched domains. Todos: • Characterize the error relative to plain supermeshing • Universal Regridding-as-a-Service
  • 47. Outreach and Usage • Code released – Search “gridfields” on google code – http://code.google.com/p/gridfields/ – C++ with Python bindings • Integrated into the Hyrax Data Server – OPULS project funded by NOAA – Server-side processing of unstructured grids • Other users – US Geological Survey – NOAA

Editor's Notes

  1. OOI: ~$100M, ~$40, ~$60, ~$100, ~$60 in 2009 – 2013
  2. IOOS: $37M in 2012, up from 2010, down from 2004-2007
  3. $17M
  4. Data Integration
     State of the art: Download all the files you need, write your own MATLAB.
     The Database Game: Bring the computation to the data rather than the data to the computation. This works great for simple things like tables, and even arrays. What about meshes?
     They’re harder:
       Standards-resistant. Standards group working off and on for 7 years. Show lots of different kinds of grids? AMR, warped, finite element, finite difference.
       Numerical conservation is important. “I don’t trust your processing; I need access to the raw data.” “Raw” may still be grossly downsampled for convenience purposes.
     Fundamental operation is regridding:
       Visualization? Regrid onto a smooth mesh, or regrid onto pixels.
       Data Integration? Regrid one model’s results onto another’s for comparison.
       Model coupling? Regrid one model’s results onto another’s for comparison.
       Model-data comparisons? Induce a “grid” on the data if necessary, then regrid onto the model.
     Fundamental operation is restriction:
       “I only care about this region.”
       Use a meso-scale atmospheric model to force a local estuary model.
       Parallel processing: break a large mesh into pieces and process them in parallel.
     Fundamental Law of Efficient Ocean Data Integration: Restrict * Regrid = Regrid * Restrict
     Example: Server-side query: “I need the data around the mouth of the estuary, regridded to a regular grid.” Option 1: Restrict * Regrid. Option 2: Regrid * Restrict.
     Naïve method: interpolate. It commutes, but it is not conservative and the error is significant.
     Alternative for conservative regridding: Supermeshing [Farrell]. Supermeshing requires identical domains, and so does not commute with restrict.
     Our approach: enhance supermeshing with masking and lumping to make it commute with restrict.
     Results: