Dr. Marcus D. Hanwell
mhanwell@kitware.com
@mhanwell
www.kitware.com
27 March, 2014
South Bay Meetup
Big Data Visualization Frameworks and
Applications at Kitware
About Kitware
Kitware, Inc.
•  Founded in 1998 by five former GE Research employees
•  118 current employees; 39 with PhDs
•  Privately held, profitable from creation, no debt
•  Rapidly growing: >30% growth in 2011; 7M web visitors per quarter
•  Offices
–  Clifton Park, NY
–  Carrboro, NC
–  Santa Fe, NM
–  Lyon, France
•  2011 Small Business Administration’s Tibbetts Award
•  HPCwire Readers’ and Editors’ Choice awards
•  Inc. 5000 list since 2008
Kitware’s customers & collaborators
Over 75 academic institutions, including:
•  Harvard
•  MIT
•  University of California, Berkeley
•  Stanford University
•  California Institute of Technology
•  Imperial College London
•  Johns Hopkins University
•  Cornell University
•  Columbia University
•  Robarts Research Institute
•  University of Pennsylvania
•  Rensselaer Polytechnic Institute
•  University of Utah
•  University of North Carolina
Over 50 government agencies and labs, including:
•  National Institutes of Health (NIH)
•  National Science Foundation (NSF)
•  National Library of Medicine (NLM)
•  Department of Defense (DOD)
•  Department of Energy (DOE)
•  Defense Advanced Research Projects Agency (DARPA)
•  Army Research Lab (ARL)
•  Air Force Research Lab (AFRL)
•  Sandia (SNL)
•  Los Alamos National Laboratory (LANL)
•  Argonne (ANL)
•  Oak Ridge (ORNL)
•  Lawrence Livermore (LLNL)
Over 100 commercial companies in fields including:
•  Automotive
•  Aircraft
•  Defense
•  Energy technology
•  Environmental sciences
•  Finance
•  Industrial inspection
•  Oil & gas
•  Pharmaceuticals
•  Publishing
•  3D Mapping
•  Medical devices
•  Security
•  Simulation
Kitware: Core Technologies
CMake
CDash
Business Model: Open Source
•  Open-source Software
– Normally BSD-licensed
– Collaboration platforms
•  Collaborative Research and Development
•  Technology Integration
•  Services and Support
•  Consulting
•  Training and webinars
Data at Scale
What is “Big Data”?
•  We deal with two primary types of big data
– Small number of very large data elements
•  Computational fluid dynamics simulations
•  Cosmological simulations covering billions of years
– Large number of (usually smaller) elements
•  Social media data, financial data, geospatial data
•  Over 3M compounds, 40M quantum calculations
•  Different types of data differ in structure
•  Very different strategies are needed!
Many Small Versus Few Big
•  Many small “records”
– Major challenge lies in indexing, searching
– Once found, records can generally be sent to the browser
– Aggregation and/or summarization important
•  Few big “records”
– Major challenge lies in data reduction
– Do as much of the work as possible near the data
– Can still deliver reduced data to web clients
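A minimal sketch of the "few big records" strategy, assuming the reduction is a simple block average of a 2D scalar field computed where the data lives, so that only the much smaller array travels to the web client. The function name block_average and the choice of averaging are illustrative assumptions; real pipelines might subsample, contour, or compute statistics instead.

```python
from typing import List


def block_average(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Downsample a 2D grid by averaging non-overlapping factor x factor blocks.

    Blocks at the right/bottom edges may be smaller than factor x factor;
    they are averaged over the cells they actually contain.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    rows, cols = len(grid), len(grid[0])
    reduced: List[List[float]] = []
    for r0 in range(0, rows, factor):
        out_row: List[float] = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))]
            out_row.append(sum(block) / len(block))
        reduced.append(out_row)
    return reduced


if __name__ == "__main__":
    # A 4 x 4 field shrinks to 2 x 2: four times fewer values to ship.
    field = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    print(block_average(field, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```

The reduction factor trades fidelity for transfer size; with factor k, a 2D field shrinks by roughly k² before it leaves the server.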
Considerations for Data at Scale
•  Key areas to be addressed:
– Storage
– Metadata extraction
– Index
– Search
– Visualization
– Interaction
– Further calculations, simulations, etc.
Data Storage at Scale
•  How much data do you have?
•  Must all data be stored in the same place?
•  Existing metadata extraction techniques?
•  Uniform data layout/schema?
•  Existing index/search techniques?
– Algorithmic challenges
– Open implementations that scale
– Interaction with the database
What Does a Result Look Like?
•  Once you are done searching:
– What does a typical result look like?
– How big is the resulting data?
– How should the data be presented?
– Is all data in the database referenced?
•  Is a simple ordered list useful?
•  What about multidimensional result sets?
Challenges with Big Data
•  Storage for petabytes of data is tough
– Moving it is even harder
– Extracting metadata is a challenge
– Backing up and restoring isn’t any easier
– Even individual results can be very large
•  Mostly done in central facilities
– Specialized file systems
– Power, backup, redundancy, staff
Frameworks
The Visualization Toolkit (VTK)
•  Collection of C++ libraries
– Leveraged by many applications
– Divided into logical areas, e.g.
•  Filtering – data processing in the visualization pipeline
•  InfoVis – informatics visualization
•  Widgets – 3D interaction widgets
•  VolumeRendering – 3D volume rendering
•  Cross-platform, using OpenGL
•  Wrapped in Python, Tcl and Java
http://www.vtk.org/
VTK Architecture
•  Hybrid approach
– Compiled C++ core (faster algorithms)
– Interpreted applications (rapid development)
– Interpreted-language bindings generated automatically
[Figure: an interpreter layer on top of the compiled C++ core]
The Visualization Pipeline
•  A sequence of algorithms that operate on data objects to generate geometry
[Figure: Source → Data → Filter(s) → Data → Mapper → Actor → Render on screen]
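To make the demand-driven Source → Filter → Mapper flow concrete, here is a toy model in plain Python, assuming each stage simply pulls its input from the stage upstream when asked to update. It is not VTK's API: in VTK itself these roles are played by classes such as vtkSphereSource, vtkContourFilter, vtkPolyDataMapper and vtkActor, connected with SetInputConnection()/GetOutputPort(). The Stage class and the stage functions below are hypothetical.

```python
from typing import Callable, Optional


class Stage:
    """One algorithm in a toy demand-driven pipeline (not the VTK API)."""

    def __init__(self, name: str,
                 func: Callable[[Optional[list]], list],
                 upstream: Optional["Stage"] = None) -> None:
        self.name = name
        self.func = func
        self.upstream = upstream

    def update(self) -> list:
        # Demand-driven: request the upstream stage's output first,
        # then apply this stage's algorithm to it.
        data = self.upstream.update() if self.upstream is not None else None
        return self.func(data)


# Source -> Filter -> Filter -> Mapper, mirroring the slide's pipeline.
source = Stage("source", lambda _: [float(i) for i in range(10)])
threshold = Stage("threshold", lambda d: [x for x in d if x >= 5.0], source)
scale = Stage("scale", lambda d: [x * 0.5 for x in d], threshold)
# The "mapper" turns data into renderable primitives (here just strings);
# an actor and renderer would then draw them on screen.
mapper = Stage("mapper", lambda d: [f"point({x:.1f})" for x in d], scale)

if __name__ == "__main__":
    # ['point(2.5)', 'point(3.0)', 'point(3.5)', 'point(4.0)', 'point(4.5)']
    print(mapper.update())
```

Nothing runs until the last stage is asked for output, which is the property that lets a real pipeline avoid computing data nobody renders.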
ParaView
•  Parallel visualization application
•  Open source, BSD licensed
•  Turn-key application wrapper around VTK
•  Parallel data processing and rendering
http://www.paraview.org/
ParaView is for Extremely Large Data
1 billion cell asteroid detonation simulation
1 billion cell weather simulation
Source: Sandia National Lab
[Figure: ParaView client/server architecture. A client handles control, display and rendering of small data; data servers run a Reader → Contour → … pipeline in parallel using MPI; render servers combine their partial images by depth compositing and can drive a tile display.]
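The render servers' "depth composite" step can be sketched as follows, assuming sort-last parallel rendering: each process renders only its own piece of the data into a full-size color-plus-depth image, and the partial images are merged by a per-pixel depth test. This is a serial illustration with hypothetical names; ParaView hands the real work to a parallel compositing library (IceT) that distributes the merge across processes.

```python
from typing import List, Sequence, Tuple

# One pixel fragment: (depth, color). Smaller depth means closer to the camera.
Pixel = Tuple[float, str]
BACKGROUND: Pixel = (float("inf"), "background")


def depth_composite(partials: Sequence[Sequence[Pixel]]) -> List[Pixel]:
    """Merge partial images from several render processes.

    For every pixel, keep the fragment with the smallest depth (a z-buffer
    test); ties go to the earliest image in `partials`.
    """
    if not partials:
        return []
    size = len(partials[0])
    if any(len(img) != size for img in partials):
        raise ValueError("all partial images must have the same number of pixels")
    return [min((img[i] for img in partials), key=lambda frag: frag[0])
            for i in range(size)]


if __name__ == "__main__":
    rank0 = [(0.2, "red"), BACKGROUND, (0.7, "red")]
    rank1 = [(0.5, "blue"), (0.4, "blue"), (0.3, "blue")]
    # [(0.2, 'red'), (0.4, 'blue'), (0.3, 'blue')]
    print(depth_composite([rank0, rank1]))
```

Because only images (not geometry) cross the network, the cost of this step scales with screen size rather than data size, which is why it suits billion-cell datasets.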
Tangelo
•  Python web framework built on CherryPy
•  Flexible HTML5 web server architecture
•  Developed with a clean separation
– Application in HTML, JavaScript, CSS
– Service in pure Python (+ wrapped C/C++)
•  Packages several other frameworks too
– Bootstrap, D3, Vega, MongoDB
•  Making web apps easier to develop/deploy
http://tangelo.kitware.com/
•  Python for server side, native web clients
•  Easily add new services (single .py file)
– Use RESTful API
– JSON delivery of data
– Full power of Python
•  Rapid prototyping
[Figure: the browser loads index.html, index.js and styles.css and calls the Tangelo web service “foo”, implemented in foo.py]
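A sketch of a "single .py file" service, assuming the Tangelo convention that a service module defines a run() function whose keyword arguments come from the request's query string and whose return value is serialized to JSON. The module contents (foo.py), its parameters, and the dispatch() helper are hypothetical; dispatch() only stands in for the framework so that the example runs with the standard library alone.

```python
import json
from urllib.parse import parse_qsl


# ---- foo.py (hypothetical service module) -------------------------------
def run(name: str = "world", times: str = "1") -> dict:
    """Query-string values arrive as strings; the returned dict becomes JSON."""
    count = int(times)
    return {"greeting": " ".join([f"hello {name}"] * count), "count": count}


# ---- stand-in for the web framework (not Tangelo's implementation) --------
def dispatch(query_string: str) -> str:
    """Turn 'a=1&b=2' style parameters into keyword arguments for run()."""
    kwargs = dict(parse_qsl(query_string))
    return json.dumps(run(**kwargs))


if __name__ == "__main__":
    # Roughly what a browser would receive for a query string such as
    # ?name=Kitware&times=2
    print(dispatch("name=Kitware&times=2"))
```

Keeping the service a plain function that returns plain data makes it easy to test without a web server, and leaves all presentation to the HTML/JavaScript side.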
ParaViewWeb – Web Enabled
• Bring 3D visualization to a web page
– Targeting HPC web portal
– Simple usage with basic/rigid workflow
– Framework to develop 3D web applications
– Must work now (no WebGL)
– Support collaboration with multiple clients sharing
the same visualization
• The goal was NOT to
– Build yet another generic ParaView client
Tangelo Powering ParaViewWeb
• We need a web front end to
– Start processes
– Forward communications
Simple Tangelo Examples
Visualizing Flickr Metadata
•  Uses Google Maps
•  Flickr data in MongoDB
•  Python service retrieves data using PyMongo
•  D3 layer over maps
–  Geolocation
–  Day of the week
–  Photo (mouse hover)
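The "day of the week" layer needs photos bucketed by weekday before they reach D3. A minimal sketch, assuming each record (as a PyMongo query might return it) carries an ISO-8601 `date_taken` string; that field name and the record shape are hypothetical, and the same aggregation could instead be pushed into MongoDB itself.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable

# Fixed English labels, so results do not depend on the server's locale.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def photos_per_weekday(records: Iterable[dict]) -> Dict[str, int]:
    """Count photos per weekday, returning all seven days in Monday-first order."""
    counts = Counter(
        DAYS[datetime.fromisoformat(rec["date_taken"]).weekday()]
        for rec in records
    )
    return {day: counts.get(day, 0) for day in DAYS}


if __name__ == "__main__":
    sample = [  # hypothetical documents
        {"date_taken": "2014-03-27T19:00:00", "lat": 37.4, "lon": -122.0},
        {"date_taken": "2014-03-27T21:30:00", "lat": 37.3, "lon": -121.9},
        {"date_taken": "2014-03-29T10:15:00", "lat": 37.8, "lon": -122.4},
    ]
    print(photos_per_weekday(sample))
```

Returning all seven days, including zeros, gives the client a fixed-length array that maps directly onto a D3 scale.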
Enron Email Network Visualization
•  enron.py retrieves emails
–  Computes graph structure
•  D3 force layout for viz
•  Controls to:
–  Slice email by time
–  Change email originator
–  Set number of hops
•  Tool targeted at investigating social network behavior
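The "number of hops" and time-slice controls can be sketched as a breadth-first search over the email graph, restricted to a time window. This assumes emails are (sender, recipient, timestamp) tuples and treats edges as directed from sender to recipient; both choices are illustrative assumptions, not a description of how enron.py is written.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Email = Tuple[str, str, int]  # (sender, recipient, unix timestamp)


def n_hop_network(emails: Iterable[Email], originator: str, hops: int,
                  start: int, end: int) -> Tuple[Dict[str, int], Set[Tuple[str, str]]]:
    """Return {person: hop distance} and the edges reachable within `hops`."""
    # 1. Slice by time and build an adjacency list.
    graph: Dict[str, Set[str]] = {}
    for sender, recipient, ts in emails:
        if start <= ts <= end:
            graph.setdefault(sender, set()).add(recipient)

    # 2. Breadth-first search outward from the originator.
    distance = {originator: 0}
    edges: Set[Tuple[str, str]] = set()
    queue = deque([originator])
    while queue:
        person = queue.popleft()
        if distance[person] == hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            edges.add((person, contact))
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance, edges


if __name__ == "__main__":
    mails = [("ken", "jeff", 10), ("jeff", "sally", 20),
             ("sally", "bob", 30), ("ken", "bob", 99)]
    nodes, links = n_hop_network(mails, "ken", hops=2, start=0, end=50)
    print(nodes)          # {'ken': 0, 'jeff': 1, 'sally': 2}
    print(sorted(links))  # [('jeff', 'sally'), ('ken', 'jeff')]
```

The resulting nodes and edges are exactly what a force-directed layout consumes; changing the originator, window or hop count just re-runs the search.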
Bitcoin Analysis
•  Uses the Bitcoin blockchain
–  Individual transactions
•  Intensity histogram with transaction volume in date/amount ranges
•  Detail plot with individual transactions
•  Anomaly search
–  Theft detection
•  Study large-scale behavior over time
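An intensity histogram of this kind is a 2D binning of transactions. A minimal sketch, assuming transactions arrive as (unix_timestamp, amount) pairs and that amounts are binned by powers of ten because they span many orders of magnitude; the bin choices and the function name are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

SECONDS_PER_DAY = 86_400


def intensity_histogram(transactions: Iterable[Tuple[float, float]],
                        days_per_bin: int = 1) -> Dict[Tuple[int, int], int]:
    """Count transactions per (time bin, log10 amount bin).

    The time bin is the number of whole `days_per_bin` periods since the
    Unix epoch; the amount bin is floor(log10(amount)), so amounts in
    [1, 10) fall in bin 0 and [10, 100) in bin 1. Non-positive amounts
    are skipped.
    """
    counts: Counter = Counter()
    for timestamp, amount in transactions:
        if amount <= 0:
            continue
        time_bin = int(timestamp // (SECONDS_PER_DAY * days_per_bin))
        amount_bin = math.floor(math.log10(amount))
        counts[(time_bin, amount_bin)] += 1
    return dict(counts)


if __name__ == "__main__":
    txs = [(0, 5.0), (100, 7.0), (86_410, 50.0), (200, 0.5), (300, 0.0)]
    print(intensity_histogram(txs))  # {(0, 0): 2, (1, 1): 1, (0, -1): 1}
```

Only the bin counts need to reach the browser for the overview; individual transactions are fetched for the detail plot once a cell is selected.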
Larger Projects
Informatics Software Stack
[Figure: informatics software stack. Components shown include VTK, OpenView, ParaViewWeb, Visomics, HTML/JavaScript, D3/Vega, web apps, desktop apps, Python, Tangelo, example applications (Hospital Costs, Flickr, CharityNet), and analysis and data adapters for R, Matlab, NLTK, Hadoop, Mongo, Impala, SQL and CSV/JSON.]
Digital Pathology
•  MongoDB used for image tiles
– Store once, use many times
– Metadata, processing status, results
– Browser-based application/interaction
https://slide-atlas.org/
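Whole-slide images are usually served as a pyramid of fixed-size tiles, and, as on this slide, the tiles can live in MongoDB. A minimal sketch of the pyramid bookkeeping, assuming 256-pixel square tiles and that each level halves the resolution of the one below; the tile size, level numbering and the tile_key format are illustrative assumptions, not SlideAtlas's actual schema.

```python
import math
from typing import List, Tuple


def tile_pyramid(width: int, height: int,
                 tile_size: int = 256) -> List[Tuple[int, int, int]]:
    """Return (level, columns, rows) for each pyramid level.

    Level 0 is full resolution; each further level halves the image size
    (rounding up) until the whole image fits in a single tile.
    """
    if width < 1 or height < 1 or tile_size < 1:
        raise ValueError("width, height and tile_size must be positive")
    levels = []
    level, w, h = 0, width, height
    while True:
        cols, rows = math.ceil(w / tile_size), math.ceil(h / tile_size)
        levels.append((level, cols, rows))
        if cols == 1 and rows == 1:
            return levels
        level, w, h = level + 1, math.ceil(w / 2), math.ceil(h / 2)


def tile_key(slide_id: str, level: int, col: int, row: int) -> str:
    """Hypothetical key for one tile document, e.g. 'slide42/2/3_1'."""
    return f"{slide_id}/{level}/{col}_{row}"


if __name__ == "__main__":
    pyramid = tile_pyramid(1000, 600)
    print(pyramid)  # [(0, 4, 3), (1, 2, 2), (2, 1, 1)]
    print("total tiles:", sum(c * r for _, c, r in pyramid))  # 17
```

Each tile is written once and then served to any number of viewers, which is the "store once, use many times" pattern.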
Arbor is an NSF-funded project to enable evolutionary
biological research by making it easy for biologists to
•  create,
•  test,
•  and visualize
algorithms on the Tree of Life.
Below is the evolutionary tree for Heliconia
(Lobster Claw) plants coupled to a character
matrix of observational data such as color, feature
measurements, and range.
Cosmology Data Management
[Figure: cosmology data management workflow. Components shown: a supercomputer running the simulation (input deck, configuration) with Catalyst and the CosmoTools framework; a data-intensive scalable computing (DISC) system with a ParaView server, CosmoTools and ParaViewWeb; a database; ParaView client and web browser front ends; and the people and data sources it serves: advanced users/developers/scientists, database scientists, experimentalists, and surveys.]
CosmoTools ParaView Plugins
•  Voronoi tessellation
•  FOF halo finder
•  Stream counter
•  Caustics
•  ANL: Salman Habib, Katrin Heitmann, Tom Peterka, Adrian Pope, Hal Finkel
•  LANL: Jim Ahrens, Jon Woodring, Pat Fasel
•  Kitware: George Zagaris, Berk Geveci, Casey Goodlett, Zach Mullen
UV-CDAT for Climate Visualization
•  Ultrascale Visualization and Climate Data
Analysis Toolkit
– Collaborative effort led by LLNL
– Integrates DOE’s climate modeling and measurement efforts
•  Integrates a large number of tools/libs
– CDAT, VTK, R, ParaView, DV3D
•  Current data sets total about 3.5 petabytes
– Expected to grow to between 350 petabytes and ~3 exabytes
Climate Data Visualization
Open Chemistry
Applications Being Developed
•  Three independent applications
•  Communication handled with local sockets
•  Avogadro 2: Structure editing, input generation,
output viewing, and analysis
•  MoleQueue: Running local and remote jobs in
standalone programs, and management
•  MongoChem: Storage of data, searching, entry,
and annotation
•  Supporting frameworks (AvogadroLibs & VTK)
http://www.openchemistry.org/
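The slide notes that the three applications talk to each other over local sockets. A minimal sketch of that style of inter-process messaging, assuming newline-delimited JSON request/response messages matched by an `id`; the method name, fields and framing are hypothetical and are not taken from MoleQueue's real protocol. socket.socketpair() stands in for a named local socket so the example runs in a single process.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Write one JSON document followed by a newline (newline-delimited JSON)."""
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


def recv_message(stream) -> dict:
    """Read one newline-terminated JSON document from a socket file object."""
    return json.loads(stream.readline())


def round_trip() -> dict:
    """One request/response exchange between two ends of a local socket pair."""
    client, server = socket.socketpair()
    with client, server:
        with client.makefile("r", encoding="utf-8") as client_in, \
             server.makefile("r", encoding="utf-8") as server_in:
            # Client side: ask the (hypothetical) job service to submit a job.
            send_message(client, {"id": 1, "method": "submit_job",
                                  "params": {"program": "demo"}})
            # Server side: read the request and answer with the same id.
            request = recv_message(server_in)
            send_message(server, {"id": request["id"],
                                  "result": {"job_id": 42}})
            # Client side: read the reply.
            return recv_message(client_in)


if __name__ == "__main__":
    print(round_trip())
```

Keeping each application in its own process means a crash in a long-running job manager does not take down the editor, at the cost of defining a message format both sides agree on.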
Use Cases for Open Chemistry
•  Researchers interested in molecules
–  Various sources of starting structure
•  Perform studies using various codes
–  Some performed locally
–  Others using high-performance computing
–  Different calculations produce different data
•  How do these results get stored, analyzed?
–  How can previous work be indexed, reused?
MongoChem Overview
•  A desktop cheminformatics tool
– Chemical data exploration and analysis
– Interactive, editable, and searchable database
•  Leverages several open-source projects
– Qt, VTK, MongoDB, Avogadro 2, Open Babel
•  Designed to look at many molecules
•  Spots patterns, outliers; runs many jobs
•  Scales to studies with ~3 million structures
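Searching millions of structures needs cheap, exact-match keys. One common choice, shown here as an assumption rather than MongoChem's actual schema, is the molecular formula written in Hill order (carbon first, then hydrogen, then the remaining elements alphabetically; purely alphabetical when there is no carbon). In MongoDB such a string could be stored as an indexed field; the in-memory dictionary below is only a stand-in.

```python
from collections import Counter
from typing import Dict, Iterable, List


def hill_formula(symbols: Iterable[str]) -> str:
    """Molecular formula in Hill order from a list of element symbols."""
    counts = Counter(symbols)

    def term(element: str) -> str:
        n = counts.pop(element)
        return element if n == 1 else f"{element}{n}"

    parts: List[str] = []
    if "C" in counts:
        parts.append(term("C"))
        if "H" in counts:
            parts.append(term("H"))
    # sorted() copies the keys first, so popping inside term() is safe.
    parts.extend(term(el) for el in sorted(counts))
    return "".join(parts)


def build_formula_index(molecules: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each Hill formula to the ids of molecules that have it."""
    index: Dict[str, List[str]] = {}
    for mol_id, symbols in molecules.items():
        index.setdefault(hill_formula(symbols), []).append(mol_id)
    return index


if __name__ == "__main__":
    benzene = ["C"] * 6 + ["H"] * 6
    water = ["H", "H", "O"]
    ethanol = ["C", "C", "O"] + ["H"] * 6
    index = build_formula_index({"m1": benzene, "m2": water,
                                 "m3": ethanol, "m4": benzene})
    print(index)  # {'C6H6': ['m1', 'm4'], 'H2O': ['m2'], 'C2H6O': ['m3']}
```

A formula key finds candidates quickly but does not distinguish isomers; a structure-aware identifier would be needed for that.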
Architecture Overview
•  Native, cross-platform C++ application built with Qt and Avogadro 2
•  Stores chemical data in a NoSQL MongoDB database
•  Uses VTK for 2D and 3D dataset visualization
Moving MongoChem to the Web
•  Increasingly important to share data
•  MongoDB is not suitable for direct web access
– Developing RESTful APIs
– Building on VTKWeb and Tangelo
– Can do more processing close to the data
•  Can we develop a platform for chemists?
– Could this address materials and other areas?
– Deposition of data, curation, client-server
processing, web interface and APIs
VTKWeb, Tangelo and MongoChem
•  Uses VTK’s web architecture
•  Performs interactive 3D rendering
•  Runs in any modern web browser
•  Same MongoDB server as MongoChem
•  Moves more to the client JavaScript code
•  Using a simple, Python-based server
– Easy to add new APIs
– Easy to deploy/integrate into other solutions
MongoChemWeb Demo
http://data.openchemistry.org/
Why MongoDB?
•  SQL vs NoSQL approaches
•  MongoDB is implemented in C++
– Scales well by adding extra shards (nodes)
– Core constructs written in C++
– JavaScript available for map-reduce
– Memory-mapped database files
– GridFS for storing large files
– Clients in many languages – C, C++, Python
– Large, established open-source project
JSON, BSON and NoSQL
•  JSON: JavaScript Object Notation
•  BSON: Binary JSON
– Binary-encoded serialization of JSON-like
documents
•  MongoDB stores BSON documents
– Collections are memory-mapped BSON
– Clients work directly with BSON on-the-wire
•  BSON written by client can be used by server
•  Very little overhead reading/writing documents
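To show what "binary-encoded serialization of JSON-like documents" means in practice, here is a minimal encoder for a handful of BSON element types (double, string, embedded document, boolean, 32-bit integer), following the layout in the BSON specification: a little-endian int32 total length, the elements, and a trailing NUL byte. It is a teaching sketch; real applications would use the bson package that ships with PyMongo, which supports the full type set.

```python
import struct


def _cstring(text: str) -> bytes:
    """Element names are NUL-terminated UTF-8 strings (BSON 'cstring')."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("element names may not contain NUL bytes")
    return raw + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Encode a dict containing float, str, dict, bool and int32 values."""
    body = bytearray()
    for key, value in doc.items():
        name = _cstring(key)
        if isinstance(value, bool):  # check before int: bool subclasses int
            body += b"\x08" + name + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            if not -2**31 <= value < 2**31:
                raise OverflowError("this sketch only encodes 32-bit integers")
            body += b"\x10" + name + struct.pack("<i", value)
        elif isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, str):
            data = value.encode("utf-8")
            # The string length prefix counts the trailing NUL byte.
            body += b"\x02" + name + struct.pack("<i", len(data) + 1) + data + b"\x00"
        elif isinstance(value, dict):
            body += b"\x03" + name + encode_bson(value)
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # Total length = 4-byte length prefix + elements + trailing NUL.
    return struct.pack("<i", len(body) + 5) + bytes(body) + b"\x00"


if __name__ == "__main__":
    # b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
    print(encode_bson({"hello": "world"}))
```

The length prefixes are what keep overhead low: a reader can skip over an element or a whole embedded document without parsing its contents.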
Nature of Data
•  Many documents for molecules
–  Individual results are usually MBs
–  Small molecules, electronic structure, MD, etc.
•  Materials tend to be different
–  Fewer documents, larger results
–  Fewer existing identifiers/search techniques
•  Institutions maintain big disks
–  Move to referencing data, client-server, etc.
Clean Energy Project
Clean Energy Project: Introduction
•  Searching for organic photovoltaics
– IBM World Community Grid
– High-throughput, in-silico study
– Partnered with experimental groups
•  Synthesize most promising candidates
•  Many views of the data
– Simple numbers for many properties
– 2D graphs and 3D chemical structures
– 3D structures with quantum calculation output
http://cleanenergy.molecularspace.org/
Clean Energy Project: Big Data
•  Overall size and scope of the data:
– 2.3 million unique molecules
•  22 million conformers
•  150 million DFT calculations
•  400TB+ of raw output data
•  80GB of metadata
– Growing at just under 1TB a day
– ~2.8 million unique molecules
•  ~27M conformers and 185M DFT calculations
•  0.5PB of raw data in the latest result set
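A quick back-of-envelope check on these figures, assuming decimal units (1 TB = 10^12 bytes) and that raw output is spread evenly over calculations, supports the earlier point that individual results are usually megabytes while metadata is tiny by comparison. Since the first figure is quoted as "400TB+", the per-calculation values derived from it are lower bounds.

```python
TB = 10**12  # decimal units assumed
PB = 10**15
GB = 10**9
MB = 10**6


def per_calculation(raw_bytes: float, n_calculations: int) -> float:
    """Average raw output per DFT calculation, in megabytes."""
    return raw_bytes / n_calculations / MB


# Figures quoted on the slide.
first_set_mb = per_calculation(400 * TB, 150_000_000)
latest_set_mb = per_calculation(0.5 * PB, 185_000_000)
metadata_per_calc_bytes = 80 * GB / 150_000_000
metadata_share = (80 * GB) / (400 * TB)
calcs_per_conformer = 150_000_000 / 22_000_000

if __name__ == "__main__":
    print(f"raw output per calculation (first set):  {first_set_mb:.2f} MB")
    print(f"raw output per calculation (latest set): {latest_set_mb:.2f} MB")
    print(f"metadata per calculation: {metadata_per_calc_bytes:.0f} bytes")
    print(f"metadata as share of raw data: {metadata_share:.3%}")
    print(f"DFT calculations per conformer: {calcs_per_conformer:.1f}")
```

Roughly 2.7 MB of raw output versus about 530 bytes of metadata per calculation is why the metadata can sit in a searchable database while the raw files stay on large institutional disks.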
Clean Energy Project: Open Data
•  Part of the Materials Genome Initiative
•  Data released under CC-BY-SA license
•  Amazing opportunity for Open Chemistry
– Very large dataset pushing current limits
– Openly-licensed, allowing us to experiment
– Opportunity to improve the state-of-the-art
– Molecules fit our model
•  Fewer than 1024 atoms
•  DFT calculations with metadata extraction
Building Community
•  Community around projects
•  Using Kitware software process
–  Ensuring quality with continuous testing
–  Code contributions on the web
–  Public mailing lists, bug trackers, and code review
•  Promoting projects and participation
–  Publication
–  Conferences
–  Workshops
–  Social media
[Figure: development cycle linking the software repository; build, test & package; community review; and developers & users]
Conclusions
•  Shared frameworks are needed to work with data
•  Domain-specific approaches are essential
–  One size fits all rarely works well
–  The right frameworks can be extended/customized
•  Storing, sharing, publishing, and analyzing data
•  Data scales are increasing; client-server approaches can help
•  Semantic data is an important aspect too
•  Questions?
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution WSO2
 
WOTS2E: A Search Engine for a Semantic Web of Things
WOTS2E: A Search Engine for a Semantic Web of ThingsWOTS2E: A Search Engine for a Semantic Web of Things
WOTS2E: A Search Engine for a Semantic Web of ThingsAndreas Kamilaris
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingAbzetdin Adamov
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMETyrone Grandison
 

Semelhante a Big data visualization frameworks and applications at Kitware (20)

Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Bridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable WorkflowsBridging Big Data and Data Science Using Scalable Workflows
Bridging Big Data and Data Science Using Scalable Workflows
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
 
Smarter Data for Smarter Libraries
Smarter Data for Smarter LibrariesSmarter Data for Smarter Libraries
Smarter Data for Smarter Libraries
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data Science
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Big Data
Big Data Big Data
Big Data
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data Science
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
WOTS2E: A Search Engine for a Semantic Web of Things
WOTS2E: A Search Engine for a Semantic Web of ThingsWOTS2E: A Search Engine for a Semantic Web of Things
WOTS2E: A Search Engine for a Semantic Web of Things
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision Making
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHME
 

Último

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Big data visualization frameworks and applications at Kitware

  • 1. Big Data Visualization Frameworks and Applications at Kitware • Dr. Marcus D. Hanwell, mhanwell@kitware.com, @mhanwell, www.kitware.com • South Bay Meetup, 27 March 2014
  • 3. Kitware, Inc. • Founded in 1998 by five former GE Research employees • 118 current employees; 39 with PhDs • Privately held, profitable from creation, no debt • Rapidly growing: >30% in 2011, 7M web visitors/quarter • Offices – Clifton Park, NY – Carrboro, NC – Santa Fe, NM – Lyon, France • 2011 Small Business Administration’s Tibbetts Award • HPCwire Readers’ and Editors’ Choice • Inc. 5000 list since 2008
  • 4. Kitware’s customers & collaborators • Over 75 academic institutions, including: Harvard; MIT; University of California, Berkeley; Stanford University; California Institute of Technology; Imperial College London; Johns Hopkins University; Cornell University; Columbia University; Robarts Research Institute; University of Pennsylvania; Rensselaer Polytechnic Institute; University of Utah; University of North Carolina • Over 50 government agencies and labs, including: National Institutes of Health (NIH); National Science Foundation (NSF); National Library of Medicine (NLM); Department of Defense (DOD); Department of Energy (DOE); Defense Advanced Research Projects Agency (DARPA); Army Research Lab (ARL); Air Force Research Lab (AFRL); Sandia (SNL); Los Alamos National Lab (LANL); Argonne (ANL); Oak Ridge (ORNL); Lawrence Livermore (LLNL) • Over 100 commercial companies in fields including: automotive; aircraft; defense; energy technology; environmental sciences; finance; industrial inspection; oil & gas; pharmaceuticals; publishing; 3D mapping; medical devices; security; simulation
  • 6. Business Model: Open Source • Open-source Software – Normally BSD-licensed – Collaboration platforms • Collaborative Research and Development • Technology Integration • Services and Support • Consulting • Training and webinars
  • 8. What is “Big Data”? • We deal with two primary types – Small number of very large data elements • Computational fluid dynamics simulations • Cosmological simulations covering billions of years – Large number of (usually smaller) elements • Social media data, financial data, geospatial data • Over 3M compounds, 40M quantum calculations • Different types of data differ in structure • Very different strategies are needed!
  • 9. Many Small Versus Few Big • Many small “records” – Major challenge lies in indexing and searching – Once found, records can generally be sent to the browser – Aggregation and/or summarization is important • Few big “records” – Major challenge lies in data reduction – Do as much of the work as possible near the data – Reduced data can still be delivered to web clients
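The “few big records” case can be illustrated with a reduction that runs where the data lives and ships only a small summary to the browser. This is a minimal sketch, assuming the large field can be streamed as a sequence of floats; `reduce_to_histogram` and `simulated_field` are hypothetical helpers, not part of VTK, ParaView, or any other Kitware API.

```python
import json
from typing import Iterable, Iterator, List


def reduce_to_histogram(values: Iterable[float], lo: float, hi: float,
                        bins: int) -> List[int]:
    """Stream over a (possibly huge) sequence and keep only bin counts.

    Memory use is O(bins), independent of how many values are read, so the
    reduction can run next to the data while only the counts are shipped.
    Values outside [lo, hi] are ignored.
    """
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # v == hi would map to index `bins`; clamp it into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


def simulated_field(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a large simulation output, produced lazily."""
    for i in range(n):
        yield (i % 100) / 100.0


if __name__ == "__main__":
    counts = reduce_to_histogram(simulated_field(100_000), 0.0, 1.0, 10)
    # Only ten integers go over the wire to the web client.
    print(json.dumps({"bins": 10, "range": [0.0, 1.0], "counts": counts}))
```

Because the input is consumed as a generator, the full field never has to be materialized in memory, which is the property that matters when the data cannot leave the facility.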
  • 10. Considerations for Data at Scale • Key areas to be addressed: – Storage – Metadata extraction – Index – Search – Visualization – Interaction – Further calculations, simulations, etc.
  • 11. Data Storage at Scale • How much data do you have? • Must all data be stored in the same place? • Existing metadata extraction techniques? • Uniform data layout/schema? • Existing index/search techniques? – Algorithmic challenges – Open implementations that scale – Interaction with the database
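For the “many small records” case, search is mostly an indexing problem once metadata has been extracted. Below is a minimal in-memory inverted index, assuming each record's metadata is a flat set of string fields; the `MetadataIndex` class, field names, and record ids are hypothetical, and a real deployment would rely on a database or search-engine index instead.

```python
from collections import defaultdict
from typing import Dict, List, Set


class MetadataIndex:
    """Toy inverted index mapping "field:word" tokens to record ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, record_id: str, metadata: Dict[str, str]) -> None:
        """Index every word of every metadata field for one record."""
        for field, value in metadata.items():
            for word in str(value).lower().split():
                self._postings[f"{field.lower()}:{word}"].add(record_id)

    def search(self, *terms: str) -> List[str]:
        """Return ids of records matching ALL terms, sorted for stable output."""
        if not terms:
            return []
        # .get() avoids inserting empty postings for unknown terms.
        matches = [self._postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*matches))


if __name__ == "__main__":
    index = MetadataIndex()
    index.add("run-001", {"code": "CFD", "mesh": "tetrahedral", "status": "done"})
    index.add("run-002", {"code": "cosmology", "mesh": "particles", "status": "done"})
    index.add("run-003", {"code": "CFD", "mesh": "hexahedral", "status": "running"})
    print(index.search("code:cfd", "status:done"))  # ['run-001']
```

Intersecting posting sets keeps multi-term queries cheap; sorting the ids only makes the output reproducible, and a real system would also need ranking and persistence.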
  • 12. What Does a Result Look Like? • Once you are done searching: – What does a typical result look like? – How big is the resulting data? – How should the data be presented? – Is all data in the database referenced? • Is a simple ordered list useful? • What about multidimensional result sets?
  • 13. Challenges with Big Data • Storage for petabytes of data is tough – Moving it is even harder – Extracting metadata is a challenge – Backing up and restoring isn’t any easier – Even individual results can be very large • Mostly done in central facilities – Specialized file systems – Power, backup, redundancy, staff
  • 15. The Visualization Toolkit (VTK) •  Collection of C++ libraries – Leveraged by many applications – Divided into logical areas, e.g. •  Filtering – data processing in visualization pipeline •  InfoVis – informatics visualization •  Widgets – 3D interaction widgets •  VolumeRendering – 3D volume rendering •  Cross platform, using OpenGL •  Wrapped in Python, Tcl and Java http://www.vtk.org/
  • 17. VTK Architecture • Hybrid approach – Compiled C++ core (faster algorithms) – Interpreted applications (rapid development) – Interpreted layer generated automatically (Diagram: an interpreter layer wrapping the C++ core.)
  • 18. The Visualization Pipeline • A sequence of algorithms that operate on data objects to generate geometry (Diagram: Source → Data → Filter → Data → Mapper → Actor → render on screen, drawn with two parallel Filter/Mapper/Actor branches.)
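The pipeline idea can be sketched in plain Python: each stage pulls data from its upstream stage on demand and re-executes only when it, or something upstream, has changed. The `Algorithm` class below is a hypothetical illustration of that demand-driven behavior, not VTK's API; in VTK itself, stages are connected with `SetInputConnection(other.GetOutputPort())` and executed by calling `Update()` or by rendering. Actors and rendering are omitted here.

```python
from typing import Any, Callable, Optional


class Algorithm:
    """A pipeline stage that recomputes its output only when it is stale."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func                      # transforms upstream output
        self.upstream: Optional["Algorithm"] = None
        self.output: Any = None
        self.stale = True                     # changed since the last run
        self.executions = 0                   # counts how often func ran

    def set_input(self, upstream: "Algorithm") -> "Algorithm":
        self.upstream = upstream
        self.stale = True
        return self

    def modified(self) -> None:
        """Mark this stage as changed (e.g. after a parameter edit)."""
        self.stale = True

    def update(self) -> Any:
        """Demand-driven update: pull from upstream, then run only if needed."""
        data = None
        upstream_reran = False
        if self.upstream is not None:
            before = self.upstream.executions
            data = self.upstream.update()
            upstream_reran = self.upstream.executions != before
        if self.stale or upstream_reran:
            self.output = self.func(data)
            self.executions += 1
            self.stale = False
        return self.output


if __name__ == "__main__":
    source = Algorithm(lambda _: [float(i) for i in range(10)])      # "Source"
    threshold = Algorithm(lambda pts: [p for p in pts if p >= 5.0])  # "Filter"
    mapper = Algorithm(lambda pts: [f"point({p})" for p in pts])     # "Mapper"
    threshold.set_input(source)
    mapper.set_input(threshold)

    print(mapper.update())   # runs source, threshold and mapper once each
    mapper.update()          # nothing is stale, so nothing re-executes
    print(source.executions, threshold.executions, mapper.executions)  # 1 1 1
```

Pulling from the end of the pipeline means only the stages needed for the requested output run, and only when their inputs changed.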
  • 19. ParaView •  Parallel visualization application •  Open source, BSD licensed •  Turn-key application wrapper around VTK •  Parallel data processing and rendering http://www.paraview.org/
  • 20. ParaView is for Extremely Large Data • 1-billion-cell asteroid detonation simulation • 1-billion-cell weather simulation (source: Sandia National Lab)
  • 22. Tangelo • Python web framework built on CherryPy • Flexible HTML5 web server architecture • Developed with a clean separation – Application in HTML, JavaScript, CSS – Service in pure Python (+ wrapped C/C++) • Packages several other frameworks too – Bootstrap, D3, Vega, MongoDB • Making web apps easier to develop/deploy http://tangelo.kitware.com/
  • 23. Python for the server side, native web clients • Easily add new services (single .py file) – Use RESTful API – JSON delivery of data – Full power of Python • Rapid prototyping (Diagram: the browser, with index.html, index.js and styles.css, talks to the Tangelo web service “foo”, implemented in foo.py.)
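A Tangelo service can be a single Python file. The sketch below assumes Tangelo's convention that a service module defines a `run()` function, receives URL query parameters as keyword arguments, and has a returned dict sent back to the browser as JSON; check the Tangelo documentation for the exact rules in your version. The parameters `n` and `start` are hypothetical.

```python
# foo.py: a minimal Tangelo-style web service sketch.
# Assumption: the server maps a request such as foo?n=5&start=2 to
# run(n="5", start="2") and serializes the returned dict to JSON.
import json


def run(n="10", start="0"):
    """Return n consecutive integers starting at `start`.

    Query-string values arrive as strings, so convert and validate them here.
    """
    try:
        count = int(n)
        first = int(start)
    except ValueError:
        return {"error": "n and start must be integers"}
    if not 0 <= count <= 1000:
        return {"error": "n must be between 0 and 1000"}
    values = list(range(first, first + count))
    return {"count": count, "values": values, "sum": sum(values)}


if __name__ == "__main__":
    # Local check without a server: the JSON the browser would receive.
    print(json.dumps(run(n="5", start="2")))
```

Keeping `run()` a plain function makes the service testable without a web server; the client code in index.js only needs to request the service URL and read the JSON response.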
  • 24. ParaViewWeb – Web Enabled • Bring 3D visualization to a web page – Targeting HPC web portals – Simple usage with basic/rigid workflow – Framework to develop 3D web applications – Must work today, without requiring WebGL – Support collaboration with multiple clients sharing the same visualization • The goal was NOT to – Build yet another generic ParaView client
  • 25. Tangelo Powering ParaViewWeb • We need a web front end to – Start processes – Forward communications
  • 27. Visualizing Flickr Metadata • Uses Google Maps • Flickr data in MongoDB • Python service retrieves data using PyMongo • D3 layer over maps – Geolocation – Day of the week – Photo (mouse hover)
  • 28. Enron Email Network Visualization • enron.py retrieves emails – Computes graph structure • D3 force layout for viz • Controls to: – Slice email by time – Change email originator – Set number of hops • Tool targeted at investigating social network behavior
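The three Enron controls (time slice, originator, number of hops) map onto three small steps: filter emails by time, build a directed graph, and keep only people within N hops of the originator. A minimal sketch, assuming emails are already reduced to `(sender, recipient, timestamp)` tuples; `email_subgraph` is hypothetical and not the actual enron.py. The output uses index-based `source`/`target` links, as D3's v3 force layout accepted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Email = Tuple[str, str, int]  # (sender, recipient, unix timestamp)


def email_subgraph(emails: List[Email], originator: str, hops: int,
                   t_start: int, t_end: int) -> Dict[str, list]:
    """Build a node/link graph around `originator` for a force layout."""
    # 1. Slice by time and build a directed sender -> recipients adjacency.
    adjacency: Dict[str, Set[str]] = {}
    for sender, recipient, ts in emails:
        if t_start <= ts < t_end:
            adjacency.setdefault(sender, set()).add(recipient)

    # 2. Breadth-first search outward from the originator, at most `hops` edges.
    depth = {originator: 0}
    queue = deque([originator])
    while queue:
        person = queue.popleft()
        if depth[person] == hops:
            continue
        for nxt in sorted(adjacency.get(person, set())):
            if nxt not in depth:
                depth[nxt] = depth[person] + 1
                queue.append(nxt)

    # 3. Nodes ordered by distance; links refer to positions in the node list.
    names = sorted(depth, key=lambda p: (depth[p], p))
    index = {name: i for i, name in enumerate(names)}
    nodes = [{"name": n, "hops": depth[n]} for n in names]
    links = [{"source": index[s], "target": index[r]}
             for s in names
             for r in sorted(adjacency.get(s, set()))
             if r in index]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    sample = [("ken", "jeff", 100), ("jeff", "sally", 200),
              ("sally", "bob", 300), ("ken", "amy", 5000)]
    print(email_subgraph(sample, "ken", hops=2, t_start=0, t_end=1000))
```

Doing the slicing and hop limiting on the server keeps the browser's force layout small enough to stay interactive.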
  • 29. Bitcoin Analysis • Uses the Bitcoin blockchain – Individual transactions • Intensity histogram of transaction volume in date/amount ranges • Detail plot with individual transactions • Anomaly search – Theft detection • Study large-scale behavior over time
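The intensity histogram amounts to counting transactions per cell of a date/amount grid. A minimal sketch, assuming transactions are `(unix timestamp, amount)` pairs and binning amounts by order of magnitude; the slide does not say how the amount ranges were chosen, so the log10 binning and the `intensity_histogram` name are assumptions.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple

Transaction = Tuple[int, float]  # (unix timestamp, amount in BTC)


def intensity_histogram(txs: Iterable[Transaction]) -> Counter:
    """Count transactions per (UTC day, order-of-magnitude amount) cell."""
    cells: Counter = Counter()
    for ts, amount in txs:
        if amount <= 0:
            continue  # log10 is undefined for non-positive amounts
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        decade = math.floor(math.log10(amount))  # e.g. 0.5 -> -1, 12.0 -> 1
        cells[(day, decade)] += 1
    return cells


if __name__ == "__main__":
    sample = [(0, 0.5), (3600, 0.7), (86400, 12.0)]
    for (day, decade), count in sorted(intensity_histogram(sample).items()):
        print(day, f"[1e{decade}, 1e{decade + 1})", count)
```

The detail plot could then fetch only the individual transactions that fall into a cell the user selects.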
  • 31. Informatics Software Stack (diagram)
  • 32. Digital Pathology • MongoDB used for image tiles – Store once, use many times – Metadata, processing status, results – Browser-based application/interaction https://slide-atlas.org/
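Image tiles for a browser viewer are commonly organized as a pyramid: full resolution at level 0, with each further level downsampled by half until the whole image fits in one tile. The sketch below, assuming 256-pixel square tiles and one document per tile, computes that layout; `pyramid_levels`, `tile_key`, and the key fields are hypothetical, not SlideAtlas's actual schema. Each key could then be stored in MongoDB (for example with PyMongo's `insert_one`) together with the encoded image bytes, so tiles are generated once and served many times.

```python
import math
from typing import Dict, List


def pyramid_levels(width: int, height: int, tile: int = 256) -> List[Dict[str, int]]:
    """Describe each zoom level of a tile pyramid, full resolution first.

    Each level halves the previous one (rounding up) until the whole
    image fits in a single tile.
    """
    if tile <= 0:
        raise ValueError("tile size must be positive")
    levels = []
    w, h = width, height
    while True:
        levels.append({"width": w, "height": h,
                       "cols": math.ceil(w / tile), "rows": math.ceil(h / tile)})
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    return levels


def tile_key(image_id: str, level: int, col: int, row: int) -> Dict[str, object]:
    """Hypothetical lookup key for one stored tile."""
    return {"image": image_id, "level": level, "x": col, "y": row}


if __name__ == "__main__":
    for level, info in enumerate(pyramid_levels(100_000, 80_000)):
        print(level, info)
    print(tile_key("slide-123", 0, 10, 7))
```

A viewer only requests the tiles visible at the current zoom level, so even a gigapixel slide needs just a handful of small documents per screen.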
  • 33. Arbor is an NSF-funded project to enable evolutionary biology research by making it easy for biologists to create, test, and visualize algorithms on the Tree of Life. The slide’s figure shows the evolutionary tree for Heliconia (lobster claw) plants coupled to a character matrix of observational data such as color, feature measurements, and range.
  • 34. Cosmology Data Management (diagram linking the supercomputer, DISC and databases with advanced users/developers/scientists, database scientists, and experimentalists)
  • 35. CosmoTools ParaView plugins: Voronoi Tessellation, FOF HaloFinder, Stream Counter, Caustics • ANL: Salman Habib, Katrin Heitmann, Tom Peterka, Adrian Pope, Hal Finkel • LANL: Jim Ahrens, Jon Woodring, Pat Fasel • Kitware: George Zagaris, Berk Geveci, Casey Goodlett, Zach Mullen
  • 36. UV-CDAT for Climate Visualization • Ultrascale Visualization and Climate Data Analysis Toolkit – Collaborative effort led by LLNL – Integrates DOE’s climate modeling/measurements • Integrates a large number of tools/libraries – CDAT, VTK, R, ParaView, DV3D • Current data sets at about 3.5 petabytes – Growing to between 350 petabytes and ~3 exabytes
  • 39. Applications Being Developed • Three independent applications • Communication handled with local sockets • Avogadro 2: structure editing, input generation, output viewing, and analysis • MoleQueue: running and managing local and remote jobs of standalone programs • MongoChem: data storage, searching, entry, and annotation • Supporting frameworks (AvogadroLibs & VTK) http://www.openchemistry.org/
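Message passing between independent applications can be sketched with two connected local sockets. This minimal example assumes length-prefixed JSON messages and borrows JSON-RPC 2.0 field names; the framing, the `submitJob` method, and the `params` content are hypothetical and should not be read as MoleQueue's actual protocol.

```python
import json
import socket
import struct


def send_message(sock: socket.socket, message: dict) -> None:
    """Send one JSON message prefixed by its 4-byte big-endian length."""
    payload = json.dumps(message).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    # socketpair() returns two connected local sockets: stand-ins for the
    # editor process and the job-queue process.
    editor, queue = socket.socketpair()
    with editor, queue:
        send_message(editor, {"jsonrpc": "2.0", "id": 1,
                              "method": "submitJob",  # hypothetical method
                              "params": {"program": "example-qc-code"}})
        request = recv_message(queue)
        send_message(queue, {"jsonrpc": "2.0", "id": request["id"],
                             "result": {"jobId": 42}})
        print(recv_message(editor))
```

The length prefix matters because a stream socket does not preserve message boundaries; without it, the receiver cannot tell where one JSON document ends.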
  • 40. Use Cases for Open Chemistry • Researchers interested in molecules – Various sources of starting structures • Perform studies using various codes – Some performed locally – Others using high-performance computing – Different calculations produce different data • How do these results get stored and analyzed? – How can previous work be indexed and reused?
  • 41. MongoChem Overview •  A desktop cheminformatics tool – Chemical data exploration and analysis – Interactive, editable, and searchable database •  Leverages several open-source projects – Qt, VTK, MongoDB, Avogadro 2, Open Babel •  Designed to look at many molecules •  Spots patterns, outliers; runs many jobs •  Scales to studies with ~3 million structures
  • 42. Architecture Overview • Native, cross-platform C++ application built with Qt and Avogadro 2 • Stores chemical data in a NoSQL MongoDB database • Uses VTK for 2D and 3D dataset visualization
  • 43. Moving MongoChem to the Web • Increasingly important to share data • MongoDB not suitable for direct web access – Developing RESTful APIs – Building on VTKWeb and Tangelo – Can do more processing close to the data • Can we develop a platform for chemists? – Could this address materials and other areas? – Deposition of data, curation, client-server processing, web interface and APIs
  • 44. VTKWeb, Tangelo and MongoChem • Uses VTK’s web architecture • Performs interactive 3D rendering • Runs in any modern web browser • Same MongoDB server as MongoChem • Moves more to the client JavaScript code • Using a simple, Python-based server – Easy to add new APIs – Easy to deploy/integrate into other solutions
  • 46. Why MongoDB? • SQL vs NoSQL approaches • MongoDB is implemented in C++ – Scales well by adding extra shards (nodes) – Core constructs written in C++ – Access to JavaScript in map-reduce – Memory-mapped database files – GridFS for storing large files – Clients in many languages – C, C++, Python – Large, established open-source project
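GridFS stores a large file as one metadata document in a `files` collection plus fixed-size, numbered pieces in a `chunks` collection (255 KiB each by default). The sketch below only builds those documents in memory to show the layout, and omits some real fields such as `uploadDate`; the `to_gridfs_documents` and `reassemble` names are hypothetical, and in practice PyMongo's `gridfs` module does this work for you.

```python
import uuid
from typing import Dict, List, Tuple

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 261,120 bytes


def to_gridfs_documents(filename: str, data: bytes,
                        chunk_size: int = DEFAULT_CHUNK_SIZE
                        ) -> Tuple[Dict, List[Dict]]:
    """Split a payload into one files document plus numbered chunk documents.

    Mirrors the layout of GridFS's fs.files / fs.chunks collections but does
    not talk to a database.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    file_id = uuid.uuid4().hex  # stand-in for a MongoDB ObjectId
    file_doc = {"_id": file_id, "filename": filename,
                "length": len(data), "chunkSize": chunk_size}
    chunks = [{"files_id": file_id, "n": n, "data": data[off:off + chunk_size]}
              for n, off in enumerate(range(0, len(data), chunk_size))]
    return file_doc, chunks


def reassemble(chunks: List[Dict]) -> bytes:
    """Rebuild the payload by concatenating chunks in sequence order `n`."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


if __name__ == "__main__":
    payload = bytes(range(256)) * 4000  # about 1 MB of sample data
    meta, parts = to_gridfs_documents("result.out", payload)
    print(meta["length"], meta["chunkSize"], len(parts))
```

Fixed-size chunks keep every document well under MongoDB's per-document size limit and let a client read or stream a byte range without loading the whole file.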
  • 47. JSON, BSON and NoSQL • JSON: JavaScript Object Notation • BSON: Binary JSON – Binary-encoded serialization of JSON-like documents • MongoDB stores BSON documents – Collections are memory-mapped BSON – Clients work directly with BSON on-the-wire • BSON written by client can be used by server • Very little overhead reading/writing documents
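The low overhead comes from BSON's layout: every document starts with its total byte length, and every element is a type byte, a NUL-terminated key, and a fixed-size or length-prefixed value, so a reader can walk or skip fields without parsing text. Below is a minimal encoder for a few BSON types (double, string, embedded document, boolean, int32, int64), written from the BSON specification for illustration only; PyMongo's `bson` package provides a complete implementation.

```python
import struct
from typing import Any, Dict


def _cstring(s: str) -> bytes:
    """BSON keys are UTF-8 strings terminated by a NUL byte."""
    return s.encode("utf-8") + b"\x00"


def _element(key: str, value: Any) -> bytes:
    name = _cstring(key)
    if isinstance(value, bool):  # check before int: bool subclasses int
        return b"\x08" + name + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + name + struct.pack("<i", value)   # int32
        return b"\x12" + name + struct.pack("<q", value)       # int64
    if isinstance(value, float):
        return b"\x01" + name + struct.pack("<d", value)       # double
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        # String length counts the trailing NUL.
        return b"\x02" + name + struct.pack("<i", len(raw)) + raw
    if isinstance(value, dict):
        return b"\x03" + name + encode_document(value)         # embedded doc
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: Dict[str, Any]) -> bytes:
    """Encode a dict as BSON: int32 total size, elements, trailing 0x00."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    # +4 for the size field itself, +1 for the terminating NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


if __name__ == "__main__":
    print(encode_document({"hello": "world"}))
```

Because the encoded bytes are the same on the client, on the wire, and on disk, the server can store what a client sends without re-serializing it.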
  • 48. Nature of Data • Many documents for molecules – Individual results are usually MBs – Small molecules, electronic structure, MD, etc. • Materials tend to be different – Fewer documents, larger results – Fewer existing identifiers/search techniques • Institutions maintain big disks – Move to referencing data, client-server, etc.
  • 50. Clean Energy Project: Introduction • Searching for organic photovoltaics – IBM World Community Grid – High-throughput, in-silico study – Partnered with experimental groups • Synthesize most promising candidates • Many views of the data – Simple numbers for many properties – 2D graphs and 3D chemical structures – 3D structures with quantum calculation output http://cleanenergy.molecularspace.org/
  • 51. Clean Energy Project: Big Data • Overall size and scope of the data: – 2.3 million unique molecules • 22 million conformers • 150 million DFT calculations • 400TB+ of raw output data • 80GB of metadata – Growing at just under 1TB a day – ~2.8 million unique molecules • ~27M conformers and 185M DFT calculations • 0.5PB of raw data in the latest result set
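The Clean Energy Project figures also show why metadata extraction pays off: per calculation, raw output is megabytes while the searchable metadata is a few hundred bytes. A quick back-of-the-envelope check, assuming decimal units and using only the numbers on this slide:

```python
# Back-of-the-envelope figures from the Clean Energy Project slide.
# Assumption: decimal units (1 GB = 10**9, 1 TB = 10**12, 1 PB = 10**15 bytes).
GB, TB, PB = 10**9, 10**12, 10**15

raw_bytes = 400 * TB        # "400TB+ of raw output data"
metadata_bytes = 80 * GB    # "80GB of metadata"
dft_calcs = 150_000_000     # "150 million DFT calculations"

raw_per_calc_mb = raw_bytes / dft_calcs / 10**6
meta_per_calc_bytes = metadata_bytes / dft_calcs
raw_to_meta_ratio = raw_bytes // metadata_bytes

later_raw = 0.5 * PB        # "0.5PB of raw data in the latest result set"
later_calcs = 185_000_000   # "185M DFT calculations"
later_raw_per_calc_mb = later_raw / later_calcs / 10**6

print(f"raw output per calculation:   ~{raw_per_calc_mb:.1f} MB")
print(f"metadata per calculation:     ~{meta_per_calc_bytes:.0f} bytes")
print(f"raw data vs metadata:         {raw_to_meta_ratio}x")
print(f"latest set, raw per calc:     ~{later_raw_per_calc_mb:.1f} MB")
```

Roughly 2.7 MB of raw output against about 530 bytes of metadata per calculation means the searchable index is about 5,000 times smaller than the data it describes, which is what makes referencing data from a database practical.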
  • 52. Clean Energy Project: Open Data • Part of the Materials Genome Initiative • Data released under CC-BY-SA license • Amazing opportunity for Open Chemistry – Very large dataset pushing current limits – Openly licensed, allowing us to experiment – Opportunity to improve the state of the art – Molecules fit our model • Fewer than 1024 atoms • DFT calculations with metadata extraction
  • 53. Building Community • Community around projects • Using Kitware software process – Ensuring quality with continuous testing – Code contributions on the web – Public mailing lists, bug trackers, and code review • Promoting projects and participation – Publication – Conferences – Workshops – Social media (Diagram: software repository → build, test & package → community review, with developers & users.)
  • 54. Conclusions • Shared frameworks needed to work with data • Domain-specific approaches are essential – One size fits all rarely works well – The right frameworks can be extended/customized • Storing, sharing, publishing, and analyzing data • Data scales increasing, client-server can help • Semantic data is an important aspect too • Questions?