Big Data Challenges at NASA

Chris A. Mattmann
Senior Computer Scientist, NASA
Adjunct Assistant Professor, USC
Member, Apache Software Foundation
  
And you are?

•  Senior Computer Scientist at NASA JPL in Pasadena, CA USA
•  Software Architecture/Engineering Prof at Univ. of Southern California
•  Apache Member involved in
   –  OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC),
      SIS (Mentor), Lucy (Mentor) and Gora (Champion), MRUnit (Mentor),
      Airavata (Mentor)
  
13-Jun-12    HADOOPSUMMIT12    2
  
Agenda

•  Big Data Challenges and where we're headed
•  Example systems at NASA and other agencies
•  Apache OODT: a primer
•  Apache OODT + Hadoop
•  Where we're headed and wrap-up
  




Some "Big Data" Grand Challenges I'm interested in

•  How do we handle 700 TB/sec of data coming off the wire when we
   actually have to keep it around?
   –  Required by the Square Kilometre Array
•  Joe scientist says: "I've got an IDL or MATLAB algorithm that I will not
   change, and I need to run it on 10 years of data from the Colorado
   River Basin and store and disseminate the output products."
   –  Required by the Western Snow Hydrology project
•  How do we compare petabytes of climate model output data in a
   variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote
   sensing data to improve climate models for the next IPCC assessment?
   –  Required by the 5th IPCC assessment and the Earth System Grid and NASA
•  How do we catalog all of NASA's current planetary science data?
   –  Required by the NASA Planetary Data System
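To put the first challenge in perspective, here is a back-of-the-envelope calculation. It assumes the 700 TB/sec rate is sustained around the clock and uses decimal units (1 EB = 10^6 TB, 1 ZB = 10^3 EB); both are simplifying assumptions for illustration, not SKA design figures.

```latex
\begin{aligned}
700\ \tfrac{\text{TB}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}}
  &= 60{,}480{,}000\ \tfrac{\text{TB}}{\text{day}} = 60.48\ \tfrac{\text{EB}}{\text{day}} \\
60.48\ \tfrac{\text{EB}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{yr}}
  &\approx 22{,}075\ \tfrac{\text{EB}}{\text{yr}} \approx 22\ \tfrac{\text{ZB}}{\text{yr}}
\end{aligned}
```

Even a single day at this rate is tens of exabytes, which is what makes having to keep the data around such a hard problem.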
  

Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.

Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295
  
The NASA ESDS Context
                         Where is open source
                         most useful?




                             Which area should produce
                             open source software?
Lessons from '90s-era missions

•  Increasing data volumes (exponential growth)
•  Increasing complexity of instruments and algorithms
•  Increasing availability of proxy/sim/ancillary data
•  Increasing rate of technology refresh

… all of this while NASA Earth Mission funding was decreasing

A data system framework based on a standard architecture and
reusable software components for supporting all future missions.
  

Where do Big Data technologies fit into this?

U.S. National Climate Assessment
(pic credit: Dr. Tom Painter)

SKA South Africa: Square Kilometre Array
(pic credit: Dr. Jasper Horrell, Simon Ratcliffe)



Credit: Cameron Goodale

[Figure: EVLA demonstration architecture (host ska-dc.jpl.nasa.gov, dataset day2_TDEM0003_10s_norx). Apache OODT components shown: Crawler, Curator, File Manager (FM), Workflow Manager (WM), PCS Services, CAS Data Services, Browser, and Monitor, linking the EVLA staging area and disk area (data/met) to the WWW (science products, metadata, system status) and the data system operator. Legend: data flow, control flow; evlascube event.]
Apache OODT
•        Entered incubation at the Apache
         Software Foundation in 2010
•        Selected as a top-level Apache Software
         Foundation project in January 2011
•        Developed by a community of participants
         from many companies, universities, and
         organizations
•        Used for a diverse set of science data
         system activities in planetary science,
         earth science, radio astronomy,
         biomedicine, astrophysics, and more

                                                            http://oodt.apache.org
OODT Development & user community includes:




Apache OODT: OSS "big data" platform originally pioneered at NASA

•  OODT is meant to be a set of tools to help build data systems
   –  It's not meant to be turnkey
   –  It attempts to exploit the boundary between bringing in capability and
      being overly rigid in science
   –  Each discipline/project extends it
•  Deployed operationally at
   –  Decadal-survey recommended NASA Earth science missions, NIH, and NCI,
      CHLA, USC, South African SKA project
•  Why Apache?
   –  Fewer than 100 projects have been promoted to top level (Apache Web
      Server, Tomcat, Solr, Hadoop)
   –  Differs from other open source communities; it provides a governance
      and management structure


Why Apache and OODT?
•  OODT is meant to be a set of tools to
   help build data systems
           –  It's not meant to be turnkey
           –  It attempts to exploit the boundary
              between bringing in capability and
              being overly rigid in science
           –  Each discipline/project extends it

•  Apache is the elite open source
   community for software developers
           –  Fewer than 100 projects have been
              promoted to top level (Apache Web
              Server, Tomcat, Solr, Hadoop)
           –  Differs from other open source
              communities; it provides a
              governance and management
              structure

Governance Model + NASA = ♥

•  NASA and other government agencies have tons of process
   –  They like that
OODT Framework and PCS

[Figure: Object Oriented Data Technology (OODT) framework. Components shown: OODT/Science Web Tools, Archive Client, Navigation Service, Catalog & Archive Service (CAS), Process Control System (PCS) Services, Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services (Other Service 1, Other Service 2), and data sources (Profile XML Data, Data System 1, Data System 2).]

CAS has recently become known as the Process Control System when applied to mission work.


Current PCS deployments
•  Orbiting Carbon Observatory (OCO-2) - spectrometer instrument
   –  NASA ESSP Mission, launch date: TBD 2013
   –  PCS supporting Thermal Vacuum Tests, Ground-based instrument data processing,
      Space-based instrument data processing and Science Computing Facility
   –  EOM Data Volume: 61-81 TB in 3 yrs; Processing Throughput: 200-300 jobs/day
•  NPP Sounder PEATE - infrared sounder
   –  Joint NASA/NPOESS mission, launch date: October 2011
   –  PCS supporting Science Computing Facility (PEATE)
   –  EOM Data Volume: 600 TB in 5 yrs; Processing Throughput: 600 jobs/day
•  QuikSCAT - scatterometer
   –  NASA Quick-Recovery Mission, launch date: June 1999
   –  PCS supporting instrument data processing and science analyst sandbox
   –  Originally planned as a 2-year mission
•  SMAP - high-res radar and radiometer
   –  NASA decadal study mission, launch date: 2014
   –  PCS supporting radar instrument and science algorithm development testbed
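As a rough sanity check on the end-of-mission (EOM) figures above, the average daily ingest and per-job volume can be derived. This is a sketch under our own simplifying assumptions: data arrives uniformly over 365-day years, units are decimal (1 TB = 1,000 GB), and "per job" just divides daily volume by daily job count; real mission data flows are burstier than this.

```python
"""Back-of-the-envelope ingest rates from the PCS deployment figures.

Assumptions (not from the mission documents): uniform ingest over
365-day years and decimal units (1 TB = 1,000 GB). Per-job volume is an
average obtained by dividing daily volume by daily job count.
"""

DAYS_PER_YEAR = 365
GB_PER_TB = 1_000


def gb_per_day(total_tb: float, years: float) -> float:
    """Average ingest rate in GB/day for total_tb spread over `years`."""
    return total_tb * GB_PER_TB / (years * DAYS_PER_YEAR)


def gb_per_job(total_tb: float, years: float, jobs_per_day: float) -> float:
    """Average data volume per processing job, in GB."""
    return gb_per_day(total_tb, years) / jobs_per_day


if __name__ == "__main__":
    # OCO-2: 61-81 TB over 3 years
    low, high = gb_per_day(61, 3), gb_per_day(81, 3)
    print(f"OCO-2: {low:.1f}-{high:.1f} GB/day")
    # NPP Sounder PEATE: 600 TB over 5 years at 600 jobs/day
    print(f"NPP:   {gb_per_day(600, 5):.1f} GB/day, "
          f"{gb_per_job(600, 5, 600):.2f} GB/job")
```

Under these assumptions OCO-2 averages roughly 56-74 GB/day and the NPP Sounder PEATE about 330 GB/day.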
  

Other PCS applications
•  Astronomy and Radio
   –  Prototype work on MeerKAT with South Africans and KAT-7 telescope
   –  Discussions ongoing with NRAO Socorro (EVLA and ALMA)
•  Bioinformatics
   –  National Institutes of Health (NIH) National Cancer Institute's (NCI) Early Detection
      Research Network (EDRN)
   –  Children's Hospital LA Virtual Pediatric Intensive Care Unit (VPICU)
•  Earth Science
   –  National Climate Assessment – Snow Hydrology in the Western US and Alaska
   –  National Climate Assessment – Regional Climate Modeling and Evaluation
•  Technology Demonstration
   –  JPL's Active Mirror Telescope (AMT)
   –  White Sands Missile Range
  
PCS Core Components

•  All core components implemented as web services
   –  XML-RPC used to communicate between components
   –  Servers implemented in Java
   –  Clients implemented in Java, scripts, Python, PHP and web-apps
   –  Service configuration implemented in ASCII and XML files
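The idea of core components exposed as XML-RPC web services can be illustrated with Python's standard library (Python being one of the client languages listed). This is a minimal sketch, not OODT code: the `ToyCatalog` class and its `ingest`/`count` methods are hypothetical stand-ins for a File Manager, and the real PCS servers are written in Java.

```python
"""Minimal sketch: a component served over XML-RPC and a client calling it.

ToyCatalog, ingest() and count() are hypothetical; they are NOT the
OODT File Manager API.
"""
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyCatalog:
    """Hypothetical in-memory catalog standing in for a File Manager."""

    def __init__(self):
        # Leading underscore: register_instance() will not expose this.
        self._products = {}

    def ingest(self, name, metadata):
        # Store the product's metadata (an XML-RPC struct) under its name.
        self._products[name] = metadata
        return True

    def count(self):
        return len(self._products)


def start_server():
    """Serve a ToyCatalog on a free localhost port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
    server.register_instance(ToyCatalog())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as client:
        client.ingest("example_product.h5", {"Instrument": "EVLA"})
        print(client.count())
    server.shutdown()
    server.server_close()
```

Because XML-RPC is a plain HTTP + XML protocol, the same server can be called from Java, PHP, shell scripts, or web apps, which is what makes the mixed-language client list workable.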
  	
  
Core Capabilities
•  File Manager does Data Management
   –  Tracks all of the stored data, files & metadata
   –  Moves data to appropriate locations before and after initiating PGE
      (Product Generation Executive) runs, and from the staging area to
      controlled access storage

•  Workflow Manager does Pipeline Processing
   –  Automates processing when all run conditions are ready
   –  Monitors and logs processing status

•  Resource Manager does Resource Management
   –  Allocates processing jobs to computing resources
   –  Monitors and logs job & resource status
   –  Copies output data to storage locations where space is available
   –  Provides the means to monitor resource usage
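The division of labor between the Workflow Manager (run a task only when all its run conditions hold) and the Resource Manager (place each job on a computing resource and log it) can be sketched as follows. This is a hypothetical toy model, not the OODT Workflow or Resource Manager API: `Task`, `Node`, `requires`, the least-loaded placement policy, and the assumption that each job completes immediately are all illustrative choices.

```python
"""Sketch of condition-gated pipeline processing plus job allocation.

Hypothetical model, NOT the OODT API: a task runs only when every one of
its run conditions holds; each ready job goes to the least-loaded node.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Condition = Callable[[Set[str]], bool]


@dataclass
class Task:
    name: str
    conditions: List[Condition]
    output: str  # product this task makes available once it has run


@dataclass
class Node:
    name: str
    jobs: List[str] = field(default_factory=list)  # log of assigned jobs


def requires(*products: str) -> Condition:
    """Run condition: all named input products are available."""
    return lambda available: all(p in available for p in products)


def allocate(job: str, nodes: List[Node]) -> str:
    """Resource management: min() picks the first least-loaded node on ties."""
    node = min(nodes, key=lambda n: len(n.jobs))
    node.jobs.append(job)  # the job log doubles as a simple status record
    return node.name


def run_pipeline(tasks: Iterable[Task], available: Set[str],
                 nodes: List[Node]) -> Dict[str, str]:
    """Workflow management: keep launching tasks whose conditions hold.

    Returns {task name: node name}; tasks that never become ready are
    simply absent from the result.
    """
    available = set(available)  # do not mutate the caller's set
    placements: Dict[str, str] = {}
    pending = list(tasks)
    progressed = True
    while pending and progressed:
        progressed = False
        for task in list(pending):
            if all(cond(available) for cond in task.conditions):
                placements[task.name] = allocate(task.name, nodes)
                available.add(task.output)  # assume the job completes
                pending.remove(task)
                progressed = True
    return placements


if __name__ == "__main__":
    tasks = [Task("L1", [requires("L0")], "L1"),
             Task("L2", [requires("L1")], "L2"),
             Task("browse", [requires("L1")], "browse")]
    print(run_pipeline(tasks, {"L0"}, [Node("n1"), Node("n2")]))
```

A production system would also track job failures, release resources, and persist status; this sketch only shows the gating and placement logic.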
  


File/Metadata Capabilities
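The File Manager duties listed under Core Capabilities (track stored files and metadata; move data from the staging area to controlled storage) can be sketched as below. Everything here is hypothetical and for illustration only: the `ToyFileManager` class, the `<archive_root>/<product_type>/<filename>` layout, and the metadata key names are not taken from OODT.

```python
"""Sketch of File Manager-style data management (hypothetical, NOT the
OODT API): move a product from a staging area into archive storage and
record its location and metadata in a catalog."""
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


class ToyFileManager:
    def __init__(self, archive_root: Path):
        self.archive_root = archive_root
        self.catalog: Dict[str, Dict[str, str]] = {}

    def ingest(self, staged: Path, product_type: str) -> Dict[str, str]:
        # Assumed archive layout: <archive_root>/<product_type>/<filename>
        dest_dir = self.archive_root / product_type
        dest_dir.mkdir(parents=True, exist_ok=True)
        data = staged.read_bytes()  # whole file in memory: fine for a sketch
        dest = Path(shutil.move(str(staged), str(dest_dir / staged.name)))
        metadata = {  # illustrative key names
            "ProductType": product_type,
            "FileLocation": str(dest),
            "FileSize": str(len(data)),
            "MD5": hashlib.md5(data).hexdigest(),
        }
        self.catalog[staged.name] = metadata
        return metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp, "staging")
        staging.mkdir()
        product = staging / "granule_001.dat"
        product.write_bytes(b"example")
        fm = ToyFileManager(Path(tmp, "archive"))
        print(fm.ingest(product, "L1B"))
```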




Advanced Workflow Monitoring




Resource Monitoring




How do we deploy PCS for a mission?
•         We implement the following mission-specific customizations
             –  Server Configuration
                         •    Implemented in ASCII properties files

             –  Product metadata specification
                         •    Implemented in XML policy files

             –  Processing Rules
                         •    Implemented as Java classes and/or XML policy files

             –  PGE Configuration
                         •    Implemented in XML policy files

             –  Compute Node Usage Policies
                         •    Implemented in XML policy files

•         Here's what we don't change
             –  All PCS Servers (e.g. File Manager, Workflow Manager, Resource Manager)
                         •  Core data management, pipeline process management and job scheduling/submission
                            capabilities
             –  File Catalog schema
             –  Workflow Model Repository Schema
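The customizations above split between ASCII properties files and XML policy files. As a hedged sketch of what loading them involves, the code below parses a simple properties file and a product metadata specification. All keys, element names, and attributes are hypothetical; they are not OODT's actual configuration schemas.

```python
"""Sketch of loading mission-specific customizations: server settings from
an ASCII properties file and a product metadata spec from an XML policy
file. All keys, element names and attributes are HYPOTHETICAL."""
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser; skips blank lines and # or ! comments.
    Does not handle Java-properties continuations, escapes, or ':' keys."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def parse_metadata_policy(xml_text: str) -> Dict[str, List[str]]:
    """Map each product type to the metadata element names it declares."""
    root = ET.fromstring(xml_text.strip())
    return {
        ptype.get("name"): [el.get("name") for el in ptype.findall("element")]
        for ptype in root.findall("productType")
    }


SERVER_PROPS = """
# hypothetical server configuration
server.port = 9000
archive.root = /data/archive
"""

POLICY_XML = """
<policy>
  <productType name="L1B">
    <element name="Instrument"/>
    <element name="StartTime"/>
  </productType>
</policy>
"""

if __name__ == "__main__":
    print(parse_properties(SERVER_PROPS))
    print(parse_metadata_policy(POLICY_XML))
```

Keeping these mission-specific details in data files is what lets the servers and schemas listed under "what we don't change" stay identical across missions.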

Server and PGE Configuration




Latest Apache OODT release: 0.3
  •  First appearance of PCS
     –  Core, Services (JAX-RS)
  •  Web Applications
     –  Balance (PHP) and Wicket (Java)-based apps for
        file management and workflow monitoring
  •  First release deployed to Maven Central
     –  We did backport 0.2 there after this
     –  Over 60 issues fixed in JIRA
  •  June 2011: recommended stable release
  
Working on: 0.4
•  Operator Interface (OODT-157)
•  Workflow2 integration (OODT-215) and all of its sub-issues
   –  Global workflow conditions, dynamic workflows, parallel/sequential
      model, new workflow engine, etc.
•  OODT RADIX for super easy deployment (OODT-120)
•  Solr sync with File Manager (OODT-326)
•  Improvements to XMLPS (OODT-333) and new crawler actions
   (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
•  CLI rewrite and refactor
•  Over 130 issues currently resolved
•  Likely to come before end of Q2 2012
  

How do these fit together?

•  Hadoop HDFS
   –  OODT File Manager leveraging HDFS for virtual disk path, replication,
      archiving, scalability
•  Hadoop M/R
   –  Work done in OODT branch to connect OODT Workflow + Resource
      Mgmt to Hadoop (pre-YARN)
•  Hadoop Hive used in Regional Climate Modeling DB
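The OODT branch that connects Workflow and Resource Management to Hadoop is not shown here. As a generic illustration of the MapReduce model that Hadoop M/R provides, the sketch below counts archived products per product type in the style of Hadoop Streaming. The input format (`<filename>\t<product_type>` lines) is a hypothetical assumption, and the shuffle/sort step is simulated locally with `sorted()`.

```python
"""Hadoop Streaming-style sketch: count archived products per product type.

Assumes HYPOTHETICAL input lines of the form "<filename>\t<product_type>".
Under Hadoop Streaming the mapper and reducer would be separate scripts
reading stdin; here Hadoop's shuffle/sort is simulated with sorted().
"""
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:  # skip malformed records
            yield parts[1], 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    # Input must arrive sorted by key, as Hadoop guarantees for reducers.
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    shuffled = sorted(mapper(lines))  # stand-in for the shuffle/sort phase
    return dict(reducer(shuffled))


if __name__ == "__main__":
    records = ["a.h5\tL1B", "b.h5\tL2", "c.h5\tL1B"]
    print(run_local(records))  # {'L1B': 2, 'L2': 1}
```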
  
Where are we headed with OODT + Hadoop?
•  Investigate and integrate YARN
   –  Workflow and Resource Mgmt
•  Plug in HBase as File Manager Catalog
   –  Already plugged in Hive
   –  Potentially leverage Gora?
•  OODT + Hadoop Virtual Machines and RPMs
   –  Easy Installation leveraging OODT RADIX
•  Remote file acquisition (Push/Pull) as Hadoop M/R
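"Plugging in" a new catalog backend such as HBase works when File Manager logic is written against a catalog interface rather than a specific store. The sketch below shows that pattern with a hypothetical `Catalog` interface and an in-memory implementation; it is not OODT's Catalog API, and no HBase or Hive client code is shown.

```python
"""Sketch of a pluggable File Manager catalog (HYPOTHETICAL interface, not
OODT's Catalog API). A backend for HBase or Hive would be another subclass
implementing the same three methods."""
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class Catalog(ABC):
    @abstractmethod
    def add(self, product_id: str, metadata: Dict[str, str]) -> None: ...

    @abstractmethod
    def get(self, product_id: str) -> Optional[Dict[str, str]]: ...

    @abstractmethod
    def query(self, key: str, value: str) -> List[str]: ...


class InMemoryCatalog(Catalog):
    """Dictionary-backed implementation, handy for tests."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def add(self, product_id, metadata):
        self._rows[product_id] = dict(metadata)  # store a copy

    def get(self, product_id):
        return self._rows.get(product_id)

    def query(self, key, value):
        return sorted(pid for pid, met in self._rows.items() if met.get(key) == value)


def ingest_all(catalog: Catalog, products: Dict[str, Dict[str, str]]) -> int:
    """File-manager-side logic written against the interface, not a backend."""
    for pid, met in products.items():
        catalog.add(pid, met)
    return len(products)


if __name__ == "__main__":
    cat = InMemoryCatalog()
    ingest_all(cat, {"p1": {"Type": "L1B"}, "p2": {"Type": "L2"}})
    print(cat.query("Type", "L1B"))  # ['p1']
```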
  
Key Takeaway

                    Apache OODT, Apache Hadoop, and other big data technologies
                    are preparing the world to handle all of these diverse use cases!

                    Constantly evolving and improving frameworks – join up and help.

                    Free and open source from Apache, and helping government
                    demonstrate the public good
  
Apache OODT Project Contact Info
•  Learn more and track our progress at:
            –  http://oodt.apache.org
            –  WIKI: https://cwiki.apache.org/OODT/
            –  JIRA: https://issues.apache.org/jira/browse/OODT
•  Join the mailing list:
            –  dev@oodt.apache.org
•  Chat on IRC:
            –  #oodt on irc.freenode.net
•      Acknowledgements
         –  Key Members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew
            Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn,
            Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
         –  Projects, Sponsors, Collaborators: Planetary Data System, Early Detection Research Network,
            Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA
            OCO-2 Mission, NASA NPP Sounder Peate, NASA ACOS Mission, Earth System Grid
            Federation




Alright, I'll shut up now

•  Any questions?

•  THANK YOU!
   –  chris.a.mattmann@nasa.gov
   –  @chrismattmann on Twitter
  





 
Science Engagement: A Non-Technical Approach to the Technical Divide
Science Engagement: A Non-Technical Approach to the Technical DivideScience Engagement: A Non-Technical Approach to the Technical Divide
Science Engagement: A Non-Technical Approach to the Technical DivideCybera Inc.
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDCAstroAtom
 

Similar to Big Data Challenges at NASA (20)

Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
Scalable Data Mining and Archiving in the Era of the Square Kilometre ArrayScalable Data Mining and Archiving in the Era of the Square Kilometre Array
Scalable Data Mining and Archiving in the Era of the Square Kilometre Array
 
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
Spark at NASA/JPL-(Chris Mattmann, NASA/JPL)
 
#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017
 
Jasper Horrell - SKA and Big Data: Up in Space and on the Ground
Jasper Horrell - SKA and Big Data: Up in Space and on the GroundJasper Horrell - SKA and Big Data: Up in Space and on the Ground
Jasper Horrell - SKA and Big Data: Up in Space and on the Ground
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in Astronomy
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
ApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTRApacheCon NA 2013 VFASTR
ApacheCon NA 2013 VFASTR
 
Space Evaders Hacking for Diplomacy week 8
Space Evaders Hacking for Diplomacy week 8Space Evaders Hacking for Diplomacy week 8
Space Evaders Hacking for Diplomacy week 8
 
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...
 
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
 
Creating a Science-Driven Big Data Superhighway
Creating a Science-Driven Big Data SuperhighwayCreating a Science-Driven Big Data Superhighway
Creating a Science-Driven Big Data Superhighway
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
TeraGrid and Physics Research
TeraGrid and Physics ResearchTeraGrid and Physics Research
TeraGrid and Physics Research
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data Analysis
 
The Pacific Research Platform
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
 
IEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdfIEEE_BigData2014-Lee.pdf
IEEE_BigData2014-Lee.pdf
 
Science Engagement: A Non-Technical Approach to the Technical Divide
Science Engagement: A Non-Technical Approach to the Technical DivideScience Engagement: A Non-Technical Approach to the Technical Divide
Science Engagement: A Non-Technical Approach to the Technical Divide
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDC
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Big Data Challenges at NASA

  • 1. Big Data Challenges at NASA
    Chris A. Mattmann
    Senior Computer Scientist, NASA
    Adjunct Assistant Professor, USC
    Member, Apache Software Foundation
  • 2. And you are?
    • Senior Computer Scientist at NASA JPL in Pasadena, CA, USA
    • Software Architecture/Engineering Prof at Univ. of Southern California
    • Apache Member involved in
      – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor) and Gora (Champion), MRUnit (Mentor), Airavata (Mentor)
    13-Jun-12 HADOOPSUMMIT12
  • 3. Agenda
    • Big Data Challenges and where we’re headed
    • Example systems at NASA and other agencies
    • Apache OODT: a primer
    • Apache OODT + Hadoop
    • Where we’re headed and wrap-up
  • 4. Some “Big Data” Grand Challenges I’m interested in
    • How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
      – Required by the Square Kilometre Array
    • Joe Scientist says, “I’ve got an IDL or Matlab algorithm that I will not change, and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products.”
      – Required by the Western Snow Hydrology project
    • How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
      – Required by the 5th IPCC assessment, the Earth System Grid, and NASA
    • How do we catalog all of NASA’s current planetary science data?
      – Required by the NASA Planetary Data System
    Image credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295
    Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
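To give a sense of scale for the 700 TB/sec figure, here is a back-of-the-envelope calculation. It assumes decimal SI units (1 TB = 10^12 bytes) and a continuous, uninterrupted stream. Both are simplifying assumptions for illustration, not SKA specifications.

```python
# Scale of a sustained 700 TB/s stream (the Square Kilometre Array figure above).
# Assumptions: decimal SI units (1 TB = 10**12 bytes) and a 24/7 stream.
TB = 10**12
EB = 10**18
ZB = 10**21

rate_bytes_per_sec = 700 * TB
seconds_per_day = 24 * 60 * 60

per_day = rate_bytes_per_sec * seconds_per_day  # exact integer arithmetic
per_year = per_day * 365

print(f"per day:  {per_day / EB:.2f} EB")    # per day:  60.48 EB
print(f"per year: {per_year / ZB:.2f} ZB")   # per year: 22.08 ZB
```

Keeping even a single day of this stream means roughly 60 exabytes. That is why the challenge is phrased as "when we actually have to keep it around".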
  • 5. The NASA ESDS Context
    Where is open source most useful? Which area should produce open source software?
  • 6. Lessons from 1990s-era missions
    • Increasing data volumes (exponential growth)
    • Increasing complexity of instruments and algorithms
    • Increasing availability of proxy/sim/ancillary data
    • Increasing rate of technology refresh
    … all of this while NASA Earth Mission funding was decreasing.
    A data system framework based on a standard architecture and reusable software components for supporting all future missions.
  • 7. Where do Big Data technologies fit into this?
    [Images: U.S. National Climate Assessment (pic credit: Dr. Tom Painter); SKA South Africa: Square Kilometre Array (pic credit: Dr. Jasper Horrell, Simon Ratcliffe)]
  • 8. [Figure] Credit: Cameron Goodale
  • 9. EVLA demonstration architecture
    [Architecture diagram (Apache OODT components) hosted on ska-dc.jpl.nasa.gov. Labels: EVLA dataset day2_TDEM0003_10s_norx; Staging Area; Crawler; CAS File Manager (FM) with repository and catalog; PCS Workflow Manager (WM) and the evlascube task; Curator; Browser; WWW Data Services; Science Services; Monitor; Operator; Disk Area; products, metadata, and status flows. Legend: data flow, control flow.]
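The EVLA demonstration diagram suggests a crawl, ingest, and process pattern. Files land in a staging area, a crawler hands products and metadata to a catalog (the File Manager), and a workflow manager runs processing such as the evlascube step. The toy model below only illustrates that pattern. Every class, method, metadata key, and event name is hypothetical and is not the Apache OODT (Java) API. The event name is borrowed from the diagram label.

```python
# Toy crawl -> ingest -> trigger pipeline, loosely modeled on the roles the
# diagram assigns to the crawler, File Manager (FM), and Workflow Manager (WM).
# All names are hypothetical; this is NOT the Apache OODT API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Product:
    name: str
    metadata: Dict[str, str]


@dataclass
class FileManager:
    """Catalogs products and their metadata (stand-in for a file manager)."""
    catalog: Dict[str, Product] = field(default_factory=dict)

    def ingest(self, product: Product) -> None:
        self.catalog[product.name] = product


@dataclass
class WorkflowManager:
    """Maps event names to tasks and records each task's result."""
    tasks: Dict[str, Callable[[Product], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def handle(self, event: str, product: Product) -> None:
        task = self.tasks.get(event)
        if task is not None:
            self.log.append(task(product))


def crawl(staging_area: List[str], fm: FileManager, wm: WorkflowManager, event: str) -> None:
    """Ingest every file found in the staging area, then fire a post-ingest event."""
    for filename in staging_area:
        product = Product(name=filename, metadata={"source": "EVLA"})  # hypothetical key
        fm.ingest(product)
        wm.handle(event, product)


fm = FileManager()
wm = WorkflowManager(tasks={"evlascube": lambda p: f"cubed {p.name}"})
crawl(["day2_TDEM0003_10s_norx"], fm, wm, event="evlascube")
print(wm.log)  # ['cubed day2_TDEM0003_10s_norx']
```

Keeping cataloging and processing in separate components means either can be replaced or scaled without touching the other. This matches the "set of tools, not turnkey" framing of the following slides.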
  • 10. Apache OODT
    • Entered incubation at the Apache Software Foundation in 2010
    • Promoted to a top-level Apache Software Foundation project in January 2011
    • Developed by a community of participants from many companies, universities, and organizations
    • Used for a diverse set of science data system activities in planetary science, earth science, radio astronomy, biomedicine, astrophysics, and more
    http://oodt.apache.org
    The OODT development & user community includes: [organization logos not recoverable]
  • 11. Apache OODT: OSS “big data” platform originally pioneered at NASA
    • OODT is meant to be a set of tools to help build data systems
      – It’s not meant to be turnkey
      – It attempts to exploit the boundary between bringing in capability and being overly rigid in science
      – Each discipline/project extends it
    • Deployed operationally by:
      – Decadal-survey recommended NASA Earth science missions, NIH and NCI, CHLA, USC, South African SKA project
    • Why Apache?
      – Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
      – Differs from other open source communities; it provides a governance and management structure
  • 12. Why Apache and OODT?
    • OODT is meant to be a set of tools to help build data systems
      – It’s not meant to be turnkey
      – It attempts to exploit the boundary between bringing in capability and being overly rigid in science
      – Each discipline/project extends it
    • Apache is the elite open source community for software developers
      – Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
      – Differs from other open source communities; it provides a governance and management structure
  • 13. Governance Model + NASA = ♥
    • NASA and other government agencies have tons of process
      – They like that
  • 14. OODT Framework and PCS
    [Figure: Object Oriented Data Technology (OODT) framework. Labeled components: OODT/Science Archive, Web Tools, Client, Navigation Service, Catalog & Archive Service (CAS), Process Control System (PCS), Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services, Other Service 1, Other Service 2, Profile XML Data, Data System 1, Data System 2]
    CAS has recently become known as Process Control System when applied to mission work.
  • 15. Current PCS deployments
    •  Orbiting Carbon Observatory (OCO-2) - spectrometer instrument
      –  NASA ESSP Mission, launch date: TBD 2013
      –  PCS supporting Thermal Vacuum Tests, ground-based instrument data processing, space-based instrument data processing, and Science Computing Facility
      –  EOM Data Volume: 61-81 TB in 3 yrs
      –  Processing Throughput: 200-300 jobs/day
    •  NPP Sounder PEATE - infrared sounder
      –  Joint NASA/NPOESS mission, launch date: October 2011
      –  PCS supporting Science Computing Facility (PEATE)
      –  EOM Data Volume: 600 TB in 5 yrs
      –  Processing Throughput: 600 jobs/day
    •  QuikSCAT - scatterometer
      –  NASA Quick-Recovery Mission, launch date: June 1999
      –  PCS supporting instrument data processing and science analyst sandbox
      –  Originally planned as a 2-year mission
    •  SMAP - high-res radar and radiometer
      –  NASA decadal study mission, launch date: 2014
      –  PCS supporting radar instrument and science algorithm development testbed
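    The end-of-mission (EOM) volumes and throughputs above can be turned into average daily rates with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), 365-day years, and constant throughput over the mission, none of which the slide states; real ingest is bursty, so read the results as long-run averages: roughly 56-74 GB/day and 219,000-328,500 jobs for OCO-2, and about 329 GB/day and 1,095,000 jobs for NPP.

```python
DAYS_PER_YEAR = 365  # assumption: leap days ignored
GB_PER_TB = 1000     # assumption: decimal units

# (mission, EOM volume range in TB, mission years, jobs/day range), from the slide
DEPLOYMENTS = [
    ("OCO-2", (61, 81), 3, (200, 300)),
    ("NPP Sounder PEATE", (600, 600), 5, (600, 600)),
]


def avg_gb_per_day(total_tb, years):
    """Long-run average daily volume in GB for a given end-of-mission total."""
    return total_tb * GB_PER_TB / (years * DAYS_PER_YEAR)


def total_jobs(jobs_per_day, years):
    """Jobs run over the mission if the daily throughput held constant."""
    return jobs_per_day * years * DAYS_PER_YEAR


if __name__ == "__main__":
    for name, (tb_lo, tb_hi), years, (j_lo, j_hi) in DEPLOYMENTS:
        print(f"{name}: {avg_gb_per_day(tb_lo, years):.0f}-"
              f"{avg_gb_per_day(tb_hi, years):.0f} GB/day, "
              f"{total_jobs(j_lo, years):,}-{total_jobs(j_hi, years):,} jobs")
```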
  • 16. Other PCS applications
    •  Astronomy and Radio
      –  Prototype work on MeerKAT with South Africans and KAT-7 telescope
      –  Discussions ongoing with NRAO Socorro (EVLA and ALMA)
    •  Bioinformatics
      –  National Institutes of Health (NIH) National Cancer Institute's (NCI) Early Detection Research Network (EDRN)
      –  Children's Hospital LA Virtual Pediatric Intensive Care Unit (VPICU)
    •  Earth Science
      –  National Climate Assessment: Snow Hydrology in the Western US and Alaska
      –  National Climate Assessment: Regional Climate Modeling and Evaluation
    •  Technology Demonstration
      –  JPL's Active Mirror Telescope (AMT)
      –  White Sands Missile Range
  • 17. PCS Core Components
    •  All core components implemented as web services
      –  XML-RPC used to communicate between components
      –  Servers implemented in Java
      –  Clients implemented in Java, scripts, Python, PHP, and web apps
      –  Service configuration implemented in ASCII and XML files
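    To illustrate the XML-RPC style of component communication described above, here is a minimal Python sketch using only the standard library's `xmlrpc.server` and `xmlrpc.client` modules. The dotted method names (`demo.hasProduct`, `demo.getMetadata`), the product name, and the in-memory catalog are hypothetical stand-ins, not the actual OODT File Manager API; the point is only that a Java server and a Python client can interoperate because both speak XML-RPC over HTTP.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory catalog standing in for a PCS server's state.
_PRODUCTS = {"granule_001.h5": {"ProductType": "L1B"}}


def has_product(name):
    """Return True if a product with this name is catalogued."""
    return name in _PRODUCTS


def get_metadata(name):
    """Return the product's metadata dict (empty if unknown)."""
    return _PRODUCTS.get(name, {})


def start_server():
    # Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # Dotted names mimic a "component.method" namespace (hypothetical names).
    server.register_function(has_product, "demo.hasProduct")
    server.register_function(get_metadata, "demo.getMetadata")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.demo.hasProduct("granule_001.h5"))  # True
        print(proxy.demo.getMetadata("missing.h5"))     # {}
    server.shutdown()
    server.server_close()
```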
  • 18. Core Capabilities
    •  File Manager does Data Management
      –  Tracks all of the stored data, files & metadata
      –  Moves data to appropriate locations before and after initiating PGE (Product Generation Executive) runs, and from the staging area to controlled-access storage
    •  Workflow Manager does Pipeline Processing
      –  Automates processing when all run conditions are ready
      –  Monitors and logs processing status
    •  Resource Manager does Resource Management
      –  Allocates processing jobs to computing resources
      –  Monitors and logs job & resource status
      –  Copies output data to storage locations where space is available
      –  Provides the means to monitor resource usage
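    The division of labor above (Workflow Manager decides when a step may run, Resource Manager decides where it runs) can be sketched as a toy model. Everything here is hypothetical: the class names, the status strings, the first-free-node allocation policy, and the use of a plain `set` as a stand-in for the File Manager catalog are illustrative assumptions, not OODT's actual classes or scheduling logic.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int  # max concurrent jobs (assumed policy)
    running: list = field(default_factory=list)

    def has_slot(self):
        return len(self.running) < self.capacity


@dataclass
class Task:
    name: str
    conditions: list  # zero-arg callables returning True when ready


class ResourceManager:
    """Toy allocator: the first node with a free slot gets the job."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.log = []  # monitors and logs job placement

    def submit(self, job):
        for node in self.nodes:
            if node.has_slot():
                node.running.append(job)
                self.log.append(f"{job} -> {node.name}")
                return node.name
        self.log.append(f"{job} queued: no free node")
        return None


class WorkflowManager:
    """Toy pipeline runner: submits a task only when all conditions hold."""

    def __init__(self, resource_mgr):
        self.rm = resource_mgr
        self.status = {}

    def step(self, tasks):
        for task in tasks:
            if self.status.get(task.name) == "SUBMITTED":
                continue  # already handed to the resource manager
            if all(cond() for cond in task.conditions):
                node = self.rm.submit(task.name)
                self.status[task.name] = "SUBMITTED" if node else "WAITING_FOR_RESOURCE"
            else:
                self.status[task.name] = "WAITING_FOR_CONDITIONS"
        return self.status


if __name__ == "__main__":
    catalog = set()  # stand-in for the File Manager catalog
    wm = WorkflowManager(ResourceManager([Node("node1", capacity=1)]))
    l1b = Task("L1B", [lambda: "L0.dat" in catalog])
    print(wm.step([l1b]))  # {'L1B': 'WAITING_FOR_CONDITIONS'}
    catalog.add("L0.dat")  # input product is ingested
    print(wm.step([l1b]))  # {'L1B': 'SUBMITTED'}
```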
  • 22. How do we deploy PCS for a mission?
    •  We implement the following mission-specific customizations
      –  Server Configuration: implemented in ASCII properties files
      –  Product metadata specification: implemented in XML policy files
      –  Processing Rules: implemented as Java classes and/or XML policy files
      –  PGE Configuration: implemented in XML policy files
      –  Compute Node Usage Policies: implemented in XML policy files
    •  Here's what we don't change
      –  All PCS Servers (e.g., File Manager, Workflow Manager, Resource Manager): core data management, pipeline process management, and job scheduling/submission capabilities
      –  File Catalog schema
      –  Workflow Model Repository schema
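    The split above keeps mission specifics in configuration files while the server code stays unchanged. A minimal Python sketch of reading both kinds of files follows. The property keys, XML element names, and paths are hypothetical, not OODT's real configuration schema, and the properties parser handles only simple `key=value` lines (real Java properties files also allow `:` separators and line continuations).

```python
import xml.etree.ElementTree as ET

# Hypothetical server configuration in Java-properties style (key=value).
SERVER_PROPERTIES = """
# mission-specific server settings (hypothetical keys)
server.port=9000
catalog.dir=/data/mission/catalog
archive.dir = /data/mission/archive
"""

# Hypothetical product-type policy; element names are illustrative only.
PRODUCT_POLICY = """
<productTypes>
  <type id="L1B" name="Level1B">
    <metadata><key>StartTime</key><key>EndTime</key></metadata>
  </type>
  <type id="L2" name="Level2">
    <metadata><key>Granule</key></metadata>
  </type>
</productTypes>
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_product_policy(xml_text):
    """Map each product type name to its list of required metadata keys."""
    root = ET.fromstring(xml_text.strip())
    return {
        t.get("name"): [k.text for k in t.iterfind("metadata/key")]
        for t in root.iterfind("type")
    }


if __name__ == "__main__":
    print(parse_properties(SERVER_PROPERTIES)["archive.dir"])
    print(parse_product_policy(PRODUCT_POLICY))
```

    Swapping in another mission's files changes the behavior without touching the parsing (server) code, which mirrors the "what we don't change" list.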
  • 23. Server and PGE Configuration
  • 24. Latest Apache OODT release: 0.3
    •  First appearance of PCS
      –  Core, Services (JAX-RS)
    •  Web Applications
      –  Balance (PHP) and Wicket (Java)-based apps for file management and workflow monitoring
    •  First release deployed to Maven Central
      –  We did backport 0.2 there after this
      –  Over 60 issues fixed in JIRA
    •  June 2011: recommended stable release
  • 25. Working on: 0.4
    •  Operator Interface (OODT-157)
    •  Workflow2 integration (OODT-215) and all of its sub-issues
      –  Global workflow conditions, dynamic workflows, parallel/sequential model, new workflow engine, etc.
    •  OODT RADIX for super easy deployment (OODT-120)
    •  Solr sync with File Manager (OODT-326)
    •  Improvements to XMLPS (OODT-333) and new crawler actions (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
    •  CLI rewrite and refactor
    •  Over 130 issues currently resolved
    •  Likely to come before end of Q2 2012
  • 26. How do these fit together?
    •  Hadoop HDFS
      –  OODT File Manager leveraging HDFS for virtual disk path, replication, archiving, scalability
    •  Hadoop M/R
      –  Work done in an OODT branch to connect OODT Workflow + Resource Mgmt to Hadoop (pre-YARN)
    •  Hadoop Hive used in Regional Climate Modeling DB
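    One way to read "HDFS for virtual disk path" is that the File Manager records a single logical path per product while HDFS handles block placement and replication underneath. The sketch below only computes such a path from product metadata; it does not talk to HDFS. The archive root, the type/date/filename layout, and the `archive_path` function are hypothetical choices for illustration, not OODT's actual archive layout.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical archive root in an HDFS namespace; not an OODT default.
ARCHIVE_ROOT = PurePosixPath("/oodt/archive")


def archive_path(product_type: str, acquired: date, filename: str) -> str:
    """Map product metadata to one logical path in the HDFS namespace.

    HDFS decides where the blocks physically live and how many replicas
    exist, so the catalog only needs this path (layout is an assumption).
    """
    return str(ARCHIVE_ROOT / product_type / f"{acquired:%Y/%m/%d}" / filename)


if __name__ == "__main__":
    print(archive_path("L1B", date(2012, 6, 13), "granule_001.h5"))
    # /oodt/archive/L1B/2012/06/13/granule_001.h5
```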
  • 27. Where are we headed with OODT + Hadoop?
    •  Investigate and integrate YARN
      –  Workflow and Resource Mgmt
    •  Plug in HBase as File Manager Catalog
      –  Already plugged in Hive
      –  Potentially leverage Gora?
    •  OODT + Hadoop Virtual Machines and RPMs
      –  Easy installation leveraging OODT RADIX
    •  Remote file acquisition (push/pull) as Hadoop M/R
  • 28. Key Takeaway
    Apache OODT, Apache Hadoop, and other big data technologies are preparing the world to handle all of these diverse use cases!
    Constantly evolving and improving frameworks: join up and help.
    Free and open source from Apache, and helping government demonstrate the public good
  • 29. Apache OODT Project Contact Info
    •  Learn more and track our progress at:
      –  http://oodt.apache.org
      –  Wiki: https://cwiki.apache.org/OODT/
      –  JIRA: https://issues.apache.org/jira/browse/OODT
    •  Join the mailing list:
      –  dev@oodt.apache.org
    •  Chat on IRC:
      –  #oodt on irc.freenode.net
    •  Acknowledgements
      –  Key members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn, Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
      –  Projects, sponsors, collaborators: Planetary Data System, Early Detection Research Network, Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA OCO-2 Mission, NASA NPP Sounder PEATE, NASA ACOS Mission, Earth System Grid Federation
  • 30. Alright, I'll shut up now
    •  Any questions?
    •  THANK YOU!
      –  chris.a.mattmann@nasa.gov
      –  @chrismattmann on Twitter