Big Data Challenges at NASA

Chris A. Mattmann
Senior Computer Scientist, NASA
Adjunct Assistant Professor, USC
Member, Apache Software Foundation

And you are?

•  Senior Computer Scientist at NASA JPL in Pasadena, CA, USA
•  Software Architecture/Engineering Prof at Univ. of Southern California
•  Apache Member involved in
       –  OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor) and Gora (Champion), MRUnit (Mentor), Airavata (Mentor)

13-Jun-12    HADOOPSUMMIT12

Agenda

•  Big Data Challenges and where we’re headed
•  Example systems at NASA and other agencies
•  Apache OODT: a primer
•  Apache OODT + Hadoop
•  Where we’re headed and wrapup

Some “Big Data” Grand Challenges I’m interested in

•  How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
       –  Required by the Square Kilometre Array
•  Joe scientist says I’ve got an IDL or Matlab algorithm that I will not change and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products
       –  Required by the Western Snow Hydrology project
•  How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, Grib, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
       –  Required by the 5th IPCC assessment and the Earth System Grid and NASA
•  How do we catalog all of NASA’s current planetary science data?
       –  Required by the NASA Planetary Data System

Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295
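
To get a feel for the 700 TB/sec figure quoted above for the Square Kilometre Array, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the rate is sustained around the clock and uses decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB), neither of which the slide states.

```python
# Back-of-the-envelope volume for a sustained 700 TB/sec stream.
# Assumptions (not from the slide): the rate is constant, and units are decimal.
RATE_TB_PER_SEC = 700
SECONDS_PER_DAY = 24 * 60 * 60


def volume_tb(seconds: int, rate_tb_per_sec: int = RATE_TB_PER_SEC) -> int:
    """Total terabytes accumulated after `seconds` at a constant rate."""
    return rate_tb_per_sec * seconds


per_minute_pb = volume_tb(60) / 1_000               # terabytes -> petabytes
per_day_eb = volume_tb(SECONDS_PER_DAY) / 1_000_000  # terabytes -> exabytes

print(f"{per_minute_pb:.0f} PB per minute")  # 42 PB per minute
print(f"{per_day_eb:.2f} EB per day")        # 60.48 EB per day
```

Under those assumptions, keeping everything "around" means absorbing tens of exabytes per day, which is why the question of what to retain matters as much as how to store it.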
  
The NASA ESDS Context

Where is open source most useful?

Which area should produce open source software?

Lessons from 90’s era missions

•  Increasing data volumes (exponential growth)
•  Increasing complexity of instruments and algorithms
•  Increasing availability of proxy/sim/ancillary data
•  Increasing rate of technology refresh

… all of this while NASA Earth Mission funding was decreasing

A data system framework based on a standard architecture and reusable software components for supporting all future missions.

Where do Big Data technologies fit into this?

U.S. National Climate Assessment
(pic credit: Dr. Tom Painter)

SKA South Africa: Square Kilometre Array
(pic credit: Dr. Jasper Horrell, Simon Ratcliffe)

Credit: Cameron Goodale

EVLA demonstration architecture

[Figure: EVLA demonstration architecture, hosted on ska-dc.jpl.nasa.gov. Labeled elements: EVLA, WWW, Staging Area, Crawler, Curator, Browser, CAS Data Services, PCS Services, FM (rep, cat), WM, Monitor (proc status), Cub, Disk Area (data/met), and a Science Data System Operator, with products, metadata, and system status labeled alongside. Dataset shown: day2_TDEM0003_10s_norx; event: evlascube. Legend: Apache OODT, data flow, control flow.]

Apache OODT
•        Entered incubation at the Apache
         Software Foundation in 2010
•        Selected as a top level Apache Software
         Foundation project in January 2011
•        Developed by a community of participants
         from many companies, universities, and
         organizations
•        Used for a diverse set of science data
         system activities in planetary science,
         earth science, radio astronomy,
         biomedicine, astrophysics, and more

                                                            http://oodt.apache.org
OODT Development & user community includes:




Apache OODT: OSS “big data” platform originally pioneered at NASA

•  OODT is meant to be a set of tools to help build data systems
       –  It's not meant to be turn key
       –  It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
       –  Each discipline/project extends
•  Projects that are deploying it operationally at
       –  Decadal-survey recommended NASA Earth science missions, NIH, and NCI, CHLA, USC, South African SKA project
•  Why Apache?
       –  Less than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
       –  Differs from other open source communities; it provides a governance and management structure

Why Apache and OODT?
•  OODT is meant to be a set of tools to
   help build data systems
           –  It's not meant to be turn key
           –  It attempts to exploit the boundary
              between bringing in capability vs.
              being overly rigid in science
           –  Each discipline/project extends

•  Apache is the elite open source
   community for software developers
           –  Less than 100 projects have been
              promoted to top level (Apache Web
              Server, Tomcat, Solr, Hadoop)
           –  Differs from other open source
              communities; it provides a
              governance and management
              structure

Governance Model + NASA = ♥

•  NASA and other government agencies have tons of process
       –  They like that

OODT Framework and PCS

[Figure: OODT framework diagram. Labeled elements: OODT/Science Web Tools, Archive Client, Navigation Service, the OBJECT ORIENTED DATA TECHNOLOGY FRAMEWORK (Catalog & Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services), Other Service 1, Other Service 2, Profile XML Data, Data System 1, Data System 2. An overlay labels the Catalog & Archive Service (CAS) as the Process Control System (PCS).]

CAS has recently become known as Process Control System when applied to mission work.

Current PCS deployments
   Orbiting Carbon Observatory (OCO-2) - spectrometer instrument
               NASA ESSP Mission, launch date: TBD 2013
               PCS supporting Thermal Vacuum Tests, Ground-based instrument data processing, Space-
               based instrument data processing and Science Computing Facility
               EOM Data Volume: 61-81 TB in 3 yrs Processing Throughput: 200-300 jobs/day

    NPP Sounder PEATE - infrared sounder
                Joint NASA/NPOESS mission, launch date: October 2011
                PCS supporting Science Computing Facility (PEATE)
                EOM Data Volume: 600 TB in 5 yrs    Processing Throughput: 600 jobs/day


   QuikSCAT - scatterometer
               NASA Quick-Recovery Mission, launch date: June 1999
               PCS supporting instrument data processing and science analyst sandbox
               Originally planned as a 2-year mission

   SMAP - high-res radar and radiometer
               NASA decadal study mission, launch date: 2014
               PCS supporting radar instrument and science algorithm development testbed
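
The end-of-mission (EOM) volumes quoted above for OCO-2 and NPP Sounder PEATE are easier to compare as average daily rates. The sketch below does that arithmetic. It assumes data accrues evenly over the mission, a 365-day year, and decimal units; none of these is stated on the slide, so treat the results as rough orders of magnitude (about 56 to 74 GB/day for OCO-2 and about 329 GB/day for NPP Sounder PEATE).

```python
# Average daily volume implied by the EOM figures on the slide.
# Assumptions (not from the slide): even accrual, 365-day years, 1 TB = 1,000 GB.
DAYS_PER_YEAR = 365


def avg_gb_per_day(total_tb: float, years: float) -> float:
    """Convert a total volume in TB over `years` into an average in GB per day."""
    return total_tb * 1_000 / (years * DAYS_PER_YEAR)


oco2_low = avg_gb_per_day(61, 3)      # lower bound of the 61-81 TB range
oco2_high = avg_gb_per_day(81, 3)     # upper bound of the 61-81 TB range
npp_sounder = avg_gb_per_day(600, 5)  # 600 TB in 5 years

print(f"OCO-2: {oco2_low:.0f}-{oco2_high:.0f} GB/day")
print(f"NPP Sounder PEATE: {npp_sounder:.0f} GB/day")
```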
  
Other PCS applications
     Astronomy and Radio
                    Prototype work on MeerKAT with South Africans and KAT-7 telescope
                    Discussions ongoing with NRAO Socorro (EVLA and ALMA)

     Bioinformatics
                    National Institutes of Health (NIH) National Cancer Institute's (NCI) Early Detection Research Network (EDRN)
                    Children's Hospital LA Virtual Pediatric Intensive Care Unit (VPICU)

     Earth Science
                    National Climate Assessment – Snow Hydrology in the Western US and Alaska
                    National Climate Assessment – Regional Climate Modeling and Evaluation

     Technology Demonstration
                    JPL's Active Mirror Telescope (AMT)
                    White Sands Missile Range

PCS Core Components




•  All Core components implemented as web services
        –  XML-RPC used to communicate between components
        –  Servers implemented in Java
        –  Clients implemented in Java, scripts, Python, PHP and web-apps
        –  Service configuration implemented in ASCII and XML files
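
The slide above says the core components talk to each other over XML-RPC and that Python is one of the client languages. Here is a minimal sketch of that request/response pattern using only Python's standard library. The names `ToyCatalog`, `ingest`, and `get_metadata` are hypothetical, invented for illustration; they are not the OODT File Manager API. The real servers are Java; both ends are Python here only so the example runs on its own.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ToyCatalog:
    """Hypothetical stand-in for a catalog service; NOT the OODT API."""

    def __init__(self):
        self._products = {}

    def ingest(self, name, metadata):
        # Record the metadata for a named product.
        self._products[name] = metadata
        return True

    def get_metadata(self, name):
        # Return the stored metadata, or an empty struct if unknown.
        return self._products.get(name, {})


def start_server():
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(ToyCatalog())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as client:
            client.ingest("granule-001.h5", {"mission": "demo", "size_bytes": 1024})
            return client.get_metadata("granule-001.h5")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because XML-RPC only carries simple types (strings, numbers, lists, structs), any language with an XML-RPC library can act as a client, which fits the mix of Java, Python, and PHP clients listed on the slide.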
  
Core Capabilities
•  File Manager does Data Management
       –  Tracks all of the stored data, files & metadata
       –  Moves data to appropriate locations before and after initiating PGE runs and from staging area to controlled access storage

•  Workflow Manager does Pipeline Processing
       –  Automates processing when all run conditions are ready
       –  Monitors and logs processing status

•  Resource Manager does Resource Management
       –  Allocates processing jobs to computing resources
       –  Monitors and logs job & resource status
       –  Copies output data to storage locations where space is available
       –  Provides the means to monitor resource usage
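
The Workflow Manager bullets above ("automates processing when all run conditions are ready", "monitors and logs processing status") can be illustrated with a toy condition-gated runner. This is a sketch of the general idea only, not the OODT workflow engine: `Task`, `ToyWorkflowManager`, `poll`, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

State = Dict[str, Set[str]]


@dataclass
class Task:
    name: str
    conditions: List[Callable[[State], bool]]  # every one must hold before the task runs
    action: Callable[[State], None]


@dataclass
class ToyWorkflowManager:
    tasks: List[Task]
    log: List[str] = field(default_factory=list)
    done: Set[str] = field(default_factory=set)

    def poll(self, state: State) -> None:
        """Run each pending task whose conditions all hold; log status either way."""
        for task in self.tasks:
            if task.name in self.done:
                continue
            if all(cond(state) for cond in task.conditions):
                self.log.append(f"{task.name}: STARTED")
                task.action(state)
                self.done.add(task.name)
                self.log.append(f"{task.name}: FINISHED")
            else:
                self.log.append(f"{task.name}: WAITING")


def run_demo():
    state: State = {"files": {"l1a.dat"}}
    manager = ToyWorkflowManager(tasks=[
        Task("level2", [lambda s: "l1b.dat" in s["files"]], lambda s: s["files"].add("l2.dat")),
        Task("level1b", [lambda s: "l1a.dat" in s["files"]], lambda s: s["files"].add("l1b.dat")),
    ])
    manager.poll(state)  # level2 waits; level1b runs and produces l1b.dat
    manager.poll(state)  # level2's input now exists, so it runs
    return manager.log, state["files"]


if __name__ == "__main__":
    log, files = run_demo()
    print("\n".join(log))
```

The tasks are declared out of order on purpose: the runner relies on the conditions, not on declaration order, to decide what is ready.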
  
File/Metadata Capabilities




Advanced Workflow Monitoring




Resource Monitoring




How do we deploy PCS for a mission?
•         We implement the following mission-specific customizations
             –  Server Configuration
                         •    Implemented in ASCII properties files

             –  Product metadata specification
                         •    Implemented in XML policy files

             –  Processing Rules
                         •    Implemented as Java classes and/or XML policy files

             –  PGE Configuration
                         •    Implemented in XML policy files

             –  Compute Node Usage Policies
                         •    Implemented in XML policy files

•         Here's what we don't change
             –  All PCS Servers (e.g. File Manager, Workflow Manager, Resource Manager)
                         •  Core data management, pipeline process management and job scheduling/submission
                            capabilities
             –  File Catalog schema
             –  Workflow Model Repository Schema
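
The customizations listed above live in two kinds of files: ASCII properties files and XML policy files. The sketch below shows how a client might read each kind with Python's standard library. The file contents, property keys, and XML element names are invented for illustration and do not reflect the actual OODT file formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents; keys and element names are illustrative only.
SERVER_PROPERTIES = """\
# server configuration (ASCII properties)
server.port = 9000
archive.root = /data/archive
"""

PRODUCT_POLICY_XML = """\
<productTypes>
  <type name="L2Spectra">
    <metadata key="Instrument" required="true"/>
    <metadata key="OrbitNumber" required="false"/>
  </type>
</productTypes>
"""


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def required_metadata(xml_text):
    """Map each product type name to the metadata keys marked required."""
    root = ET.fromstring(xml_text)
    return {
        ptype.get("name"): [
            m.get("key") for m in ptype.findall("metadata") if m.get("required") == "true"
        ]
        for ptype in root.findall("type")
    }


if __name__ == "__main__":
    print(parse_properties(SERVER_PROPERTIES))
    print(required_metadata(PRODUCT_POLICY_XML))
```

Keeping mission-specific settings in files like these, rather than in server code, is what lets the PCS servers themselves stay unchanged from one mission to the next.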

Server and PGE Configuration




Latest Apache OODT release: 0.3

•  First appearance of PCS
       –  Core, Services (JAX-RS)
•  Web Applications
       –  Balance (PHP), and Wicket (Java)-based apps for file management and workflow monitoring
•  First release deployed to Maven Central
       –  We did backport 0.2 there after this
       –  Over 60 issues fixed in JIRA
•  June 2011: recommended stable release

Working on: 0.4

•  Operator Interface (OODT-157)
•  Workflow2 integration (OODT-215) and all of its sub-issues
       –  Global workflow conditions, dynamic workflows, parallel/sequential model, new workflow engine, etc.
•  OODT RADIX for super easy deployment (OODT-120)
•  Solr sync with File Manager (OODT-326)
•  Improvements to XMLPS (OODT-333) and new crawler actions (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
•  CLI rewrite and refactor
•  Over 130 issues currently resolved
•  Likely to come before end of Q2 2012

How do these fit together?

•  Hadoop HDFS
       –  OODT file manager leveraging HDFS for virtual disk path, replication, archiving, scalability
•  Hadoop M/R
       –  Work done in OODT branch to connect OODT Workflow + Resource Mgmt to Hadoop (pre YARN)
•  Hadoop HIVE used in Regional Climate Modeling DB

Where are we headed with OODT + Hadoop?

•  Investigate and integrate YARN
       –  Workflow and Resource Mgmt
•  Plug in HBase as File Manager Catalog
       –  Already plugged in HIVE
       –  Potentially leverage Gora?
•  OODT + Hadoop Virtual Machines and RPMs
       –  Easy Installation leveraging OODT RADIX
•  Remote file acquisition (Push/Pull) as Hadoop M/R

Key Takeaway

Apache OODT, Apache Hadoop, other big data technologies preparing the world to handle all of these diverse use cases!

Constantly evolving and improving frameworks – join up and help.

Free and open source from Apache and helping government demonstrate the public good

Apache OODT Project Contact Info
•  Learn more and track our progress at:
            –  http://oodt.apache.org
            –  WIKI: https://cwiki.apache.org/OODT/
            –  JIRA: https://issues.apache.org/jira/browse/OODT
•  Join the mailing list:
            –  dev@oodt.apache.org
•  Chat on IRC:
            –  #oodt on irc.freenode.net
•      Acknowledgements
         –  Key Members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew
            Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn,
            Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
         –  Projects, Sponsors, Collaborators: Planetary Data System, Early Detection Research Network,
            Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA
            OCO-2 Mission, NASA NPP Sounder Peate, NASA ACOS Mission, Earth System Grid
            Federation




Alright, I'll shut up now

•  Any questions?

•  THANK YOU!
       –  chris.a.mattmann@nasa.gov
       –  @chrismattmann on Twitter

More Related Content

What's hot

Weka presentation
Weka presentationWeka presentation
Weka presentationSaeed Iqbal
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMDATAVERSITY
 
Big Data Analytics to Enhance Security
Big Data Analytics to Enhance SecurityBig Data Analytics to Enhance Security
Big Data Analytics to Enhance SecurityData Science Thailand
 
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Element22
 
(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineerAlex Chalini
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperDataWorks Summit
 
Altis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis Consulting
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)Amazon Web Services
 
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality ManagementAhmed Alorage
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetJeno Yamma
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Requirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - PresentationRequirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - PresentationVicki McCracken
 
How to set up an ai center of excellence
How to set up an ai center of excellenceHow to set up an ai center of excellence
How to set up an ai center of excellenceShranik Jain
 
Data Modeling and Relational to NoSQL
 Data Modeling and Relational to NoSQL  Data Modeling and Relational to NoSQL
Data Modeling and Relational to NoSQL DATAVERSITY
 
Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Amazon Web Services
 

What's hot (20)

Weka presentation
Weka presentationWeka presentation
Weka presentation
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDM
 
Big Data Analytics to Enhance Security
Big Data Analytics to Enhance SecurityBig Data Analytics to Enhance Security
Big Data Analytics to Enhance Security
 
DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
Introduction to DCAM, the Data Management Capability Assessment Model - Editi...
 
(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineer
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Altis: AWS Snowflake Practice
Altis: AWS Snowflake PracticeAltis: AWS Snowflake Practice
Altis: AWS Snowflake Practice
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
cognos BI10.pptx
cognos BI10.pptxcognos BI10.pptx
cognos BI10.pptx
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
Data Visualization: Sales forecasting
Data Visualization: Sales forecastingData Visualization: Sales forecasting
Data Visualization: Sales forecasting
 
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management
‏‏‏‏‏‏‏‏‏‏Chapter 12: Data Quality Management
 
Snowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat SheetSnowflake SnowPro Certification Exam Cheat Sheet
Snowflake SnowPro Certification Exam Cheat Sheet
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Requirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - PresentationRequirements for a Master Data Management (MDM) Solution - Presentation
Requirements for a Master Data Management (MDM) Solution - Presentation
 
How to set up an ai center of excellence
How to set up an ai center of excellenceHow to set up an ai center of excellence
How to set up an ai center of excellence
 
Data Modeling and Relational to NoSQL
 Data Modeling and Relational to NoSQL  Data Modeling and Relational to NoSQL
Data Modeling and Relational to NoSQL
 
Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...Rapid Innovation: The Business Case for Modern Application Development (SRV20...
Rapid Innovation: The Business Case for Modern Application Development (SRV20...
 


Big Data Challenges at NASA

  • 1. Big Data Challenges at NASA
      Chris A. Mattmann
      Senior Computer Scientist, NASA
      Adjunct Assistant Professor, USC
      Member, Apache Software Foundation
  • 2. And you are?
      • Senior Computer Scientist at NASA JPL in Pasadena, CA, USA
      • Software Architecture/Engineering Prof at Univ. of Southern California
      • Apache Member involved in
          – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Lucy (Mentor) and Gora (Champion), MRUnit (Mentor), Airavata (Mentor)
      13-Jun-12 HADOOPSUMMIT12
  • 3. Agenda
      • Big Data Challenges and where we're headed
      • Example systems at NASA and other agencies
      • Apache OODT: a primer
      • Apache OODT + Hadoop
      • Where we're headed and wrap-up
  • 4. Some “Big Data” Grand Challenges I'm interested in
      • How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around?
          – Required by the Square Kilometre Array
      • Joe scientist says I've got an IDL or Matlab algorithm that I will not change and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products
          – Required by the Western Snow Hydrology project
      • How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment?
          – Required by the 5th IPCC assessment and the Earth System Grid and NASA
      • How do we catalog all of NASA's current planetary science data?
          – Required by the NASA Planetary Data System
      Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295
      Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
  • 5. The NASA ESDS Context
      Where is open source most useful?
      Which area should produce open source software?
  • 6. Lessons from 90's era missions
      • Increasing data volumes (exponential growth)
      • Increasing complexity of instruments and algorithms
      • Increasing availability of proxy/sim/ancillary data
      • Increasing rate of technology refresh
      … all of this while NASA Earth Mission funding was decreasing
      A data system framework based on a standard architecture and reusable software components for supporting all future missions.
  • 7. Where do Big Data technologies fit into this?
      U.S. National Climate Assessment (pic credit: Dr. Tom Painter)
      SKA South Africa: Square Kilometre Array (pic credit: Dr. Jasper Horrell, Simon Ratcliffe)
  • 8. [Image] Credit: Cameron Goodale
  • 9. EVLA demonstration architecture
      [Figure: architecture diagram for the EVLA dataset day2_TDEM0003_10s_norx on ska-dc.jpl.nasa.gov. Labeled elements: Staging Area, Crawler, FM, WM, Curator, PCS Services, CAS Data Services, Browser, Monitor, Disk Area, WWW, Science Data System Operator. Legend: data flow, control flow, Apache OODT.]
  • 10. Apache OODT
      • Entered incubation at the Apache Software Foundation in 2010
      • Selected as a top level Apache Software Foundation project in January 2011
      • Developed by a community of participants from many companies, universities, and organizations
      • Used for a diverse set of science data system activities in planetary science, earth science, radio astronomy, biomedicine, astrophysics, and more
      http://oodt.apache.org
      OODT Development & user community includes:
  • 11. Apache OODT: OSS “big data” platform originally pioneered at NASA
      • OODT is meant to be a set of tools to help build data systems
          – It's not meant to be turnkey
          – It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
          – Each discipline/project extends
      • Projects that are deploying it operationally at
          – Decadal-survey recommended NASA Earth science missions, NIH, and NCI, CHLA, USC, South African SKA project
      • Why Apache?
          – Less than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
          – Differs from other open source communities; it provides a governance and management structure
  • 12. Why Apache and OODT?
      •  OODT is meant to be a set of tools to help build data systems
          –  It's not meant to be turnkey
          –  It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science
          –  Each discipline/project extends
      •  Apache is the elite open source community for software developers
          –  Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop)
          –  Differs from other open source communities; it provides a governance and management structure
  • 13. Governance Model + NASA = ♥
      •  NASA and other government agencies have tons of process
          –  They like that
  • 14. OODT Framework and PCS (diagram). Recoverable labels: OODT/Science Web Tools, Archive Client, Navigation Service, Object Oriented Data Technology Framework, Catalog & Archive Service (CAS), Process Control System (PCS), Archive Service, Profile Service, Product Service, Query Service, Bridge to External Services, Other Service 1, Other Service 2, Profile XML Data, Data System 1, Data System 2. Caption: CAS has recently become known as Process Control System when applied to mission work.
  • 15. Current PCS deployments
      •  Orbiting Carbon Observatory (OCO-2): spectrometer instrument
          –  NASA ESSP Mission, launch date: TBD 2013
          –  PCS supporting Thermal Vacuum Tests, ground-based instrument data processing, space-based instrument data processing, and Science Computing Facility
          –  EOM Data Volume: 61-81 TB in 3 yrs
          –  Processing Throughput: 200-300 jobs/day
      •  NPP Sounder PEATE: infrared sounder
          –  Joint NASA/NPOESS mission, launch date: October 2011
          –  PCS supporting Science Computing Facility (PEATE)
          –  EOM Data Volume: 600 TB in 5 yrs
          –  Processing Throughput: 600 jobs/day
      •  QuikSCAT: scatterometer
          –  NASA Quick-Recovery Mission, launch date: June 1999
          –  PCS supporting instrument data processing and science analyst sandbox
          –  Originally planned as a 2-year mission
      •  SMAP: high-res radar and radiometer
          –  NASA decadal study mission, launch date: 2014
          –  PCS supporting radar instrument and science algorithm development testbed
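To put the volume and throughput figures on slide 15 on a common scale, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB), 365-day years, and data accumulating evenly over the stated period. None of those assumptions come from the slides, so the results are order-of-magnitude averages only. The function names are illustrative.

```python
def average_daily_volume_gb(total_tb, years, days_per_year=365):
    """Average volume per day in GB, using decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 / (years * days_per_year)


def average_gb_per_job(total_tb, years, jobs_per_day):
    """Average GB per job if the whole volume is spread evenly over all jobs."""
    return average_daily_volume_gb(total_tb, years) / jobs_per_day


if __name__ == "__main__":
    # Figures taken from the slide: NPP Sounder PEATE and OCO-2.
    print(f"NPP Sounder PEATE: {average_daily_volume_gb(600, 5):.1f} GB/day")
    print(f"NPP Sounder PEATE: {average_gb_per_job(600, 5, 600):.3f} GB/job")
    print(f"OCO-2 (low):  {average_daily_volume_gb(61, 3):.1f} GB/day")
    print(f"OCO-2 (high): {average_daily_volume_gb(81, 3):.1f} GB/day")
```

The per-job figure is only meaningful as an average. Real jobs likely vary widely in input and output size.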
  • 16. Other PCS applications
      •  Astronomy and Radio
          –  Prototype work on MeerKAT with South Africans and KAT-7 telescope
          –  Discussions ongoing with NRAO Socorro (EVLA and ALMA)
      •  Bioinformatics
          –  National Institutes of Health (NIH) National Cancer Institute's (NCI) Early Detection Research Network (EDRN)
          –  Children's Hospital LA Virtual Pediatric Intensive Care Unit (VPICU)
      •  Earth Science
          –  National Climate Assessment: Snow Hydrology in the Western US and Alaska
          –  National Climate Assessment: Regional Climate Modeling and Evaluation
      •  Technology Demonstration
          –  JPL's Active Mirror Telescope (AMT)
          –  White Sands Missile Range
  • 17. PCS Core Components
      •  All core components implemented as web services
          –  XML-RPC used to communicate between components
          –  Servers implemented in Java
          –  Clients implemented in Java, scripts, Python, PHP and web-apps
          –  Service configuration implemented in ASCII and XML files
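Slide 17 says components talk over XML-RPC and that clients can be written in Python. Below is a minimal sketch of that pattern using only Python's standard library. The toy service and its method names (`toy.ingestProduct`, `toy.getMetadata`) are hypothetical stand-ins invented for this example. They are not the actual OODT File Manager API, and the real servers are written in Java.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def start_toy_file_manager():
    """Start a toy XML-RPC service on a free local port; return (server, url)."""
    catalog = {}  # product name -> metadata dict (in-memory stand-in for a catalog)

    def ingest_product(name, metadata):
        # Hypothetical method: record metadata for a named product.
        catalog[name] = metadata
        return True

    def get_metadata(name):
        # Hypothetical method: return stored metadata, or an empty struct if unknown.
        return catalog.get(name, {})

    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(ingest_product, "toy.ingestProduct")
    server.register_function(get_metadata, "toy.getMetadata")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/"


def demo():
    """Ingest one product through the XML-RPC client and read its metadata back."""
    server, url = start_toy_file_manager()
    try:
        with ServerProxy(url) as client:
            client.toy.ingestProduct(
                "granule-001", {"instrument": "example", "size_bytes": 1024}
            )
            return client.toy.getMetadata("granule-001")
    finally:
        server.shutdown()
        server.server_close()
```

Because XML-RPC is language-neutral, the same calls could in principle come from a PHP page or a shell script, which is presumably why the slide lists so many client types.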
  • 18. Core Capabilities
      •  File Manager does Data Management
          –  Tracks all of the stored data, files & metadata
          –  Moves data to appropriate locations before and after initiating PGE runs and from staging area to controlled access storage
      •  Workflow Manager does Pipeline Processing
          –  Automates processing when all run conditions are ready
          –  Monitors and logs processing status
      •  Resource Manager does Resource Management
          –  Allocates processing jobs to computing resources
          –  Monitors and logs job & resource status
          –  Copies output data to storage locations where space is available
          –  Provides the means to monitor resource usage
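Two ideas from slide 18 can be sketched as a toy model: running a task only once all its run conditions are ready, and allocating each job to a computing resource. This is not how OODT implements its Workflow Manager or Resource Manager. The `Task` and `Node` classes, the "most free slots" policy, and the simplification that finished jobs never free their slot are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    requires: set   # product names that must exist before the task may run
    produces: set   # product names the task adds when it finishes


@dataclass
class Node:
    name: str
    capacity: int   # total number of jobs this node accepts in the toy model
    jobs: list = field(default_factory=list)


def pick_node(nodes):
    """Return the node with the most free slots, or None if every node is full."""
    free = [n for n in nodes if len(n.jobs) < n.capacity]
    return max(free, key=lambda n: n.capacity - len(n.jobs)) if free else None


def run_pipeline(tasks, available, nodes):
    """Schedule every task whose run conditions are met until nothing more can run."""
    available = set(available)
    pending = list(tasks)
    log = []  # (task name, node name) in scheduling order
    progressed = True
    while pending and progressed:
        progressed = False
        for task in list(pending):
            if not task.requires <= available:
                continue  # run conditions not ready yet
            node = pick_node(nodes)
            if node is None:
                return log, pending  # no capacity left anywhere
            node.jobs.append(task.name)
            available |= task.produces  # outputs become inputs for later tasks
            log.append((task.name, node.name))
            pending.remove(task)
            progressed = True
    return log, pending
```

Tasks whose inputs never appear simply stay in `pending`. In a real system they would wait for the crawler or an upstream job to deliver the missing product.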
  • 22. How do we deploy PCS for a mission?
      •  We implement the following mission-specific customizations
          –  Server Configuration
              •  Implemented in ASCII properties files
          –  Product metadata specification
              •  Implemented in XML policy files
          –  Processing Rules
              •  Implemented as Java classes and/or XML policy files
          –  PGE Configuration
              •  Implemented in XML policy files
          –  Compute Node Usage Policies
              •  Implemented in XML policy files
      •  Here's what we don't change
          –  All PCS Servers (e.g. File Manager, Workflow Manager, Resource Manager)
              •  Core data management, pipeline process management and job scheduling/submission capabilities
          –  File Catalog schema
          –  Workflow Model Repository Schema
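Slide 22 describes mission-specific behavior living in XML policy files while the servers stay unchanged. Here is a small sketch of that idea for a product metadata specification. The XML layout (`productTypes`, `productType`, `element`, `required`) and the product type name are invented for illustration. They do not reproduce OODT's actual policy schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy document; element and attribute names are invented.
POLICY_XML = """
<productTypes>
  <productType name="L1BRadiance">
    <element name="StartDateTime" required="true"/>
    <element name="OrbitNumber" required="true"/>
    <element name="Comment" required="false"/>
  </productType>
</productTypes>
"""


def load_required_fields(policy_xml):
    """Map each product type name to the set of metadata keys it requires."""
    root = ET.fromstring(policy_xml)
    required = {}
    for ptype in root.findall("productType"):
        required[ptype.get("name")] = {
            el.get("name")
            for el in ptype.findall("element")
            if el.get("required") == "true"
        }
    return required


def missing_fields(product_type, metadata, required):
    """Return the sorted required keys that the metadata dict lacks."""
    return sorted(required[product_type] - set(metadata))
```

The generic code never mentions a mission-specific field name, so a new mission would only need to supply a different policy document.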
  • 23. Server and PGE Configuration
  • 24. Latest Apache OODT release: 0.3
      •  First appearance of PCS
          –  Core, Services (JAX-RS)
      •  Web Applications
          –  Balance (PHP), and Wicket (Java)-based apps for file management and workflow monitoring
      •  First release deployed to Maven Central
          –  We did backport 0.2 there after this
          –  Over 60 issues fixed in JIRA
      •  June 2011: recommended stable release
  • 25. Working on: 0.4
      •  Operator Interface (OODT-157)
      •  Workflow2 integration (OODT-215) and all of its sub-issues
          –  Global workflow conditions, dynamic workflows, parallel/sequential model, new workflow engine, etc.
      •  OODT RADIX for super easy deployment (OODT-120)
      •  Solr sync with File Manager (OODT-326)
      •  Improvements to XMLPS (OODT-333) and new crawler actions (OODT-33, OODT-34, OODT-35, OODT-36, OODT-37)
      •  CLI rewrite and refactor
      •  Over 130 issues currently resolved
      •  Likely to come before end of Q2 2012
  • 26. How do these fit together?
      •  Hadoop HDFS
          –  OODT File Manager leveraging HDFS for virtual disk path, replication, archiving, scalability
      •  Hadoop M/R
          –  Work done in OODT branch to connect OODT Workflow + Resource Mgmt to Hadoop (pre-YARN)
      •  Hadoop Hive used in Regional Climate Modeling DB
  • 27. Where are we headed with OODT + Hadoop?
      •  Investigate and integrate YARN
          –  Workflow and Resource Mgmt
      •  Plug in HBase as File Manager Catalog
          –  Already plugged in Hive
          –  Potentially leverage Gora?
      •  OODT + Hadoop Virtual Machines and RPMs
          –  Easy installation leveraging OODT RADIX
      •  Remote file acquisition (Push/Pull) as Hadoop M/R
  • 28. Key Takeaway
      •  Apache OODT, Apache Hadoop, and other big data technologies are preparing the world to handle all of these diverse use cases!
      •  Constantly evolving and improving frameworks: join up and help.
      •  Free and open source from Apache, and helping government demonstrate the public good
  • 29. Apache OODT Project Contact Info
      •  Learn more and track our progress at:
          –  http://oodt.apache.org
          –  WIKI: https://cwiki.apache.org/OODT/
          –  JIRA: https://issues.apache.org/jira/browse/OODT
      •  Join the mailing list:
          –  dev@oodt.apache.org
      •  Chat on IRC:
          –  #oodt on irc.freenode.net
      •  Acknowledgements
          –  Key Members of the OODT teams: Chris Mattmann, Daniel J. Crichton, Steve Hughes, Andrew Hart, Sean Kelly, Sean Hardman, Paul Ramirez, David Woollard, Brian Foster, Dana Freeborn, Emily Law, Mike Cayanan, Luca Cinquini, Heather Kincaid
          –  Projects, Sponsors, Collaborators: Planetary Data System, Early Detection Research Network, Climate Data Exchange, Virtual Pediatric Intensive Care Unit, NASA SMAP Mission, NASA OCO-2 Mission, NASA NPP Sounder PEATE, NASA ACOS Mission, Earth System Grid Federation
  • 30. Alright, I'll shut up now
      •  Any questions?
      •  THANK YOU!
          –  chris.a.mattmann@nasa.gov
          –  @chrismattmann on Twitter