Josh Bloom (PI), Justin Higgins, Adam Morgan
[Diagram: “Object” Datastream, Transients Classification Pipeline, Classify, Database, Broadcast]
[Diagram: surveys and the Transients Classification Pipeline]
•   SDSS stripe-82 archived data
•   PTF / LBL subtraction pipeline
•   SASIR (future)
•   LSST (future)
•   Survey X (real-time survey telescope)
•   Survey Y (static survey repository)

Transients Classification Pipeline: Classify, Database, Broadcast

Database containing “sources”
•   features for a source
•   data epochs associated with a source

Broadcast “sources”
•   interesting or transient source
•   include classifications
•   include features, context
SDSS Stripe 82
•   A deep field from the Sloan Digital Sky Survey
•   750 million observation epochs
•   ~20 million “sources” clustered from epochs
•   5 colors / filters, 4 years of observations
•   We used Stripe-82 for testing and development

[Diagram: SDSS stripe-82 archived data; Transients Classification Pipeline; Database containing “sources” (features for a source, data epochs associated with a source)]
Palomar Transient Factory
                    •   Palomar 48” telescope

                    •   100 Mpix, 7.8 sq-deg detector

                    •   ~120s cadence : ~200MB : <100GB/night

                    •   Post subtraction: ~1M difference objects / night

                    •   Post filtering: ~10k difference objects / night
                                         ~100s transient and variable stars
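As a rough consistency check on the figures above, the nightly data volume and the filtering ratio can be worked out directly. This is only a back-of-the-envelope sketch: the 10-hour observing night is an assumption that does not appear in the slides, and the constant names are illustrative.

```python
# Rough arithmetic on the PTF figures quoted above.
CADENCE_S = 120        # ~120 s between exposures (from the slide)
EXPOSURE_MB = 200      # ~200 MB per exposure (from the slide)
NIGHT_HOURS = 10       # ASSUMPTION: length of an observing night, not from the slides

exposures_per_night = NIGHT_HOURS * 3600 // CADENCE_S
gb_per_night = exposures_per_night * EXPOSURE_MB / 1000

# Reduction from subtraction output to filtered candidates (from the slide).
post_subtraction = 1_000_000   # ~1M difference objects / night
post_filtering = 10_000        # ~10k difference objects / night
kept_fraction = post_filtering / post_subtraction

print(exposures_per_night, "exposures,", gb_per_night, "GB per night")
print("fraction of difference objects kept by filtering:", kept_fraction)
```

Under that assumption the volume comes out below the quoted 100 GB/night ceiling, and filtering keeps about one difference object in a hundred.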



[Diagram: LBL subtraction pipeline; TCP; PTF consortium; PAIRITEL 1.3m; Palomar 60”; MDM 1.3m & 2.4m]
Next Generation Survey: LSST

Large Synoptic Survey Telescope (LSST):
•   1 Gb every 2 seconds
•   10^6 supernovae/yr
•   10^5 eclipsing systems
•   10^7 asteroids...
•   light curves of 800 million sources every 3 days
Transients Classification Pipeline

[Diagram: “Object” Datastream; TCP stages: source generation, feature generation, source classification; Database; Follow-up telescope observations; Broadcast]
Parallelized source correlation and classification

•   Retrieve difference objects
•   Each difference-object is passed to an IPython client
•   Each parallel IPython client performs:
     •   Source creation or correlation with existing sources
     •   “Feature” generation (or re-generation) for that source
     •   Classification of that source

[Diagram: source generation; feature generation; source classification]
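The three per-object steps listed above can be sketched as follows. This is a minimal illustration, not the TCP code: it uses Python's standard `concurrent.futures` thread pool as a stand-in for the IPython clients, and the source table, matching radius, feature set, and classification rule are all hypothetical. Positional matching is a simple box test that ignores the cos(Dec) factor.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical table of existing sources (read-only inside the workers).
SOURCES = {1: {"ra": 10.0, "dec": 20.0, "mags": [18.1, 18.3, 17.9]}}
MATCH_RADIUS_DEG = 0.001   # ASSUMPTION: positional matching tolerance

def correlate(obj, sources):
    """Return the id of the first source within the matching box, else None."""
    for sid, src in sources.items():
        if (abs(src["ra"] - obj["ra"]) <= MATCH_RADIUS_DEG
                and abs(src["dec"] - obj["dec"]) <= MATCH_RADIUS_DEG):
            return sid
    return None

def generate_features(mags):
    """Toy features computed from all magnitudes of a source."""
    return {"n_epochs": len(mags), "mean_mag": mean(mags),
            "amplitude": max(mags) - min(mags)}

def classify(features):
    """Placeholder rule; a trained classifier would go here."""
    if features["n_epochs"] >= 3 and features["amplitude"] > 0.5:
        return "variable"
    return "unclassified"

def process(obj):
    """Source correlation, feature generation, then classification."""
    sid = correlate(obj, SOURCES)
    previous = SOURCES[sid]["mags"] if sid is not None else []
    features = generate_features(previous + [obj["mag"]])
    return {"source_id": sid, "features": features, "class": classify(features)}

difference_objects = [
    {"ra": 10.0002, "dec": 20.0001, "mag": 17.2},   # matches source 1
    {"ra": 55.5, "dec": -3.2, "mag": 19.0},         # no match: new source
]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, difference_objects))

for r in results:
    print(r["source_id"], r["class"])
```

The workers only read the source table and return their results, so writing new or updated sources back to the database can happen in one place afterwards, which avoids concurrent updates in this sketch.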
Parallelized source correlation and classification

•   Realtime TCP runs on 22 dedicated cores
•   LCOGT’s 96 core beowulf
     •   non run-time tasks
     •   Classifier generation
•   Additional resources: (for future classification work)
     •   Yahoo! M45 cluster
     •   Amazon EC2 cluster

[Diagram: source generation; feature generation; source classification]
Warehouse of light-curves

•   Need representative light-curves for all science

•   With these we can model each science class

•   We’ve built a warehouse of example light-curves




[Figure: TCP-TUTOR (internal interface); DotAstro.org (public interface)]
“Noisifying to the Survey”

•   Well sampled light-curves
     •   Can make good classifiers for well-sampled data.

     •   Don’t immediately make good classifiers for noisy, sparse data.


•   We need classifiers which are trained using:
     •   sampling cadence of our survey

     •   sparseness of our survey data

     •   noise and sensitivity limitations of our instrument


•   We need “Noisification” software which:
     •   Resamples well-sampled light-curves

     •   Outputs noisified sources which are used for generating classifiers
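A minimal sketch of what such noisification could look like, assuming linear interpolation of the well-sampled light-curve, Gaussian magnitude errors, and a single limiting magnitude. The function names, parameters, and the synthetic sinusoidal light-curve are all illustrative assumptions; the actual TCP software is not shown in the slides.

```python
import math
import random

def interpolate(times, mags, t):
    """Linearly interpolate a well-sampled light-curve at time t."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            return mags[i] + (mags[i + 1] - mags[i]) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the light-curve time range")

def noisify(times, mags, survey_times, mag_err, limiting_mag, rng):
    """Resample at the survey epochs, add noise, drop undetectable epochs."""
    noisified = []
    for t in survey_times:
        m = interpolate(times, mags, t) + rng.gauss(0.0, mag_err)
        if m <= limiting_mag:          # larger magnitude means fainter
            noisified.append((t, m))
    return noisified

rng = random.Random(42)

# Synthetic well-sampled light-curve: 0.5 day period, 10 days long.
times = [i * 0.01 for i in range(1001)]
mags = [18.0 + 0.5 * math.sin(2 * math.pi * t / 0.5) for t in times]

# ASSUMPTION: 20 sparse survey epochs drawn at random over the same 10 days.
survey_times = sorted(rng.uniform(0.0, 10.0) for _ in range(20))

noisy = noisify(times, mags, survey_times, mag_err=0.1, limiting_mag=21.0, rng=rng)
print(len(noisy), "noisified epochs")
```

In practice `survey_times` would come from the survey's pointing and observing plan, and the resulting noisified sources would be the training set for a classifier matched to that cadence.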
“Noisifying to the Survey”

•   For PTF:
     •   Code uses PTF pointing and survey observing plans

     •   Occasionally PTF observes using a faster cadence:

           •    7.5 minutes between revisiting an RA, Dec

           •    Faster cadence requires a separate set of noisified light-curves
                and classifiers.


•   Other surveys:
     •   Other pointing and observing plans could be used.

     •   Can generate noisified light-curves for other surveys.

     •   Then we can generate science classifiers for these surveys.
Classifiers
       •    General Classifier
              •    Identify:
                     •    well sampled (periodic & nonperiodic)
                     •    interesting sources near known galaxies
                     •    periodic variable science class when confidence is high
              •    Filter out:
                     •    poorly subtracted sources
                     •    minor planets / rocks
                     •    cosmic rays
                     •    detector defects

       •    Timeseries Classifiers
              •    Weighted combination of WEKA classifiers

                     •    bagged Random Forest classifier using a cost-matrix

                     •    Each classifier trained on different cadenced noisified data

              •    Astronomer crafted classifiers for specific science types

                     •    Microlensing, supernova
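The slides do not say how the cost-matrix is applied. One common way to use a cost-matrix with a probabilistic classifier is to pick the class with the lowest expected cost instead of the highest probability. The sketch below illustrates that idea only; the class names, probabilities, and costs are made up, and nothing here calls WEKA.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected misclassification cost.

    probs: {true_class: probability from the classifier}
    cost:  cost[true_class][predicted_class] = penalty for that outcome
    """
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in probs)
        for pred in probs
    }
    return min(expected, key=expected.get), expected

# Hypothetical classifier output for one source.
probs = {"RR Lyrae": 0.6, "junk": 0.4}

# Hypothetical cost-matrix: calling junk an RR Lyrae is 5x worse than missing one.
cost = {
    "RR Lyrae": {"RR Lyrae": 0.0, "junk": 1.0},
    "junk":     {"RR Lyrae": 5.0, "junk": 0.0},
}

label, expected = min_expected_cost_class(probs, cost)
print(label, expected)
```

With these numbers the source is labelled "junk" even though "RR Lyrae" has the higher probability, because a false alarm is penalised more heavily than a miss.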
Interesting near-galaxy PTF sources

 • Identified by TCP at the end of August 2009
 • Classification triggered by the latest epoch added to the source
Periodic variable classifiers

•   Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
•   Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.

[Figures: ~0.14 day period RR Lyrae using 20 epoch noisification; ~0.4 day period RR Lyrae using 10 epoch noisification; 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification]

[Screenshot annotations]
•   Clicking on a class for one of dozens of ML models... shows highest classification probability sources for that model::class
•   Overplotting of period-folded model still needs work
•   Period-fold plotting probably failed here
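Combining weighted probabilities from several models can be sketched as a weighted sum followed by renormalisation. This is an illustration under assumptions: the model names, weights, and probabilities are hypothetical, and the slides do not specify the exact combination rule TCP uses.

```python
def combine(model_probs, weights):
    """Weighted sum of per-model class probabilities, renormalised to 1.

    model_probs: {model_name: {class_name: probability}}
    weights:     {model_name: weight}
    """
    combined = {}
    for model, probs in model_probs.items():
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + weights[model] * p
    total = sum(combined.values())
    if total == 0:
        raise ValueError("all weighted probabilities are zero")
    return {cls: value / total for cls, value in combined.items()}

# Hypothetical models trained on differently noisified light-curves.
model_probs = {
    "10_epoch_model": {"RR Lyrae": 0.5, "Eclipsing": 0.5},
    "20_epoch_model": {"RR Lyrae": 0.9, "Eclipsing": 0.1},
}
weights = {"10_epoch_model": 1.0, "20_epoch_model": 3.0}

final = combine(model_probs, weights)
best_class = max(final, key=final.get)
print(best_class, final)
```

A natural choice of weight is how closely a model's noisification parameters (for example, number of epochs) match the source being classified, though that choice is also an assumption here.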
Evaluating and Combining Classifiers


•   Issues when using multiple classifiers:
      •    How to combine classifiers when using:

            •    weighted classifiers

            •    tree-hierarchy of sub-classifiers

      •    How to generate final classification “probabilities” when using:

            •    Widely varying types of classifiers

            •    Classifiers which contain sub-classifications & probabilities
•   Evaluate the final combination of classifiers
      •    Classify PTF09xxx user classified sources, determine efficiencies

      •    Classify noisified sources, determine efficiencies
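One simple way to "determine efficiencies" is to compare the pipeline's labels against known labels and compute, per class, the fraction of sources that were recovered. The sketch below assumes that definition (per-class recall); the slides do not define efficiency, and the labels used are invented for illustration.

```python
from collections import Counter

def efficiencies(true_labels, predicted_labels):
    """Per-class efficiency: fraction of each true class labelled correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {cls: correct[cls] / n for cls, n in totals.items()}

# Hypothetical user-classified sources and the pipeline's output for them.
true_labels = ["SN", "SN", "RR Lyrae", "RR Lyrae", "RR Lyrae", "junk"]
predicted   = ["SN", "junk", "RR Lyrae", "RR Lyrae", "SN", "junk"]

eff = efficiencies(true_labels, predicted)
for cls, value in eff.items():
    print(f"{cls}: {value:.2f}")
```

The same function applies to noisified sources, where the true class is known from the light-curve each one was generated from.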
Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

Talk about the Transients Classification Pipeline (T.C.P.) for the CDI inter-departmental workshop at UC Berkeley, 2009-09-11.

  • 1. Josh Bloom (PI), Justin Higgins, Adam Morgan
  • 3. Transients Classification Pipeline (diagram)
      • Surveys: SDSS stripe-82 archived data; PTF / LBL subtraction pipeline; SASIR (future); LSST (future); Survey X (real-time survey telescope); Survey Y (static survey repository)
      • Pipeline: Classify, Database, Broadcast
      • Database containing “sources”
          • features for a source
          • data epochs associated with a source
      • Broadcast “sources”
          • interesting or transient source
          • include classifications
          • include features, context
  • 4. SDSS Stripe 82
      • A deep field from the Sloan Digital Sky Survey
      • 750 million observation epochs
      • ~20 million “sources” clustered from epochs
      • 5 colors / filters, 4 years of observations
      • We used Stripe-82 for testing and development
      • Diagram: SDSS stripe-82 archived data; Transients Classification Pipeline; database containing “sources” (features for a source, data epochs associated with a source)
  • 5. Palomar Transient Factory
      • Palomar 48” telescope
      • 100 Mpix, 7.8 sq-deg detector
      • ~120s cadence : ~200MB : <100GB/night
      • Post subtraction: ~1M difference objects / night
      • Post filtering: ~10k difference objects / night
          • ~100s transient and variable stars
      • Diagram: LBL subtraction pipeline; TCP; PTF consortium; telescopes: PAIRITEL 1.3m, Palomar 60”, MDM 1.3m & 2.4m
  • 6. Next Generation Survey: LSST
      • Large Synoptic Survey Telescope (LSST): 1 Gb every 2 seconds
      • 10^6 supernovae/yr, 10^5 eclipsing systems, 10^7 asteroids...
      • light curves of 800 million sources every 3 days
  • 7. Transients Classification Pipeline (TCP)
      • Diagram: “Object” datastream; TCP stages (source generation, feature generation, source classification); Database; Broadcast; follow-up telescope observations
  • 8. Parallelized source correlation and classification
      • Retrieve difference objects
      • Each difference-object is passed to an IPython client
      • Each parallel IPython client performs:
          • Source creation or correlation with existing sources
          • “Feature” generation (or re-generation) for that source
          • Classification of that source
      • Diagram: source generation, feature generation, source classification
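
The per-object steps on slide 8 can be sketched roughly as follows. This is a minimal illustration, not the TCP code: it swaps the IPython parallel clients for a standard-library thread pool, and the match radius, the toy features, the threshold "classifier", and every function name here are assumptions made for the example.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

MATCH_RADIUS_DEG = 1.0 / 3600.0  # hypothetical 1-arcsecond association radius


def generate_features(mags):
    """Toy features: number of epochs and peak-to-peak amplitude."""
    return {"n_epochs": len(mags), "amplitude": max(mags) - min(mags)}


def classify(features):
    """Toy stand-in for a trained classifier (the threshold is illustrative)."""
    return "variable" if features["amplitude"] > 0.5 else "constant"


def run_pipeline(diff_objects, n_workers=4):
    sources = []             # shared store, a stand-in for the source database
    lock = threading.Lock()

    def process(obj):
        # Step 1: correlate with an existing source, or create a new one.
        with lock:
            for src in sources:
                # Flat-sky separation; adequate only for a small-field sketch.
                sep = math.hypot(src["ra"] - obj["ra"], src["dec"] - obj["dec"])
                if sep < MATCH_RADIUS_DEG:
                    break
            else:
                src = {"ra": obj["ra"], "dec": obj["dec"], "mags": [], "n_used": 0}
                sources.append(src)
            src["mags"].append(obj["mag"])
            mags = list(src["mags"])  # snapshot for work done outside the lock
        # Steps 2 and 3: (re)generate features, then classify the source.
        features = generate_features(mags)
        label = classify(features)
        # Keep only the result that was computed from the most epochs.
        with lock:
            if len(mags) >= src["n_used"]:
                src.update(features=features, label=label, n_used=len(mags))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(process, diff_objects))  # list() surfaces worker errors
    return sources
```

Correlation is serialized behind a lock so that two objects at the same position cannot create duplicate sources, while feature generation and classification run outside it. A real deployment would presumably rely on the database for that consistency instead.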
  • 9. Parallelized source correlation and classification
      • Realtime TCP runs on 22 dedicated cores
      • LCOGT’s 96 core beowulf
          • non run-time tasks
          • Classifier generation
      • Additional resources: (for future classification work)
          • Yahoo! M45 cluster
          • Amazon EC2 cluster
      • Diagram: source generation, feature generation, source classification
  • 10. Warehouse of light-curves
      • Need representative light-curves for all science
      • With these we can model each science class
      • We’ve built a warehouse of example light-curves
          • TCP-TUTOR: internal interface
          • DotAstro.org: public interface
  • 11.
  • 12.
  • 13. “Noisifying to the Survey”
      • Well sampled light-curves
          • Can make good classifiers for well-sampled data.
          • Don’t immediately make good classifiers for noisy, sparse data.
      • We need classifiers which are trained using:
          • sampling cadence of our survey
          • sparseness of our survey data
          • noise and sensitivity limitations of our instrument
      • We need “Noisification” software which:
          • Resamples well-sampled light-curves
          • Outputs noisified sources which are used for generating classifiers
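
A minimal sketch of what such a noisification step might look like, assuming linear interpolation of the template, a constant Gaussian magnitude error, and a hard limiting magnitude. The function name `noisify` and all of its parameters are hypothetical; the slides do not describe the actual resampling or noise model.

```python
import bisect
import random


def noisify(times, mags, obs_times, mag_err, limiting_mag, rng):
    """Resample a well-sampled light curve at survey epochs and add noise.

    times, mags  : template light curve; times must be strictly increasing
    obs_times    : epochs at which the survey would observe this position
    mag_err      : 1-sigma Gaussian photometric error, in magnitudes (assumed constant)
    limiting_mag : epochs that come out fainter than this are dropped
    rng          : a random.Random instance, so runs are reproducible
    """
    noisy_curve = []
    for t in obs_times:
        if t < times[0] or t > times[-1]:
            continue  # no template coverage at this epoch
        # Locate the bracketing template points and interpolate linearly.
        i = min(bisect.bisect_right(times, t), len(times) - 1)
        t0, t1 = times[i - 1], times[i]
        m0, m1 = mags[i - 1], mags[i]
        model_mag = m0 + (m1 - m0) * (t - t0) / (t1 - t0)
        observed = model_mag + rng.gauss(0.0, mag_err)
        # Larger magnitude means fainter, so keep only detections.
        if observed <= limiting_mag:
            noisy_curve.append((t, observed))
    return noisy_curve


# Usage: a densely sampled template observed at a handful of sparse epochs.
template_t = [0.1 * k for k in range(101)]                 # 0 to 10 days
template_m = [18.0 + 0.5 * ((k % 20) / 20.0) for k in range(101)]
sparse = noisify(template_t, template_m, [0.7, 2.3, 5.9, 9.4],
                 mag_err=0.05, limiting_mag=20.5, rng=random.Random(42))
```

Running the same template through several observing plans would give one noisified training set per cadence, which is the situation slide 15 describes.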
  • 14. “Noisifying to the Survey”
  • 15. “Noisifying to the Survey”
      • For PTF:
          • Code uses PTF pointing and survey observing plans
          • Occasionally PTF observes using a faster cadence:
              • 7.5 minutes between revisiting an RA, Dec
              • Faster cadence requires a separate set of noisified light-curves and classifiers.
      • Other surveys:
          • Other pointing and observing plans could be used.
          • Can generate noisified light-curves for other surveys.
          • Then we can generate science classifiers for these surveys.
  • 16. Classifiers
      • General Classifier
          • Identify:
              • well sampled (periodic & nonperiodic)
              • interesting sources near known galaxies
              • periodic variable science class when confidence is high
          • Filter out:
              • poorly subtracted sources
              • minor planets / rocks
              • cosmic rays
              • detector defects
      • Timeseries Classifiers
          • Weighted combination of WEKA classifiers
          • bagged Random Forest classifier using a cost-matrix
          • Each classifier trained on different cadenced noisified data
      • Astronomer crafted classifiers for specific science types
          • Microlens, Supernova
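
One common way to apply a cost matrix to a classifier's output is to pick the class with the lowest expected cost rather than the highest probability. The sketch below illustrates that rule only. Whether the TCP's WEKA setup uses this rule or instance reweighting is not stated in the slides, and the class names and cost values are invented for the example.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction that minimizes expected misclassification cost.

    probs : {true_class: probability} from a classifier
    cost  : cost[true_class][predicted_class], with zeros on the diagonal
    Returns (best_class, {predicted_class: expected_cost}).
    """
    classes = list(probs)
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in classes)
        for pred in classes
    }
    return min(expected, key=expected.get), expected


# Hypothetical case: a false RR Lyrae alarm (wasted follow-up) is costed
# five times higher than missing a real one.
probs = {"RR Lyrae": 0.6, "junk": 0.4}
cost = {
    "RR Lyrae": {"RR Lyrae": 0.0, "junk": 1.0},
    "junk": {"RR Lyrae": 5.0, "junk": 0.0},
}
best, expected = min_expected_cost_class(probs, cost)
```

With these numbers the most probable class is "RR Lyrae", but the expected cost of announcing it (0.4 × 5) exceeds the expected cost of rejecting it (0.6 × 1), so the rule returns "junk".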
  • 17. Interesting near-galaxy PTF sources • Identified by TCP at the end of Aug ’09 • Classification triggered by latest epoch added to the source
  • 18. Periodic variable classifiers
      • Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
      • Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.
      • Clicking on a class for one of dozens of ML models... ...shows highest classification probability sources for that model::class
      • Figure captions:
          • ~0.4 day period RR Lyrae using 10 epoch noisification
          • ~0.14 day period RR Lyrae using 20 epoch noisification
          • 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification
      • Figure annotations:
          • Overplotting of period-folded model still needs work
          • period-fold plotting probably failed here
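
The weighted combination on slide 18 could be as simple as a normalized weighted average of each model's class probabilities. The sketch below assumes exactly that; the slides do not give the actual combination rule or weights, and the model names used here are hypothetical.

```python
def combine_weighted(model_probs, weights):
    """Weighted average of per-model class probabilities for one source.

    model_probs : {model_name: {class_name: probability}}
    weights     : {model_name: non-negative weight}
    Classes missing from a model's output count as probability 0 for it.
    """
    total_weight = sum(weights[m] for m in model_probs)
    combined = {}
    for model, probs in model_probs.items():
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + weights[model] * p / total_weight
    return combined


# Two hypothetical models trained on differently noisified light curves.
model_probs = {
    "rf_10_epoch": {"RR Lyrae": 0.8, "other": 0.2},
    "rf_20_epoch": {"RR Lyrae": 0.4, "other": 0.6},
}
weights = {"rf_10_epoch": 3.0, "rf_20_epoch": 1.0}
combined = combine_weighted(model_probs, weights)
science_class = max(combined, key=combined.get)
```

Because the weights are normalized, the combined values still sum to one whenever each model's own probabilities do.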
  • 19. Evaluating and Combining Classifiers
      • Issues when using multiple classifiers:
          • How to combine classifiers when using:
              • weighted classifiers
              • tree-hierarchy of sub-classifiers
          • How to generate final classification “probabilities” when using:
              • Widely varying types of classifiers
              • Classifiers which contain sub-classifications & probabilities
      • Evaluate the final combination of classifiers
          • Classify PTF09xxx user classified sources, determine efficiencies
          • Classify noisified sources, determine efficiencies
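
A short sketch of the evaluation step, assuming "efficiency" means the fraction of sources of a known class that the classifier labels correctly. That definition is an assumption, as are the function name and the example labels; the slides do not define the term.

```python
from collections import Counter


def per_class_efficiency(true_labels, predicted_labels):
    """Fraction of sources of each true class that were classified correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Made-up labels standing in for user-classified or noisified sources.
true_labels = ["SN", "SN", "RR Lyrae", "RR Lyrae", "RR Lyrae", "RR Lyrae"]
predicted = ["SN", "RR Lyrae", "RR Lyrae", "RR Lyrae", "RR Lyrae", "SN"]
efficiency = per_class_efficiency(true_labels, predicted)
```

The same function applies to both test sets named on the slide: user-classified sources give an efficiency on real data, and noisified sources give one where the true class is known by construction.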