SlideShare uma empresa Scribd logo
1 de 22
Real Time-Big Data-Social
 Network-Data Science-Gamified!




                                                  Jason Capehart
a.k.a. The Cascade Project                            12/12/12

(Okay … that last part of the title isn’t true)
1. Visualization

2. Data

3. Analysis
Show Me!
The Good, The Bad, The Ugly
Surely, You Must Be Joking.



            Store            Examples
Key-Value           Hadoop, Memcached, Redis
Document            MongoDB, CouchDB
Graph               Neo4j, Giraph, Titan
Real Time           Storm, Impala
Citation:
Kwak, H., Changhyun, L., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News
Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600).
Raleigh, NC: ACM.
Citation:
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM
Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
800,000,000
   (that’s a lot of users)


   (cost = 200k for fire hose)
Sampled

                  Not Sampled




Citation:
Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free:
Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
# Pseudo Code

id_guess = randint(0, 10^9)

user = api.get_user(id = id_guess)

Repeat until tired or rate limited
Power Law (xmin = 281, α = 2.19)
 Lognormal



Discrete Power Law vs.
Lognormal
Loglikelihood
                89.46
Ratio
Vuong’s Test
                7.14
Statistic
p-val
                >0.99
(1-sided)
Power Law (xmin = 222, α = 2.33)
Lognormal
Stretched Exponential
• Conclusions = None!
  – All work is in progress



• Discussion
  – Cascade uses open source
  – Opportunities to give back?
References
1. A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703
   (2009). (arXiv:0706.1062, doi:10.1137/070710111)
      –      Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/
2.   Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics,
     46(5), 323-351.
3.   Kwak, H., Changhyun, L., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings
     of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM
4.   Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of
     networks. Proceedings of the National Academy of Sciences, 4221-4224.

Mais conteúdo relacionado

Mais procurados

Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska PacificResearchPlatform
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Robert Grossman
 
2020 ml swarm ascend presentation
2020 ml swarm ascend presentation2020 ml swarm ascend presentation
2020 ml swarm ascend presentationKyongsik Yun
 
Portable Energy-Aware Cluster-Based Edge Computers
Portable Energy-Aware Cluster-Based Edge ComputersPortable Energy-Aware Cluster-Based Edge Computers
Portable Energy-Aware Cluster-Based Edge ComputersThomas Rausch
 
Reusable Software and Open Data To Optimize Agriculture
Reusable Software and Open Data To Optimize AgricultureReusable Software and Open Data To Optimize Agriculture
Reusable Software and Open Data To Optimize AgricultureDavid LeBauer
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1balmanme
 
A Biological Internet?: Eywa
A Biological Internet?: EywaA Biological Internet?: Eywa
A Biological Internet?: EywaEugene Siow
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudAmazon Web Services
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
Mike Warren Keynote
Mike Warren KeynoteMike Warren Keynote
Mike Warren KeynoteData Con LA
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World FosterIan Foster
 
Python for data science
Python for data sciencePython for data science
Python for data scienceWei-Wen Hsu
 

Mais procurados (20)

Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
Deep Learning of Astronomical Spectroscopy, J. Xavier Prochaska
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11
 
2020 ml swarm ascend presentation
2020 ml swarm ascend presentation2020 ml swarm ascend presentation
2020 ml swarm ascend presentation
 
Portable Energy-Aware Cluster-Based Edge Computers
Portable Energy-Aware Cluster-Based Edge ComputersPortable Energy-Aware Cluster-Based Edge Computers
Portable Energy-Aware Cluster-Based Edge Computers
 
Denis Reznik Data driven future
Denis Reznik Data driven futureDenis Reznik Data driven future
Denis Reznik Data driven future
 
Reusable Software and Open Data To Optimize Agriculture
Reusable Software and Open Data To Optimize AgricultureReusable Software and Open Data To Optimize Agriculture
Reusable Software and Open Data To Optimize Agriculture
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
 
A Biological Internet?: Eywa
A Biological Internet?: EywaA Biological Internet?: Eywa
A Biological Internet?: Eywa
 
E scidocdays review
E scidocdays reviewE scidocdays review
E scidocdays review
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Mike Warren Keynote
Mike Warren KeynoteMike Warren Keynote
Mike Warren Keynote
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Agents In An Exponential World Foster
Agents In An Exponential World FosterAgents In An Exponential World Foster
Agents In An Exponential World Foster
 
Application of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailandApplication of web ontology to harvest estimation of rice in thailand
Application of web ontology to harvest estimation of rice in thailand
 
Application of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in ThailandApplication of web ontology to harvest estimation of rice in Thailand
Application of web ontology to harvest estimation of rice in Thailand
 
Amman Workshop - Overview - M MacKay
Amman Workshop - Overview - M MacKayAmman Workshop - Overview - M MacKay
Amman Workshop - Overview - M MacKay
 
Python for data science
Python for data sciencePython for data science
Python for data science
 

Semelhante a Cascade Project

When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 
La résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesLa résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesData2B
 
Genomic Research: The Jump to Light Speed
Genomic Research: The Jump to Light SpeedGenomic Research: The Jump to Light Speed
Genomic Research: The Jump to Light SpeedLarry Smarr
 
The Importance of Large-Scale Computer Science Research Efforts
The Importance of Large-Scale Computer Science Research EffortsThe Importance of Large-Scale Computer Science Research Efforts
The Importance of Large-Scale Computer Science Research EffortsLarry Smarr
 
The Singularity: Toward a Post-Human Reality
The Singularity: Toward a Post-Human RealityThe Singularity: Toward a Post-Human Reality
The Singularity: Toward a Post-Human RealityLarry Smarr
 
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...Niki Pavlopoulou
 
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineThe Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineLarry Smarr
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraLarry Smarr
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NACLarry Smarr
 
Collaborations Between Calit2, SIO, and the Venter Institute-a Beginning
Collaborations Between Calit2, SIO, and the Venter Institute-a BeginningCollaborations Between Calit2, SIO, and the Venter Institute-a Beginning
Collaborations Between Calit2, SIO, and the Venter Institute-a BeginningLarry Smarr
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
 
Education in a Globally Connected World
Education in a Globally Connected WorldEducation in a Globally Connected World
Education in a Globally Connected WorldLarry Smarr
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningAnubhav Jain
 
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio  Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFabricio  Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFlávio Codeço Coelho
 
Building an Information Infrastructure to Support Genetic Sciences
Building an Information Infrastructure to Support Genetic SciencesBuilding an Information Infrastructure to Support Genetic Sciences
Building an Information Infrastructure to Support Genetic SciencesLarry Smarr
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Don Pellegrino
 
Building a Global Collaboration System for Data-Intensive Discovery
Building a Global Collaboration System for Data-Intensive DiscoveryBuilding a Global Collaboration System for Data-Intensive Discovery
Building a Global Collaboration System for Data-Intensive DiscoveryLarry Smarr
 
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...Larry Smarr
 

Semelhante a Cascade Project (20)

Cyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and BeyondCyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and Beyond
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
La résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphesLa résolution de problèmes à l'aide de graphes
La résolution de problèmes à l'aide de graphes
 
Genomic Research: The Jump to Light Speed
Genomic Research: The Jump to Light SpeedGenomic Research: The Jump to Light Speed
Genomic Research: The Jump to Light Speed
 
The Importance of Large-Scale Computer Science Research Efforts
The Importance of Large-Scale Computer Science Research EffortsThe Importance of Large-Scale Computer Science Research Efforts
The Importance of Large-Scale Computer Science Research Efforts
 
The Singularity: Toward a Post-Human Reality
The Singularity: Toward a Post-Human RealityThe Singularity: Toward a Post-Human Reality
The Singularity: Toward a Post-Human Reality
 
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...
Using Embeddings for Dynamic Diverse Summarisation in Heterogeneous Graph Str...
 
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic MedicineThe Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
The Future of the Internet and its Impact on Digitally Enabled Genomic Medicine
 
Science and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated EraScience and Cyberinfrastructure in the Data-Dominated Era
Science and Cyberinfrastructure in the Data-Dominated Era
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
Collaborations Between Calit2, SIO, and the Venter Institute-a Beginning
Collaborations Between Calit2, SIO, and the Venter Institute-a BeginningCollaborations Between Calit2, SIO, and the Venter Institute-a Beginning
Collaborations Between Calit2, SIO, and the Venter Institute-a Beginning
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
 
Education in a Globally Connected World
Education in a Globally Connected WorldEducation in a Globally Connected World
Education in a Globally Connected World
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio  Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFabricio  Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
 
Building an Information Infrastructure to Support Genetic Sciences
Building an Information Infrastructure to Support Genetic SciencesBuilding an Information Infrastructure to Support Genetic Sciences
Building an Information Infrastructure to Support Genetic Sciences
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...
 
Building a Global Collaboration System for Data-Intensive Discovery
Building a Global Collaboration System for Data-Intensive DiscoveryBuilding a Global Collaboration System for Data-Intensive Discovery
Building a Global Collaboration System for Data-Intensive Discovery
 
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...
The Jump to Light Speed - Data Intensive Earth Sciences are Leading the Way t...
 

Último

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Último (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Cascade Project

  • 1. Real Time-Big Data-Social Network-Data Science-Gamified! Jason Capehart a.k.a. The Cascade Project 12/12/12 (Okay … that last part of the title isn’t true)
  • 3.
  • 4.
  • 6.
  • 7.
  • 8. The Good, The Bad, The Ugly
  • 9. Surely, You Must Be Joking. Store Examples Key-Value Hadoop, Memcached, Redis Document MongoDB, CouchDB Graph Neo4j, Giraph, Titan Real Time Storm, Impala
  • 10.
  • 11. Citation: Kwak, H., Changhyun, L., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
  • 12. Citation: A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
  • 13. 800,000,000 (that’s a lot of users) (cost = 200k for fire hose)
  • 14. Sampled Not Sampled Citation: Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
  • 15.
  • 16. # Pseudo Code id_guess = randint(0, 10^9) user = api.get_user(id = id_guess) Repeat until tired or rate limited
  • 17.
  • 18. Power Law (xmin = 281, α = 2.19) Lognormal Discrete Power Law vs. Lognormal Loglikelihood 89.46 Ratio Vuong’s Test 7.14 Statistic p-val >0.99 (1-sided)
  • 19.
  • 20. Power Law (xmin = 222, α = 2.33) Lognormal Stretched Exponential
  • 21. • Conclusions = None! – All work is in progress • Discussion – Cascade uses open source – Opportunities to give back?
  • 22. References 1. A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111) – Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/ 2. Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351. 3. Kwak, H., Changhyun, L., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM 4. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.

Notas do Editor

  1. Nature of the Beast, comparison to weblogs