Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Dr. Gilmore is currently a data scientist at Ayasdi, where she specializes in highly complex, high-dimensional data across a variety of industries. Prior to joining Ayasdi, Allison served as a National Science Foundation Post-Doctoral Fellow and an Assistant Adjunct Professor in mathematics at the University of California, Los Angeles. Dr. Gilmore also did post-doctoral research at Princeton University. She received her Ph.D. in mathematics from Columbia University in New York in May 2011. Allison completed her undergraduate and master's degrees at Washington University, where she was selected as a Rhodes Scholar. She studied at Green College, Oxford University, and graduated in 2006 with an M.Phil. (with distinction) in sociology. Her research interests include topology, geometry, network analysis and social movements. Dr. Gilmore serves on the board of The Friends of the Mandela Rhodes Foundation, whose mission is to fund the development of exceptional leadership capacity in southern Africa.

The Shape of Data
Allison Gilmore
Principal Data Scientist
November 13, 2015

Company Confidential & Proprietary 2
Data has shape.
Shape has meaning.
You already know this.

Shape as Organizing Principle

Geometry or Topology?
Geometry : Metric
Topology : Locality

Topological Summaries Capture Shape
Lens

Topological Summaries Capture Shape

Enhancing Traditional Methods

Topological Summaries Capture Shape
Nodes are groups of similar data points.
Edges connect similar nodes.
Node position on the screen does not matter.
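
The three statements above can be made concrete with a small sketch. The slides do not show Ayasdi's implementation, so everything here is an assumption for illustration: the function names, the toy data (points on a circle or ellipse), the x-coordinate lens, the overlapping interval cover, and the single-linkage clustering with threshold `eps`. A node is a cluster of nearby points inside one lens interval, and an edge joins two nodes that share at least one point.

```python
import math
from itertools import combinations

def ellipse_points(n=120, a=1.0, b=1.0):
    """n evenly parameterized points on an ellipse; a == b gives a circle."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
            for k in range(n)]

def clusters(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbor graph."""
    remaining = set(indices)
    found = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        found.append(frozenset(cluster))
    return found

def mapper_graph(points, lens, n_intervals=6, overlap=0.3, eps=0.2):
    """Nodes are clusters within overlapping lens intervals; edges join
    nodes that share points. Parameter defaults are illustrative only."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(clusters(members, points, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges

def cycle_rank(n_nodes, edges):
    """Number of independent loops: E - V + (connected components)."""
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    n_components = len({find(x) for x in range(n_nodes)})
    return len(edges) - n_nodes + n_components

nodes, edges = mapper_graph(ellipse_points(), lens=lambda p: p[0])
print(len(nodes), len(edges), cycle_rank(len(nodes), edges))
```

The output is only a list of nodes and a set of edges. No screen coordinates are computed, which is the sense in which node position does not matter. Calling `ellipse_points(a=2.0, b=1.0)` instead of the default circle should leave the loop count unchanged.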

Enhancing Traditional Methods
PCA sees 3 clusters.
Using PCA coordinates as lenses, we can see more.
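
As a rough illustration of what "PCA coordinates as lenses" can mean, the sketch below computes each point's score along the first principal component and treats that score as the lens value. This is an assumption about the workflow, not the code behind the slide: the function name `first_pc_lens`, the toy data, and the use of power iteration (chosen to stay within the Python standard library) are all hypothetical.

```python
import math

def first_pc_lens(points, iterations=200):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the covariance matrix. Assumes the data is
    not constant and that the starting vector of all ones is not
    orthogonal to the leading direction.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    direction = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * direction[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        direction = [x / norm for x in w]
    # The lens value of a point is its coordinate along the leading direction.
    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, scores

# Toy data: points spread along (1, 2)/sqrt(5) with a small symmetric
# offset across it, so the leading direction is known in advance.
along = (1 / math.sqrt(5), 2 / math.sqrt(5))
across = (-2 / math.sqrt(5), 1 / math.sqrt(5))
data = [(t * along[0] + 0.1 * s * across[0], t * along[1] + 0.1 * s * across[1])
        for t in range(-5, 6) for s in (-1, 1)]

direction, scores = first_pc_lens(data)
print(direction, min(scores), max(scores))
```

The `scores` list plays the role of the lens: it can be covered with overlapping intervals and clustered within each interval to build the topological summary, so structure that overlaps in a PCA scatter plot may still separate in the resulting graph.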

Topological Summary Shows 4 Clusters

Disease State & Model Choice
David Schneider, Stanford Microbiology and Immunology

Topological Model for Total Knee Replacement
Low length of stay
Low to moderate length of stay
Long length of stay

Carepaths for Total Knee Replacement

Beating* the Curse of Dimensionality
* I mean, there are always conditions.
Niyogi, Smale, and Weinberger, A Topological View of Unsupervised Learning from Noisy Data, SIAM J. of Computing 40 (2011) 646-663. http://math.uchicago.edu/~shmuel/noise.pdf
If a dataset is supported near a manifold, its key topological features can be detected from a sample whose size is independent of the dimension of the ambient space.
Ambient dimension N: doesn't matter!
Manifold dimension d < N
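
The sketch below does not reproduce the cited theorem, which concerns noisy samples and probabilistic guarantees. It only illustrates a much simpler related fact, under assumptions chosen for this example: if a circle (d = 1) is placed isometrically inside a space of ambient dimension N, every pairwise distance is unchanged, so any summary built from those distances is the same for N = 2 and N = 50. The embedding, the function names, and the threshold `eps` are hypothetical.

```python
import math
from itertools import combinations

def circle_sample(n=40):
    """n evenly spaced points on the unit circle in the plane."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def embed(point, ambient_dim):
    """Isometric embedding of the plane into R^ambient_dim.

    Uses two orthonormal vectors, e1 = (1, 1, ..., 1)/sqrt(N) and
    e2 = (1, -1, 1, -1, ...)/sqrt(N), which are orthogonal only when
    ambient_dim is even.
    """
    if ambient_dim % 2 != 0:
        raise ValueError("ambient_dim must be even for this construction")
    x, y = point
    scale = 1.0 / math.sqrt(ambient_dim)
    return [scale * (x + (y if j % 2 == 0 else -y)) for j in range(ambient_dim)]

def neighbor_graph(points, eps):
    """Pairs of point indices whose Euclidean distance is at most eps."""
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps}

flat = circle_sample()
lifted = [embed(p, 50) for p in flat]

same = neighbor_graph(flat, 0.4) == neighbor_graph(lifted, 0.4)
print("neighbor graphs identical:", same)
```

The threshold 0.4 is chosen to sit well away from any chord length of the 40-point sample, so floating-point rounding in the 50-dimensional distances should not flip any comparison.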

Questions?
Allison.Gilmore@Ayasdi.com
www.ayasdi.com

Understanding Shape Improves Models
[Figure: two panels, "Ground Truth Fraud" and "Model Predicted Fraud", each with a Low to High scale]

Topology Guides Model Creation

Editor's Notes

Real data has all these shapes and more.
Next time show a second toy example as well, to demonstrate how different shapes would give different graphs. One audience member later asked about a horizontal ellipse and vertical ellipse intersecting. This also gives an opening to talk about invariance.
The output is a graph. Just the nodes and edges, not the embedding. It's a simple, combinatorial summary of something much more complex. It stays a simple combinatorial object even if your original data is in a very high-dimensional space. Also, this is (mostly) topology: we would have gotten the same graph if the points had been sampled from an ellipse or a square.
So, objects close together in the network (in the same or nearby nodes) are similar (on whatever features you used to build the network).
PCA captures 98.4% of variance. TDA with PCA lenses shows 4 clusters.
With the best practices that are surfaced, you can start to create a care path that will be the baseline or template going forward. You can add and subtract events into this care path.
