Dr. Gilmore is currently a data scientist at Ayasdi, where she specializes in highly complex, high-dimensional data across a variety of industries. Prior to joining Ayasdi, Allison served as a National Science Foundation Post-Doctoral Fellow and an Assistant Adjunct Professor in mathematics at the University of California, Los Angeles. Dr. Gilmore also did post-doctoral research at Princeton University. She received her Ph.D. in mathematics from Columbia University in New York in May 2011.
Allison completed her undergraduate and master's degrees at Washington University, where she was selected as a Rhodes Scholar. She studied at Green College, Oxford University, and graduated in 2006 with an M.Phil. (with distinction) in sociology.
Her research interests include topology, geometry, network analysis and social movements. Dr. Gilmore serves on the board of The Friends of the Mandela Rhodes Foundation, whose mission is to fund the development of exceptional leadership capacity in southern Africa.
9. Company Confidential & Proprietary 9
Topological Summaries Capture Shape
Nodes are groups of similar data points.
Edges connect similar nodes.
Node position on the screen does not matter.
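The graph described on this slide looks like the output of the Mapper construction, which we assume here; the deck itself does not spell out the algorithm. The sketch below is a minimal, hypothetical version: a lens function assigns each point a number, the lens range is cut into overlapping intervals, the points in each interval are clustered (single linkage with a distance threshold), each cluster becomes a node, and two nodes are joined when they share a point. The names `mapper_graph` and `single_linkage_clusters` and the parameters `n_intervals`, `overlap` and `eps` are illustrative choices, not Ayasdi's API.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, indices, eps):
    """Split `indices` into clusters: two points share a cluster when a chain of
    steps, each shorter than eps, links them (single linkage via union-find)."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in indices:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper_graph(points, lens, n_intervals=6, overlap=0.3, eps=0.5):
    """Return (nodes, edges). Each node is a set of point indices; an edge
    joins two nodes that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = overlap * length  # widen each interval so neighboring ones overlap
    nodes = []
    for k in range(n_intervals):
        start, end = lo + k * length - pad, lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage_clusters(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Usage: 60 points on a unit circle, with the x coordinate as the lens.
circle = [(math.cos(2 * math.pi * t / 60), math.sin(2 * math.pi * t / 60))
          for t in range(60)]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0])
print(len(nodes), len(edges))  # 10 10: the nodes form a single loop
```

Only node membership and edges are returned; no coordinates are kept, which matches the point that node position on the screen does not matter.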
Enhancing Traditional Methods
PCA sees 3 clusters.
Using PCA coordinates as lenses, we can see more.
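A minimal sketch of what "PCA coordinates as lenses" could mean, assuming the lens is simply each point's projection onto the first principal component (the slide does not say how many components were used or how they were computed). The helper names `first_principal_component` and `pca_lens` are hypothetical. Power iteration on the covariance matrix stands in for a library PCA, and the returned ratio is the fraction of total variance along that component, the kind of figure quoted as "PCA captures 98.4% of variance" in the notes.

```python
import math
import random


def first_principal_component(points, iters=200, seed=0):
    """Power iteration on the covariance matrix. Returns (mean, unit direction,
    fraction of total variance along that direction)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random start avoids a degenerate guess
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    top = sum(v[a] * cov[a][b] * v[b] for a in range(dim) for b in range(dim))
    total = sum(cov[a][a] for a in range(dim))
    return mean, v, top / total


def pca_lens(points):
    """Lens value for each point: its coordinate along the first principal component."""
    mean, v, _ = first_principal_component(points)
    return [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]


# Usage: a thin, elongated cloud; nearly all variance lies along the x axis.
cloud = [(x, y) for x in (-2, -1, 0, 1, 2) for y in (-0.1, 0.1)]
mean, direction, ratio = first_principal_component(cloud)
print(direction, round(ratio, 4))
```

The lens values from `pca_lens` would then be covered by overlapping intervals and clustered within each interval, so clusters that PCA alone merges can still separate into distinct nodes.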
Disease State & Model Choice
David Schneider, Stanford Microbiology and Immunology
Topological Model for Total Knee Replacement
Low length of stay
Low to moderate length of stay
Long length of stay
Beating* the Curse of Dimensionality
* I mean, there are always conditions.
Niyogi, Smale, and Weinberger, "A Topological View of Unsupervised Learning from Noisy Data," SIAM J. on Computing 40(3) (2011), 646-663. http://math.uchicago.edu/~shmuel/noise.pdf
If a dataset is supported near a manifold, its key topological features can be detected from a sample whose size is independent of the dimension of ambient space.
Ambient dimension N? Doesn’t matter! (The manifold's dimension is d < N.)
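The toy below illustrates the flavor of the claim only; it does not reproduce the theorem, which handles noise and higher-dimensional features. Two well-separated unit circles are sampled with the same 80 points, and then padded with zeros into N coordinates. Padding is an isometric embedding, so every pairwise distance is unchanged, and anything computed from distances (here, the connected components of an ε-neighborhood graph, i.e. the 0th Betti number at scale ε) comes out the same for every N. The helpers `two_circles` and `count_components`, and the choices ε = 0.3 and a gap of 5, are hypothetical.

```python
import math
from itertools import combinations


def two_circles(n_per_circle, ambient_dim, gap=5.0):
    """Two unit circles in the first two coordinates (centers `gap` apart on the
    x axis), padded with zeros up to `ambient_dim` coordinates."""
    pts = []
    for cx in (0.0, gap):
        for k in range(n_per_circle):
            theta = 2 * math.pi * k / n_per_circle
            pts.append([cx + math.cos(theta), math.sin(theta)] + [0.0] * (ambient_dim - 2))
    return pts


def count_components(points, eps):
    """Connected components of the graph joining points closer than eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Same sample size in every ambient dimension; the answer does not change.
for n in (2, 10, 500):
    print(n, count_components(two_circles(40, n), eps=0.3))  # 2 components for every n
```

Adjacent points on each circle are about 0.157 apart and the circles are at least 3 apart, so ε = 0.3 sits comfortably between the two scales regardless of N.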
Questions?
Allison.Gilmore@Ayasdi.com
www.ayasdi.com
Understanding Shape Improves Models
[Figure: two panels, "Ground Truth Fraud" and "Model Predicted Fraud", each colored on a High-to-Low scale.]
Next time, show a second toy example as well, to demonstrate how different shapes give different graphs. One audience member later asked about a horizontal ellipse and a vertical ellipse that intersect. This also gives an opening to talk about invariance.
The output is a graph: just the nodes and edges, not the embedding. It’s a simple, combinatorial summary of something much more complex, and it stays a simple combinatorial object even if your original data lives in a very high-dimensional space. Also, this is (mostly) topology: we would have gotten the same graph if the points had been sampled from an ellipse or a square.
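One way to make "different shapes give different graphs" concrete is to count the independent loops in the output graph (its first Betti number, edges minus nodes plus connected components). A circle, an ellipse, and a square all yield one loop, while a horizontal and a vertical ellipse crossing at four points yield five. The two graphs below are hand-built, hypothetical stand-ins for what such data might produce, not outputs of Ayasdi's software, and `loop_count` is an illustrative helper.

```python
def loop_count(n_nodes, edges):
    """Independent loops in a graph (its first Betti number): E - V + C."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(i) for i in range(n_nodes)})
    return len(edges) - n_nodes + components


# Hypothetical graph for one closed curve (circle, ellipse or square): a ring of 8 nodes.
ring = [(i, (i + 1) % 8) for i in range(8)]

# Hypothetical graph for a horizontal and a vertical ellipse crossing at 4 points:
# nodes 0-3 are the crossings; between consecutive crossings each ellipse
# contributes its own arc, drawn as one middle node (4-7 horizontal, 8-11 vertical).
crossed = []
for k in range(4):
    a, b = k, (k + 1) % 4
    for mid in (4 + k, 8 + k):
        crossed += [(a, mid), (mid, b)]

print(loop_count(8, ring), loop_count(12, crossed))  # 1 5
```

Stretching the circle into an ellipse or a square changes distances but not this count, which is the sense in which the summary is invariant.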
So, objects close together in the network (in the same or nearby nodes) are similar (on whatever features you used to build the network).
PCA captures 98.4% of variance.
TDA with PCA lenses shows 4 clusters.
With the best practices that are surfaced, you can start to create a care path that will serve as the baseline or template going forward. You can then add events to, or remove events from, this care path.