
Learn about Your Location (Using ALL Your Data)


Taryn Price and Courtney Shindeldecker's keynote in the Machine Learning presentation at Charlottesville's 2017 Tom Tom Fest.

Published in: Technology

  1. Learn About Your Location (Using All Your Data): Machine Learning in Data Fusion and Analysis. Taryn Price and Courtney Shindeldecker, April 13, 2017.
  2. Outline
     • Data fusion example: Chicago data
       • Wikipedia, imagery, crime events, Open Street Maps
     • Embedding space
       • What is it? What can we do with it?
     • Three types of embedding space usage
       • Similarity search
       • Classification
       • Clustering
  3. Introducing: Chicago (Why Chicago?)
     • A large variety of data sources is available.
       • Open Data Portal
       • Wikipedia
       • Open Street Maps
       • Satellite imagery
     • A large variety of data types is available.
       • Shapefiles (subway lines, parks, building footprints)
       • “Flat files” (crime events, lobbyist registration, building permits)
       • Text (descriptions of locations and/or events)
       • Images (satellite)
  4. Triples to embedding space
  5. Chicago Wikipedia
     • 1652 Chicago-area articles
     • Convert articles to document embeddings
     • Find nearest neighbors (6)
     <wiki01> <similarText> <wiki04>
  6. Chicago Wikipedia - similarity example
     • The Eisenhower Public Library District is a public library located in Harwood Heights, Illinois, one of two suburbs completely surrounded by but not incorporated into Chicago.
     • The Pritzker Military Museum & Library (formerly Pritzker Military Library) is a museum and a research library for the study of military history in Chicago, Illinois, US.
  7. Chicago imagery
     • Subset crime data
       • 5000 “events”
     • Download imagery
       • Zoom level 18
     • Create chips out of image(s)
     • Select chips that contain a crime event from the subset (4,760 chips)
  8. Chicago imagery
     • Extract image-chip features
       • VGG-19 CNN classifier
       • Use last layer before the softmax as chip features
       • Apply L2 norm
     • Find nearest neighbors (10)
     <chip01> <similarImgTo> <chip07>
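The last two steps on this slide (L2 normalization, then nearest-neighbor lookup) can be sketched as follows. This is a minimal illustration, not the presenters' pipeline: the toy three-dimensional vectors stand in for VGG-19 features, and the helper names `l2_normalize` and `similarity_triples` are hypothetical. Only the predicate `similarImgTo` and the chip naming come from the slide.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity_triples(features, k, predicate="similarImgTo"):
    """Link each item to its k nearest neighbors as (subject, predicate, object) triples."""
    unit = {name: l2_normalize(vec) for name, vec in features.items()}
    triples = []
    for name, vec in unit.items():
        # For unit vectors, a larger dot product means a smaller Euclidean distance.
        scored = [
            (sum(a * b for a, b in zip(vec, other)), other_name)
            for other_name, other in unit.items()
            if other_name != name
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        triples.extend((name, predicate, other_name) for _, other_name in scored[:k])
    return triples

# Toy "chip features" (assumed values); real CNN features would be far longer.
chip_features = {
    "chip01": [0.9, 0.1, 0.0],
    "chip07": [1.8, 0.3, 0.1],
    "chip12": [0.0, 1.0, 0.2],
    "chip20": [0.1, 0.9, 0.4],
}
triples = similarity_triples(chip_features, k=1)
```

The brute-force comparison is quadratic in the number of chips, which is fine for a sketch; a few thousand chips with high-dimensional features would more likely call for an indexed nearest-neighbor search.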
  9. Chicago imagery - similarity example
  10. Chicago crime events
      • Download CSV of events
        • Jun 1, 2016 - Mar 27, 2017
        • ~300,000 crime events
      • Select potentially useful field: “Primary Type”
        • Ex. Arson, Narcotics, Theft, Robbery
      Now what…?
  11. Open Street Map (OSM)
  12. Open Street Map
      • ~900,000 polygons
      • Attributes
        • Amenity (restaurant, cafe, parking, fast food, etc.)
        • Building (residential, house, school, church, etc.)
      Again, now what…?
  13. Chicago voronois
      • Create covering polygons to “roll up” data
      • Divide bounding polygon using OSM “points”
      • Dense areas contain smaller Voronoi polygons
      • Mean area ~2 km²
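One way to picture the "roll up" step: a point lies in the Voronoi cell of whichever seed point is closest to it, so events can be assigned to cells without constructing the polygons at all. The sketch below assumes planar (projected) coordinates and ignores clipping to the bounding polygon. The seed coordinates, event records, and the names `nearest_seed` and `roll_up` are made up for illustration.

```python
import math
from collections import Counter

def nearest_seed(point, seeds):
    """Return the id of the seed closest to `point`, i.e. the Voronoi cell containing it."""
    return min(seeds, key=lambda seed_id: math.dist(point, seeds[seed_id]))

def roll_up(events, seeds):
    """Count events per Voronoi cell, keyed by (cell id, event type)."""
    counts = Counter()
    for x, y, event_type in events:
        counts[(nearest_seed((x, y), seeds), event_type)] += 1
    return counts

# Hypothetical seed points (standing in for OSM "points") and crime events.
seeds = {"Vor01": (0.0, 0.0), "Vor02": (4.0, 0.0), "Vor03": (0.0, 4.0)}
events = [
    (0.5, 0.2, "theft"),
    (3.6, 0.4, "arson"),
    (0.3, 0.1, "theft"),
    (0.2, 3.9, "robbery"),
]
cell_counts = roll_up(events, seeds)
```

Because seeds are denser where the map is denser, the same nearest-seed rule naturally yields smaller cells in busy areas, which matches the behavior described on the slide.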
  14. Chicago voronois - tie data together
      • Point / polygon data
        • <Vor01> <hasCrimeType> <robbery>
        • <Vor01> <hasOSMAmenity> <grave yard>
        • <Vor01> <hasOSMBuilding> <school>
      • Wikipedia
        • <Vor01> <hasWikiSite> <Wiki100>
      • Imagery
        • <Vor01> <hasImage> <Img05>
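A rough sketch of how per-cell observations from different sources might be fused into one set of triples. The predicate names are taken from the slide; the source keys, the deduplication of repeated events, and the helpers `fuse` and `cells_sharing_objects` are assumptions made for illustration.

```python
from collections import defaultdict

# Source name (hypothetical keys) -> predicate (names as shown on the slide).
PREDICATES = {
    "crime": "hasCrimeType",
    "osm_amenity": "hasOSMAmenity",
    "osm_building": "hasOSMBuilding",
    "wiki": "hasWikiSite",
    "image": "hasImage",
}

def fuse(observations):
    """Convert (cell, source, value) records into a sorted, deduplicated list of triples."""
    return sorted({(cell, PREDICATES[source], value) for cell, source, value in observations})

def cells_sharing_objects(triples):
    """Map each (predicate, object) pair to the cells it connects, keeping only shared ones."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return {key: cells for key, cells in index.items() if len(cells) > 1}

observations = [
    ("Vor01", "crime", "robbery"),
    ("Vor01", "crime", "robbery"),  # repeated events collapse to a single triple here
    ("Vor01", "osm_building", "school"),
    ("Vor01", "wiki", "Wiki100"),
    ("Vor02", "crime", "robbery"),
    ("Vor02", "image", "Img05"),
]
triples = fuse(observations)
shared = cells_sharing_objects(triples)
```

Shared objects such as a common crime type are what connect otherwise separate Voronoi cells in the resulting graph.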
  15. Sample of resulting graph (figure; node labels: Vor 1, Vor 2, Assault, Arson, church, office, Wiki 100, Wiki 54, Img 1, Img 5)
  16. … Create embeddings from triples … (more in a minute)
  17. t-SNE of resulting embeddings
  18. Example use: dentist office locations
  19. Exploring Fused Embedding Space with Machine Learning
  20. Representing graphs with vectors
      • We create embedding vectors (or more simply, embeddings) to represent entities, relationships, groups and concepts in the graph
      • Embeddings are low-dimensional, dense, real-valued vectors
      • They can be generated in an unsupervised fashion (no labeling or annotation required)
      • Powerful, general-purpose representations of entities and relationships
      3-dimensional embedding vector: [2, 4, 5]
      10-dimensional embedding vector: [3, 1, -3, 4, 8, 2, -7, 21, 7,
  21. Translational models for embedding graphs
      • Learn how to place entities in vector space
      • Inferencing using vector arithmetic
      • Common technique: Trans-E
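The "vector arithmetic" idea behind TransE is that a true triple should satisfy head + relation ≈ tail. A minimal sketch of the scoring function and the margin ranking loss used to train such models follows. The two-dimensional vectors are hand-picked for illustration, not learned, and the slides do not say which distance or margin the presenters used.

```python
import math

def transe_score(head, relation, tail):
    """TransE distance ||head + relation - tail||; lower means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(positive, negative, margin=1.0):
    """Zero once the true triple scores at least `margin` better than the corrupted one."""
    return max(0.0, margin + positive - negative)

# Hand-picked toy embeddings (assumed values, not trained).
entities = {
    "Vor01": [0.0, 0.0],
    "robbery": [1.0, 1.0],
    "school": [-1.0, 2.0],
}
relations = {"hasCrimeType": [1.0, 1.0]}

# True triple: <Vor01> <hasCrimeType> <robbery>
true_score = transe_score(entities["Vor01"], relations["hasCrimeType"], entities["robbery"])
# Corrupted triple: the tail is swapped for an unrelated entity.
corrupt_score = transe_score(entities["Vor01"], relations["hasCrimeType"], entities["school"])
loss = margin_loss(true_score, corrupt_score)
```

During training, gradient steps on this loss move entity and relation vectors until true triples score better than corrupted ones by the margin.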
  22. Learning Embedding Locations
      As the model trains, embedding locations are learned and updated over time. Eventually, the model converges and we can explore the resulting embedding space.
  23. Embeddings Space
      Low-dimensional embedding space can be visualized and investigated. (This task becomes difficult above three dimensions.)
  24. Machine learning leverages embeddings
      Supervised and unsupervised techniques over embeddings allow us to ask probabilistic questions:
      • How much alike are these entities?
      • What is the probability that relationship r holds between two entities?
      Panels: Similarity search, Classification, Clustering
  25. Similarity Search
      • Given an entity of interest, what other entities in my data are most similar?
      • Multiple approaches, but with a learned embedding space, we can answer this [almost] for free
      • Example: using vessel metadata + AIS track data, find vessels that are similar to one another
      Norwegian Jewel
  26. Similarity Search
      • Given an entity of interest, what other entities in my data are most similar?
      • Multiple approaches, but with a learned embedding space, we can answer this [almost] for free
      • Example: using vessel metadata + AIS track data, find vessels that are similar to one another
      Norwegian Jewel
      Most Similar Vessel: Carnival Elation
  27. Vessel similarity
      • 01 Blue Heat Map: shows all vessels
      • 02 Orange Heat Map: 30 most similar vessels to the Norwegian Jewel
      • 03 Activity Pattern: activity pattern of this vessel group is quite distinct from the dominant blue pattern
  28. Classification
      • Given a training set of entities and their labels, classify all other [unlabeled] entities in the data
      • Common techniques:
        • Random Forest
        • k-Nearest Neighbors
        • SVM
      • Example: Hurricane Katrina relief in New Orleans: identify potential food distribution centers across the entire city
      Random Forest Classifier
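To illustrate classification over embeddings, here is a small k-Nearest Neighbors sketch, one of the techniques listed on the slide. It is swapped in for brevity: the slide's figure refers to a Random Forest classifier, which would need a library. The embeddings, labels, and cell ids are invented for the example.

```python
import math
from collections import Counter

def knn_classify(query, labeled, k=3):
    """Label `query` by majority vote among its k nearest labeled embeddings."""
    nearest = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D embeddings of hand-labeled Voronoi cells.
labeled = [
    ([0.9, 0.8], "candidate"),
    ([1.0, 1.1], "candidate"),
    ([0.8, 1.0], "candidate"),
    ([-1.0, -0.9], "other"),
    ([-0.8, -1.1], "other"),
    ([-1.1, -1.0], "other"),
]
# Unlabeled cells to classify.
unlabeled = {"Vor17": [0.7, 0.9], "Vor42": [-0.9, -0.8]}
predictions = {cell: knn_classify(vec, labeled) for cell, vec in unlabeled.items()}
```

The same pattern (train on a handful of labeled cells, predict on all the rest) applies whichever classifier is chosen.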
  29. Voronoi discretization of the physical space
      • Open Street Maps
      • Overhead imagery
  30. Voronoi discretization of the physical space
      • Open Street Maps
      • Overhead imagery
      • Voronoi covering of the region informed by both data sources
  31. Create training data
      • Green is labeled as a good site for a distribution center
      • Use these labeled examples to train the classifier
      Figure labels: Superdome, New Orleans Saints Training Facility
  32. Use the trained classifier on all voronois
      Red Voronoi embeddings were classified by the model as potential distribution center sites
  33. Discover matching locations not investigated manually
      Areas with large buildings and proximity to highways were identified
  34. Applications in other domains
      • Retail: opening a new location
      • Agriculture: finding land optimal to your crop
      • Ecology: identifying regions with or in danger of erosion
  35. Unsupervised Learning: Clustering
      • Goal: Is there structure in my data?
      • Technique: k-means
        • Partition all embeddings into k groups
        • Each embedding belongs to the cluster with the nearest mean (the prototype embedding of the cluster)
      • Clustering results are not based on any specific labeling of each cell
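The two k-means steps named on this slide (assign each embedding to the nearest mean, then recompute the means) can be sketched in plain Python. This is a bare-bones version of Lloyd's algorithm under simplifying assumptions: the first k points serve as initial means and the loop runs a fixed number of iterations. The toy embeddings are invented.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) using the first k points as initial means."""
    means = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each embedding joins the cluster of the nearest mean.
        assignment = [min(range(k), key=lambda c: math.dist(p, means[c])) for p in points]
        # Update step: each mean becomes the centroid of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous mean if a cluster ends up empty
                means[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, means

# Toy 2-D embeddings forming two obvious groups (assumed values).
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
assignment, means = kmeans(embeddings, k=2)
```

No labels are involved at any point, which is what makes the resulting partition an unsupervised characterization of the cells.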
  36. Technology: Building a Graph of Data
      Unsupervised characterization of locations
      By clustering locations based on their embeddings, we can generate informative partitions of an area
  37. Technology: Building a Graph of Data
      Unsupervised characterization of locations
      By clustering locations based on their embeddings, we can generate informative partitions of an area
      Figure labels: Sea lanes, Offshore platforms
  38. Look us up at CCRi
      • Taryn Price: tprice@ccri.com
      • Courtney Shindeldecker: cshindeldecker@ccri.com
      • We’re hiring! Check out www.ccri.com for more info, blog, and sweet videos
      • Twitter: @ccr_inc