Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement

Dmytro Karamshuk
King's College London

Based on the paper:
D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, C. Mascolo. Geo-Spotting: Mining Online
Location-based Services for Optimal Retail Store Placement. ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD), Chicago, 2013.

Optimal Retail Location Problem

Among L possible locations in the city select one where
a new store would be most profitable/popular.

The problem is not new
● A. Athiyaman. Location decision making: the case of retail service development in a
closed population. In Academy of Marketing Studies, volume 15, page 13, 2010.
● O. Berman and D. Krass. The generalized maximal covering location problem.
Computers & Operations Research, 29(6):563–581, 2002.
● A. Kubis and M. Hartmann. Analysis of location of large-area shopping centres. A
probabilistic gravity model for the Halle-Leipzig area. Jahrbuch für
Regionalwissenschaft, 27(1):43–57, 2007.
● P. Jensen. Network-based predictions of retail store commercial categories and
optimal locations. Phys. Rev. E, 74:035101, Sep 2006.

Our approach:
explore fine-grained, cheap data from location-based social networks (LBSNs)

Location-based social networks

● check in at places
● share with your friends
● receive bonuses for check-ins
● search for places
● leave comments for others

Check-ins around the world

over 40M users

over 4.5B check-ins

Collecting the Data

Dataset collected in New York
● 37K venues
● 47K users
● 621K check-ins
● May – November 2010
accounts for ≈25% of the original data

How popular is a venue?

The distance between the two places is only a few hundred meters

How popular is a venue?
Distribution of check-ins per place

Geographic distribution of venues

size = number of check-ins
● popularity can differ by several orders of magnitude from place to place
● it probably depends on the location and the type of place

Popularity and type of venue

● different types and chains of venues have different usage patterns
● we cannot compare check-ins across venues of different chains, but we can
compare them within an individual chain

Number of check-ins per place for
individual chains of restaurants

Co-location with other venues
How frequently do we observe a Starbucks close to a railway station?

Does it influence the popularity of a restaurant?
P. Jensen. Analyzing the localization of retail stores
with complex systems tools. IDA '09, pages 10–20, Berlin,
Heidelberg, 2009. Springer-Verlag.

User mobility between places
How many users go to a Starbucks after a railway station?

● there is a correspondence between co-location and mobility patterns
● but there are also many discrepancies


Among L possible locations in the city, select the one where a new store would be most popular.

Define the area
An area is defined as a disc of radius r around a point with geographical coordinates l
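
The disc definition above can be sketched as a distance filter over venues. This is a minimal illustration, not the authors' implementation: the venue dictionaries with `lat` and `lon` keys, the function names, and the choice of haversine distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def venues_in_disc(venues, center, r_m):
    """Return the venues lying within r_m metres of center = (lat, lon)."""
    lat0, lon0 = center
    return [v for v in venues
            if haversine_m(lat0, lon0, v["lat"], v["lon"]) <= r_m]
```

Every area feature in the following slides is then computed over the venues returned for a candidate location l and a chosen radius r.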

The area is described by a set of numeric features designed from
check-ins at venues in the disc.

Geographic features of an area
● density – number of venues in the area
● neighbors entropy – heterogeneity of venue types
● competitiveness – percentage of competing venues
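
The three geographic features listed above could be computed roughly as follows. This is a sketch under assumptions: venues are dictionaries with a `type` key, entropy is taken as Shannon entropy over venue types, and competitiveness is returned as a plain fraction of same-type venues (the sign and normalisation used in the paper may differ).

```python
import math
from collections import Counter


def density(area_venues):
    """Number of venues inside the area."""
    return len(area_venues)


def neighbors_entropy(area_venues):
    """Shannon entropy (natural log) of venue types inside the area."""
    n = len(area_venues)
    if n == 0:
        return 0.0
    counts = Counter(v["type"] for v in area_venues)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def competitiveness(area_venues, target_type):
    """Fraction of venues in the area that share the new store's type."""
    n = len(area_venues)
    if n == 0:
        return 0.0
    same = sum(1 for v in area_venues if v["type"] == target_type)
    return same / n
```

An area with a single venue type has entropy 0; the more evenly the types are mixed, the higher the value.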

Geographic features of an area
● quality by Jensen
  – define inter-type attractiveness coefficients
  – weight surrounding venues by their attractiveness

Mobility features of an area
● area popularity – total number of check-ins in the area
● transition density – intensity of transitions inside the area
● incoming flows – intensity of transitions from outside areas
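
The mobility features above can be illustrated with raw counts. In this sketch a transition is assumed to be two consecutive check-ins by the same user at different venues; the tuple layout `(user, timestamp, venue_id)` and all function names are hypothetical, and the paper may normalise these quantities differently.

```python
def extract_transitions(checkins):
    """Turn (user, timestamp, venue_id) records into (from, to) venue pairs."""
    by_user = {}
    for user, _ts, venue in sorted(checkins, key=lambda c: (c[0], c[1])):
        by_user.setdefault(user, []).append(venue)
    transitions = []
    for seq in by_user.values():
        # Consecutive check-ins at different venues form one transition.
        transitions.extend((a, b) for a, b in zip(seq, seq[1:]) if a != b)
    return transitions


def area_popularity(checkins_by_venue, area_ids):
    """Total number of check-ins at venues inside the area."""
    return sum(checkins_by_venue.get(v, 0) for v in area_ids)


def transition_density(transitions, area_ids):
    """Number of transitions that start and end inside the area."""
    return sum(1 for a, b in transitions if a in area_ids and b in area_ids)


def incoming_flow(transitions, area_ids):
    """Number of transitions that start outside and end inside the area."""
    return sum(1 for a, b in transitions
               if a not in area_ids and b in area_ids)
```

Here `area_ids` is the set of venue identifiers that fall inside the disc around the candidate location.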

Mobility features of an area
● transition quality
  – define transition coefficients for each type
  – weight venues according to the product of coefficient and
    check-in volume
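
Read literally, the transition quality described above is a weighted sum over the venues in the area. The sketch below assumes the coefficients are supplied as a dictionary keyed by `(neighbor_type, target_type)`; how those coefficients are estimated is not shown here, and the names are illustrative only.

```python
def transition_quality(area_venues, target_type, coeff, checkins_by_venue):
    """Sum of (type-to-type transition coefficient * check-in volume)."""
    total = 0.0
    for v in area_venues:
        # Missing coefficients are treated as 0 (an assumption of this sketch).
        c = coeff.get((v["type"], target_type), 0.0)
        total += c * checkins_by_venue.get(v["id"], 0)
    return total
```

A busy venue whose type often sends users to the target type contributes most to the score.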

Ranking problem
Use area features to rank all areas in a given set L
according to their potential popularity.

Compare with the ground truth: the ranking of places based
on their actual popularity.

Evaluation metrics
Compare the predicted and ground-truth rankings.
● Top-K locations ranking – use NDCG@K
● Accuracy of the best prediction – Accuracy@X%: how often the best
predicted store falls within the top X% of the ground-truth ranking
We use random cross-validation and report values averaged
across all experiments.
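
The two metrics above could be computed along these lines. This sketch uses the common exponential-gain form of DCG and assumes graded relevance scores are supplied per area; how relevance is derived from the ground-truth ranking is left open, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_order, relevance, k):
    """NDCG@K for a predicted ordering (best first) of area ids."""
    gains = [relevance[i] for i in predicted_order]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def top_prediction_hit(predicted_order, true_order, x_percent):
    """True if the top predicted area is in the top X% of the true ranking."""
    cutoff = max(1, math.ceil(len(true_order) * x_percent / 100))
    return predicted_order[0] in true_order[:cutoff]
```

Accuracy@X% is then the fraction of cross-validation runs for which `top_prediction_hit` returns `True`.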

Performance of individual features
NDCG@10

● some indicators are general across chains, while others are chain-specific
● a lack of competitors in the area plays a positive role, as does the presence of place attractors
● the performance of In.Flow is consistent with the fact that McDonald's attracts more users
from remote areas

Considering fusion of factors
Explore the fusion of features in a supervised learning approach

● regression for ranking – fit a regression model (Linear Regression,
SVR or M5P) and then rank areas by the regressed values
● pairwise ranking – learn from pairwise comparisons using the
RankNet neural network

Use the same evaluation methodology as for individual features.
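
To make the pairwise idea concrete, here is a deliberately simplified sketch. It is not RankNet itself: it swaps the neural network for a linear scoring function trained with the same kind of logistic pairwise loss, and the function names, learning rate, and epoch count are arbitrary assumptions. Features are assumed to be scaled to a small range.

```python
import math
from itertools import combinations


def train_pairwise_ranker(X, y, epochs=200, lr=0.1):
    """Learn weights w so that w.x_i > w.x_j whenever y_i > y_j."""
    w = [0.0] * len(X[0])
    pairs = []
    for i, j in combinations(range(len(y)), 2):
        if y[i] != y[j]:
            # Store each pair as (more popular, less popular).
            pairs.append((i, j) if y[i] > y[j] else (j, i))
    for _ in range(epochs):
        for hi, lo in pairs:
            diff = [a - b for a, b in zip(X[hi], X[lo])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # Gradient step on the pairwise loss log(1 + exp(-margin)).
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


def rank_areas(X, w):
    """Return area indices ordered from highest to lowest predicted score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
```

The regression-for-ranking variant differs only in the training step: fit a regressor on popularity directly, then sort areas by its predictions in the same way as `rank_areas`.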

Results of the supervised learning
NDCG@10: individual features vs. supervised learning

● supervised learning performs better than the best individual feature
● combining geographic and mobility features yields better results than
combining geographic features alone
● regression for ranking with SVR is the best-performing technique

The best location prediction

Supervised learning vs. individual features
● supervised learning yields reliable and significantly improved results
● the best prediction lies in the top 20% of the ground-truth ranking with
probability over 80%

Implications
● we show how fine-grained data from location-based social networks
can be used effectively in geographic retail analysis
● this can inspire further work in location-based advertising, indexes
of urban areas, provision of location-based services, etc.
● in particular, we see a lot of potential in measuring user flows from
check-ins for various applications
● we also faced some challenges when scaling this approach to other
chains and cities

Thank you for your attention!
Dmytro Karamshuk
King's College London
follow me on Twitter: @karamshuk
