This document describes Amr Koura's work in implementing and comparing batch and incremental modes of the Local Outlier Factor (LOF) algorithm. The goals were to code LOF in batch and incremental modes, integrate the code into an open source project, and compare the two modes. Incremental LOF was found to have equivalent outlier detection performance to static LOF while requiring less computation time and having lower computational complexity of O(N log N).
1. Data Mining Lab,
Local Outlier Factor
Amr Koura / Page 1
Supervisor: Sebastian Bothe
Local Outlier Factor
Lab Goal
Implement Local Outlier Factor in batch mode.
Implement Local Outlier Factor in incremental mode.
Compare the two modes.
Integrate the code into the open source project “RealKD”:
https://bitbucket.org/realKD/
Motivation
http://www.dbs.ifi.lmu.de/Publikationen/Papers/LOF.pdf
Local Outlier Factor
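$$\text{reach-dist}_k(A, B) = \max\big(d(B, A),\ k\text{-distance}(B)\big)$$

$$\text{lrd}(A) = \frac{1}{\sum_{B \in KNN(A)} \text{reach-dist}_k(A, B) \,/\, k}$$

$$\text{LOF}(A) = \frac{1}{k} \sum_{B \in KNN(A)} \frac{\text{lrd}(B)}{\text{lrd}(A)}$$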
https://en.wikipedia.org/wiki/Local_outlier_factor
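The three formulas above can be sketched in a few lines of Python. This is only an illustrative brute-force version, not the RealKD implementation: the names `knn` and `batch_lof` are hypothetical, distances are Euclidean, ties in the k-nearest-neighbour search are broken by index order, and duplicate points (which would make a reachability sum zero) are assumed not to occur.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def batch_lof(points, k):
    """LOF score of every point, computed from scratch (batch mode)."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    # k-distance(B): distance from B to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neighbours[i][-1]]) for i in range(n)]

    def reach_dist(a, b):
        # reach-dist_k(A, B) = max(d(B, A), k-distance(B))
        return max(math.dist(points[a], points[b]), k_dist[b])

    # lrd(A) = 1 / (sum of reach-dist_k(A, B) over KNN(A) / k)
    lrd = [k / sum(reach_dist(a, b) for b in neighbours[a]) for a in range(n)]
    # LOF(A) = (1 / k) * sum of lrd(B) / lrd(A) over KNN(A)
    return [sum(lrd[b] for b in neighbours[a]) / (k * lrd[a]) for a in range(n)]

# Hypothetical toy data: four evenly spaced points and one far away.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
scores = batch_lof(points, k=2)
```

On this toy data the four evenly spaced points score about 1, while the isolated point scores well above 1, which is how LOF flags an outlier.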
Demo
Incremental LOF Addition
Cities 9 and 10 have a change in their k-distance.
According to the lrd formula

$$\text{lrd}(A) = \frac{1}{\sum_{B \in KNN(A)} \text{reach-dist}_k(A, B) \,/\, k}$$

the LRD of the cities in the k-NN of cities 9 and 10 should be updated.
LRD List = {9, 10, 2}
According to the LOF formula

$$\text{LOF}(A) = \frac{1}{k} \sum_{B \in KNN(A)} \frac{\text{lrd}(B)}{\text{lrd}(A)}$$

all cities that have any of the cities {9, 10, 2} among their new nearest neighbours should update their LOF value.
LOF List = {9, 10, 2, 0, 7}
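The update lists on this slide follow from which quantities each formula reads. A rough sketch of that dependency chain is given below. It is an assumption-laden illustration, not the lab's code: the function name `affected_by_insertion` is hypothetical, and it finds the affected sets by recomputing all neighbourhoods by brute force, whereas a real incremental implementation would locate them with nearest-neighbour queries around the new point only.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def affected_by_insertion(points, new_point, k):
    """Return (knn_changed, lrd_update, lof_update) index sets after appending new_point."""
    old_nn = [knn(points, i, k) for i in range(len(points))]
    grown = points + [new_point]
    new_idx = len(points)
    new_nn = [knn(grown, i, k) for i in range(len(grown))]

    # Existing points whose neighbourhood changed: the new point entered
    # their k-NN, so their k-distance may have changed as well.
    knn_changed = {i for i in range(len(points)) if old_nn[i] != new_nn[i]}

    # lrd(A) reads KNN(A) and k-distance(B) for B in KNN(A), so it must be
    # recomputed for the changed points, the new point, and every point
    # that has a changed point among its neighbours.
    lrd_update = set(knn_changed) | {new_idx}
    lrd_update |= {a for a in range(len(grown)) if knn_changed & set(new_nn[a])}

    # LOF(A) reads lrd(A) and lrd(B) for B in KNN(A).
    lof_update = set(lrd_update)
    lof_update |= {a for a in range(len(grown)) if lrd_update & set(new_nn[a])}
    return knn_changed, lrd_update, lof_update

# Hypothetical toy data: two groups on a line, then a point added at the far end.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
knn_changed, lrd_update, lof_update = affected_by_insertion(points, (14, 0), k=2)
```

In this toy run only the group next to the inserted point ends up in the update sets; the distant group keeps its old values, which is where the incremental mode saves work.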
Comparison between static and incremental LOF
Running static LOF output:
1.1909475617292364 1.1956830856346556 0.9645631106850818
0.8029601477829005 0.7577540135599361 0.7377495644370516
0.7509608512974867 0.99956101138198 0.6943310060958396
2.3423102537190847 2.342310253719085 2.342310253719085
Running incremental LOF and addition output:
1.1909475617292364 1.1956830856346556 0.9645631106850818
0.8029601477829005 0.7577540135599361 0.7377495644370516
0.7509608512974867 0.99956101138198 0.6943310060958396
2.3423102537190847 2.342310253719085 2.342310253719085
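The two outputs above can also be checked for agreement programmatically. The snippet below is a small sketch using the values printed on this slide; the helper name `same_scores` and the tolerance are assumptions. A tolerance-based comparison is used because floating-point results can differ in the last digit when the same quantity is computed in a different order.

```python
import math

# Values copied from the two outputs shown on the slide.
static_lof = [
    1.1909475617292364, 1.1956830856346556, 0.9645631106850818,
    0.8029601477829005, 0.7577540135599361, 0.7377495644370516,
    0.7509608512974867, 0.99956101138198, 0.6943310060958396,
    2.3423102537190847, 2.342310253719085, 2.342310253719085,
]
incremental_lof = [
    1.1909475617292364, 1.1956830856346556, 0.9645631106850818,
    0.8029601477829005, 0.7577540135599361, 0.7377495644370516,
    0.7509608512974867, 0.99956101138198, 0.6943310060958396,
    2.3423102537190847, 2.342310253719085, 2.342310253719085,
]

def same_scores(a, b, rel_tol=1e-9):
    """True if both score lists have equal length and agree element-wise within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

print(same_scores(static_lof, incremental_lof))
```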
Conclusion
Implementation of the batch and incremental modes is done.
Batch mode code is integrated into the project repository, while a
pull request has been made to integrate it.
Incremental LOF has equivalent detection performance to static
LOF.
Incremental LOF requires less computation time than static LOF.
Incremental LOF complexity is O(N log N).
Thank you