Finding the Needle in the IP Stack Dr. Sven Krasser McAfee, Inc. Session ID: RR-403 Session Classification: Intermediate
Agenda: Data Mining – A Human Approach; English Words; Bad Behavior; What’s in a File; Conclusions 2
Data Mining A Human Approach 3
Anthropometric Data 4 Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
Measurements 5 Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
Measurements (continued) 6 Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
Height Versus Weight [Plot: weight (in pounds) versus height (in inches)] 7
Height Versus Weight (continued) [Plot: weight (in pounds) versus height (in inches), with women and men marked separately] 8
Putting Weight and Height Into Perspective 9
Best Guess for Gender [Plot: best guess over weight (in pounds) and height (in inches), ranging from 100% male / 0% female through 50% male / 50% female to 0% male / 100% female] 10
One Dimension Only [Plot: curves over height (in inches); vertical axis from 0.00 to 0.15] 11
Better Features [Plot: weight (in pounds) versus buttock circumference] Buttock Circumference: “The circumference of the body measured at the level of the maximum posterior protuberance of the buttocks.” 12
Best Guess for Revised Features [Plot: best guess over weight (in pounds) and buttock circumference] 13
Further Improving the Separation. Signal to noise: features with very different distributions per class. Correlation: features with low correlation. Dimensionality: consider more features at the same time. 14
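The first two criteria on this slide can be made concrete with a small sketch. It is only an illustration under assumptions: the Fisher-style score, the function names, and the toy values are not from the talk, and the numbers are made up rather than taken from the anthropometric data set.

```python
from statistics import mean, pvariance

def separation_score(class_a, class_b):
    """Fisher-style score: squared gap between class means over summed variances."""
    gap = mean(class_a) - mean(class_b)
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        return float("inf") if gap else 0.0
    return gap * gap / spread

def pearson(xs, ys):
    """Pearson correlation between two equally long feature columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up toy values for one feature, split by class (hypothetical).
feature_class_a = [64.0, 65.0, 66.0, 63.0]
feature_class_b = [69.0, 70.0, 71.0, 72.0]
score = separation_score(feature_class_a, feature_class_b)
```

A feature with a high separation score has very different distributions per class, and a pair of features with a correlation near zero adds more information together than a highly correlated pair.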
Email Data in Three Dimensions 15
Sparse Data [Matrix of counts in which nearly every entry is 0; the few nonzero entries are small values such as 1, 2, 5, 16, 25, and 40] 16
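Data this sparse is usually stored by keeping only the nonzero entries. The sketch below shows one way to do that; the helper names and the toy row are hypothetical and only loosely shaped like the matrix on the slide.

```python
def to_sparse(row):
    """Keep only the nonzero entries of a dense row as {column_index: value}."""
    return {i: v for i, v in enumerate(row) if v != 0}

def sparse_dot(a, b):
    """Dot product of two sparse rows; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

# Toy row (hypothetical): 90 columns, two of them nonzero.
dense_row = [25] + [0] * 84 + [2] + [0] * 4
sparse_row = to_sparse(dense_row)
```

Here `sparse_row` holds two entries instead of 90, and operations such as `sparse_dot` only touch the columns that actually carry a value.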
Final Verdict + Classification Algorithms: Decision Trees, Decision Forests, Support Vector Machines, Neural Networks 17
English Words And why do they look English? 18
Some English Words militate caterwaul deracinate arrant concinnity imprecation vertiginous profuse 19
Some English Explanations militate: to have force or influence caterwaul: to make a harsh cry or screech deracinate: to uproot arrant: outright; thoroughgoing concinnity: elegance – used chiefly of literary style imprecation: a curse vertiginous: causing dizziness; also, giddy; dizzy profuse: plentiful; copious 20 Source: http://dictionary.reference.com/
Transition Probabilities 21
Active .com Domains 22 82 million active .com domains
Markov Chains [Diagram: chains of two-letter states such as ab, bn, nk, ko, of, fp, pu, nj, ja (bnkofpunjab), fe, er, rr, ry, yl, li, in, ne, es (ferrylines), and eb, ba, ay (ebay), each transition labeled with a probability between .0014 and .4759] Analysis of recent domain registrations. Using second-order Markov chains to detect potentially malicious domain names: bnkofpunjab is not legitimate; ferrylines.com is legitimate; ebay.com is not determinable. 23
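A minimal sketch of the idea, assuming a character model in which the next character depends on the previous two. The training names, the add-one smoothing, and the function names are assumptions for illustration; the talk does not specify how its probabilities were estimated.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"

def train(names):
    """Count how often each character follows each two-character state."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        for i in range(len(name) - 2):
            counts[name[i:i + 2]][name[i + 2]] += 1
    return counts

def avg_log_prob(name, counts):
    """Mean log transition probability, with add-one smoothing for unseen transitions."""
    total, steps = 0.0, 0
    for i in range(len(name) - 2):
        following = counts.get(name[i:i + 2], {})
        seen = sum(following.values())
        p = (following.get(name[i + 2], 0) + 1) / (seen + len(ALPHABET))
        total += math.log(p)
        steps += 1
    return total / steps if steps else 0.0

# Hypothetical training names; a real model would be trained on many registered domains.
model = train(["ferrylines", "ferryboat", "shorelines", "airlines", "berrylane"])
likely = avg_log_prob("ferrylines", model)
unlikely = avg_log_prob("xqzvkjw", model)
```

Names built from transitions the model has seen score higher than strings of unseen transitions, so a low average log probability can flag a name for closer inspection. Any cutoff between the two would have to be chosen on labeled data.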
Limitations of the Markov model. Useful to detect malicious domain names; very effective for randomly generated names. Detects some legitimate domain names as malicious. Malicious names similar to legitimate ones (e.g. ebay.com phishing sites). International domain names and punycode. Solution: add DNS-related features to the classification process. 24
DNS Features: the number of nameservers that hosted or are hosting this domain; the average time one nameserver hosted this domain; the maximum time one nameserver hosted this domain; the minimum time one nameserver hosted this domain; the number of non-activated nameservers that hosted this domain before; whether the domain is an international one. 25
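The features listed above could be derived from a per-domain hosting history roughly as follows. This is a sketch under assumptions: the `HostingRecord` type, its field names, the one-record-per-nameserver simplification, and reading "non-activated" as "no longer hosting" are all hypothetical. The `xn--` prefix is the standard marker of a punycode-encoded label.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HostingRecord:
    nameserver: str
    first_day: int            # day the nameserver started hosting the domain
    last_day: Optional[int]   # None means it is still hosting the domain

def dns_features(domain: str, records: List[HostingRecord], today: int) -> dict:
    """Compute the six features, assuming one record per nameserver."""
    durations = [
        (r.last_day if r.last_day is not None else today) - r.first_day
        for r in records
    ]
    return {
        "num_nameservers": len(records),
        "avg_days": sum(durations) / len(durations) if durations else 0.0,
        "max_days": max(durations, default=0),
        "min_days": min(durations, default=0),
        "num_inactive": sum(1 for r in records if r.last_day is not None),
        "international": any(
            label.startswith("xn--") for label in domain.lower().split(".")
        ),
    }

# Hypothetical history: one former nameserver and one current nameserver.
history = [
    HostingRecord("ns1.example.net", 0, 100),
    HostingRecord("ns2.example.net", 100, None),
]
features = dns_features("example.com", history, today=130)
```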
Example Feature Density [Plot: density (0.00 to 0.15) over time of domain on name server (in days, 0 to 600)] 26
Results Analysis [Plot: true positive rate versus false positive rate] 27
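Each point on a curve of true positive rate against false positive rate comes from one decision threshold. The sketch below computes such a point; the labels and scores are made up and do not reproduce the results from the talk.

```python
def tpr_fpr(labels, scores, threshold):
    """Return (true positive rate, false positive rate) at one threshold.

    labels: True for malicious, False for legitimate.
    scores: classifier outputs; higher means more likely malicious.
    """
    tp = fp = tn = fn = 0
    for is_malicious, score in zip(labels, scores):
        flagged = score >= threshold
        if is_malicious and flagged:
            tp += 1
        elif is_malicious:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Made-up example data (hypothetical).
labels = [True, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1]
curve = [tpr_fpr(labels, scores, t) for t in (0.0, 0.5, 1.0)]
```

Sweeping the threshold from low to high moves the operating point from flagging everything toward flagging nothing.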
Bad Behavior Email and Spam 28
IP Blacklist Lookup. Mail server looks up sender IP over DNS. Simple classifier modeled on IP blacklist query logs. Narrow data set: queried IP, source IP, timestamp. Deep data set: billions of query records monthly. More complex data can be included. 29
IP Lookups [Diagram: a sender (IP=Q) connects to a receiver (IP=S); the receiver asks the reputation server over DNS about Q (Q?) and gets an answer (Q=x); each lookup yields a record <Q, S, T>] 30
Feature Extraction 31
Breadth features
Number of recipients
Burstiness (data transmitted in short, uneven spurts)
Sending sessions to individual recipients
Global sending sessions to any recipient
Spectral features
Average and standard deviation of low-frequency discrete Fourier transform (DFT) coefficients
Average and standard deviation of high-frequency DFT coefficients
Distribution features
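A rough sketch of how some of these features could be computed from <Q, S, T> query records (queried IP, source IP, timestamp). The hourly binning, the split of the spectrum into a low half and a high half, and all names are assumptions made for illustration. Distinct querying sources stand in for the number of recipients.

```python
import cmath
from statistics import mean, pstdev

def dft_magnitudes(series):
    """Magnitudes of the DFT coefficients for frequencies 0 .. n/2."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series)))
        for k in range(n // 2 + 1)
    ]

def extract_features(records, queried_ip, bin_seconds=3600, num_bins=24):
    """Breadth and spectral features for one queried (sender) IP."""
    hits = [(s, t) for q, s, t in records if q == queried_ip]
    counts = [0] * num_bins
    for _, t in hits:
        counts[(int(t) // bin_seconds) % num_bins] += 1
    mags = dft_magnitudes(counts)[1:]   # drop the constant (k = 0) term
    half = len(mags) // 2
    low, high = mags[:half], mags[half:]
    return {
        "num_queries": len(hits),
        "num_recipients": len({s for s, _ in hits}),
        "low_mean": mean(low), "low_std": pstdev(low),
        "high_mean": mean(high), "high_std": pstdev(high),
    }

# Hypothetical query log using documentation address ranges.
log = [
    ("198.51.100.7", "192.0.2.1", 0),
    ("198.51.100.7", "192.0.2.2", 3600),
    ("198.51.100.7", "192.0.2.1", 7200),
    ("198.51.100.9", "192.0.2.3", 10),
]
sender_features = extract_features(log, "198.51.100.7")
```

A sender whose lookups arrive in short bursts puts more energy into the high-frequency coefficients than one whose volume changes slowly over the day.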
Selection of Advanced Features 32
Geographic features
Static features
Distance
Local time at sender and receiver
Host name features
Dial-up IPs
Reputation of neighboring IPs
Content features
Sparse distribution features
Number of “from” domains handled
Persistent sender/receiver address pairs
Message size distribution

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Finding the Needle in the IP Stack

  • 1. Finding the Needle in the IP Stack Dr. Sven Krasser McAfee, Inc. Session ID: RR-403 Session Classification: Intermediate
  • 2. Agenda: Data Mining – A Human Approach; English Words; Bad Behavior; What’s in a File; Conclusions
  • 3. Data Mining: A Human Approach
  • 4. Anthropometric Data. Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  • 5. Measurements. Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  • 6. Measurements (continued). Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
  • 7. Height Versus Weight (scatter plot; x-axis: height in inches, y-axis: weight in pounds)
  • 8. Height Versus Weight (continued) (same plot with women and men marked separately)
  • 9. Putting Weight and Height Into Perspective
  • 10. Best Guess for Gender (plot; x-axis: height in inches, y-axis: weight in pounds; the best guess ranges from 100% male / 0% female through 50% male / 50% female to 0% male / 100% female)
  • 11. One Dimension Only (density plot; x-axis: height in inches)
  • 12. Better Features (plot of buttock circumference versus weight in pounds). Buttock Circumference: “The circumference of the body measured at the level of the maximum posterior protuberance of the buttocks.”
  • 13. Best Guess for Revised Features (plot of buttock circumference versus weight in pounds, with the best guess shown)
  • 14. Further Improving the Separation. Signal to noise: features with very different distribution per class. Correlation: features with low correlation. Dimensionality: consider more features at the same time.
  • 15. Email Data in Three Dimensions
  • 16. Sparse Data (slide shows a large matrix of integer feature values in which nearly every entry is 0, with only occasional nonzero entries such as 25, 2, 10, 1, 40, 14, and 16)
  • 17. Classification Algorithms: Decision Trees, Decision Forests, Support Vector Machines, Neural Networks (diagram label: Final Verdict)
  • 18. English Words: And why do they look English?
  • 19. Some English Words: militate, caterwaul, deracinate, arrant, concinnity, imprecation, vertiginous, profuse
  • 20. Some English Explanations. militate: to have force or influence. caterwaul: to make a harsh cry or screech. deracinate: to uproot. arrant: outright; thoroughgoing. concinnity: elegance – used chiefly of literary style. imprecation: a curse. vertiginous: causing dizziness; also, giddy; dizzy. profuse: plentiful; copious. Source: http://dictionary.reference.com/
  • 22. Active .com Domains: 82 million active .com domains
  • 23. Markov Chains: analysis of recent domain registrations, using second-order Markov chains to detect potentially malicious domain names. (Diagram: two-character states ab, bn, nk, ko, of, fp, pu, nj, ja, fe, er, rr, ry, yl, li, ne, es, eb, ba, ay, un, in, which spell out bnkofpunjab, ferrylines, and ebay, with a probability on each transition.) Verdicts: bnkofpunjab is not legitimate; ferrylines.com is legitimate; ebay.com is not determinable.
  • 24. Limitations of the Markov model. Useful to detect malicious domain names. Very effective for randomly generated names. Detects some legitimate domain names as malicious domains. Misses malicious names similar to legitimate ones (e.g. ebay.com phishing sites). Struggles with international domain names and punycode. Solution: add DNS-related features to the classification process.
  • 25. DNS Features: the number of nameservers that hosted or are hosting this domain; the average time one nameserver hosted this domain; the maximum time one nameserver hosted this domain; the minimum time one nameserver hosted this domain; the number of non-activated nameservers that hosted this domain before; whether the domain is an international one.
  • 26. Example Feature Density (density plot; x-axis: time of domain on name server, in days)
  • 27. Results Analysis (plot of true positive rate versus false positive rate)
  • 28. Bad Behavior Email and Spam 28
  • 29. IP Blacklist Lookup. Mail server looks up sender IP over DNS. Simple classifier modeled on IP blacklist query logs. Narrow data set: queried IP, source IP, timestamp. Deep data set: billions of query records monthly. More complex data can be included.
  • 30. IP Lookups (diagram: Sender, Receiver, DNS, and Reputation server; labels IP=Q and IP=S; each query “Q?” is answered with “Q=x”, and the reputation server records the tuple <Q, S, T>)
  • 33. Burstiness (data transmitted in short, uneven spurts)
  • 34. Sending sessions to individual recipients
  • 36. Average and standard deviation of low-frequency discrete Fourier transform (DFT) coefficients
  • 40. Local time at sender and receiver
  • 44. Number of “from” domains handled
  • 48. Extended HELO (EHLO) strings (millions)
  • 51. What’s in a File: A Look at Image Spam and Malware
  • 55. Close-Up of Gradient (continued)
  • 56. Gradient Field of Photo
  • 58. Image Feature Analysis (slide lists five example feature vectors, each with 106 numbered features in index:value form; the first begins 1:0 2:266 3:285 4:0.933333 5:9678 6:7.83323 and ends 106:3.47745e+09)
  • 59. Image Feature Analysis: view of a two-dimensional subspace (plot with ham and spam marked)
  • 62. Conclusions: heuristics are limited; mathematical descriptions; dimensionality; intuition
  • 63. Research Publications: TrustedSource Data Mining Technologies, March 4, 2010. http://www.trustedsource.org/en/resources/publications
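
The second-order Markov chain idea from slide 23 can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the function names `train` and `avg_log_prob`, the alphabet, the add-one smoothing, and the tiny training list are all assumptions made for the example. Each state is a pair of characters, and a name is scored by the average log-probability of each next character given the two before it.

```python
import math
from collections import defaultdict

# Assumed alphabet for domain labels (hypothetical choice for smoothing).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def train(names):
    """Count transitions from each two-character state to the next character."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        for i in range(len(name) - 2):
            counts[name[i:i + 2]][name[i + 2]] += 1
    return counts


def avg_log_prob(name, counts):
    """Mean log P(next char | previous two chars), using add-one smoothing.

    Returns None when the name is too short to contain a transition.
    """
    total, steps = 0.0, 0
    for i in range(len(name) - 2):
        state, nxt = name[i:i + 2], name[i + 2]
        seen = counts.get(state, {})
        p = (seen.get(nxt, 0) + 1) / (sum(seen.values()) + len(ALPHABET))
        total += math.log(p)
        steps += 1
    return total / steps if steps else None


# Hypothetical training names standing in for known-legitimate domains.
legit = ["ferrylines", "ferryboat", "airlines", "online", "berry"]
model = train(legit)
for candidate in ["ferrylines", "bnkofpunjab", "xqzkvwpj"]:
    print(candidate, avg_log_prob(candidate, model))
```

A higher (less negative) score means the name looks more like the training names. Where to place a cutoff between "legitimate" and "suspicious" is left open here, since the slides do not state one, and slide 24 notes that such scores alone misjudge some names.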

Editor's Notes

  1. Animated GIF, view in presentation mode
  2. Animated GIF, view in presentation mode
  3. Animated GIF, view in presentation mode