Metafeature Training for Anti-Adversarial Content Filtering
Chad Mills
Microsoft
One Microsoft Way
Redmond, WA USA
cmills@microsoft.com

Abstract

Many spammers are successful at avoiding detection by content filters. Spam filters look at which message attributes tend to have appeared in spam and non-spam messages in the past and use that to predict whether or not an incoming message is spam. Naïve Bayes and other probabilistic machine learning algorithms have received a lot of attention for detecting spam in this manner. One weakness in this approach is that spammers can use test messages to identify content the filter associates with non-spam messages and add this good content to their spam message until it avoids detection.

In our experience, normal messages frequently have many words that are mild indicators of a spam or non-spam message, but spam messages crafted to avoid detection tend to have some very spammy attributes offset by many very good attributes. Metafeatures provides a way to detect this by analyzing not just whether a message's attributes tend to have appeared in good or spam messages in the past, but whether the distribution of weights associated with a message's attributes tends to represent a good or spam message.

1 Introduction

Spam is still a big problem [8, 10].

Some spammers understand the nature of content filters that attempt to filter their messages and actively try to work around the filter, while others send spam without consideration for the spam-filtering technologies used by the recipient. Content filters are relatively effective against the spammers not specifically crafting their messages to avoid detection. On the other hand, many spammers with a deep understanding of spam filters are able to consistently get their messages past the filters.

Metafeatures attempts to identify spam messages crafted to avoid detection by content filters.

2 Prior Work

2.1 Text Classification and Spam Applications

Text classifiers attempt to determine whether or not a document falls within a predefined class. By looking at a sample of pre-classified documents and breaking them down into document attributes (e.g. words), a machine learning classifier learns which attributes tend to appear in that class of documents and which do not. These tendencies are then used to predict whether a future example falls within the class.

This technique has been applied to spam filtering [5, 9]. A set of emails is classified as spam or non-spam, and those messages are broken down into features (email attributes). This set of accurately classified messages serves as the training set. When new messages are received, they can be run through a spam filter which predicts, based on the training output, whether or not the message is spam. A spam classification can lead to a variety of consequences, ranging from rejecting the mail to placing it in a junk folder or appending a spam tag to the subject.
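To make the idea in 2.1 concrete, the following is a minimal sketch (invented here for illustration, not the system described in this paper) of a multinomial Naïve Bayes text classifier of the kind cited in [5, 9]: it counts how often each word appears in each class of the training set and uses those counts, with Laplace smoothing, to score new messages. The tiny corpus and the tokenizer are placeholders.

    # Minimal multinomial Naive Bayes over bag-of-words features (illustrative only).
    import math
    from collections import Counter

    def tokenize(text):
        return text.lower().split()

    def train(messages, labels):
        """messages: list of strings; labels: 'spam' or 'good' for each message."""
        word_counts = {"spam": Counter(), "good": Counter()}
        class_counts = Counter(labels)
        for text, label in zip(messages, labels):
            word_counts[label].update(tokenize(text))
        vocab = set(word_counts["spam"]) | set(word_counts["good"])
        return word_counts, class_counts, vocab

    def predict(text, word_counts, class_counts, vocab):
        total = sum(class_counts.values())
        scores = {}
        for label in class_counts:
            # log prior plus the sum of smoothed log likelihoods for each word
            score = math.log(class_counts[label] / total)
            denom = sum(word_counts[label].values()) + len(vocab)
            for word in tokenize(text):
                score += math.log((word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

    word_counts, class_counts, vocab = train(
        ["cheap meds buy now", "meeting agenda attached"], ["spam", "good"])
    print(predict("buy cheap meds", word_counts, class_counts, vocab))  # -> 'spam'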
2.2 Related Work

Dalvi identified the adversarial nature of the spam problem and provided recommendations on how to make classifiers more robust to adversarial attacks [4]. While this has a purpose similar to metafeatures, Dalvi's approach was to make a single classifier more robust, while metafeatures adds a second classifier to detect the adversary's manipulation of the first classifier.

Bennett started with multiple distinct text classifiers and combined them using an additional training run which included the output of the classifiers along with reliability indicators [1, 2, 3]. These reliability indicators were features such as document length, the percentage of words in a document that weren't in the training set, voting features for the base classifiers, and statistical measures of the sensitivities of the base filters [2]. The purpose of these reliability indicators was to provide document context which would indicate which filters were best at classifying a particular type of document. This could outperform a voting strategy by trusting each of the classifiers on the particular type of documents it is effective at classifying.

This research was on the text classification problem rather than spam filtering, and the purpose was to determine which of several base classifiers was more trustworthy on a given document. However, the use of a meta-classifier with context information about the base classifiers has parallels with metafeatures.
3 Adversarial Challenges

3.1 Adversaries Make Spam Classification Hard

While spam classification is similar to the classic text classification problem in many respects, spam classification is particularly difficult because there is an adversary attempting to create a document which belongs to the spam class but which the classifier will fail to classify correctly. When using a text classifier to classify news stories into topics, by contrast, one would not need to be very concerned about reporters crafting their science articles in an attempt to avoid getting the article classified in the science category. Spammers, on the other hand, have financial motivation for getting their messages classified incorrectly.

3.2 Traditional Spam Filters

Traditional spam filters based on probabilistic machine learning models start with a training set of messages classified as good or spam. These messages are broken down into component attributes, or features. Machine learning algorithms like Naïve Bayes assign numerical weights to these features which indicate, in the context of the weights given to other features, how likely a message containing that feature is to be spam or non-spam. Predictive attributes are given weights with a large magnitude, and opposite signs are used to indicate whether the attribute is predictive of a spam or non-spam message. Features which are not very predictive are assigned weights with a small magnitude.
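As an illustration of this weighting scheme (a toy example with invented weights, not values from any actual filter), a message's score is simply the sum of the signed weights of the features it contains, with the sign indicating spam versus good and the magnitude indicating how predictive the feature is:

    # Toy linear scoring: positive weight = spammy, negative = good, magnitude = predictiveness.
    feature_weights = {
        "viagra":      3.0,    # strongly predictive of spam
        "unsubscribe": 0.4,    # mildly spammy
        "meeting":    -1.5,    # predictive of good mail
        "the":         0.01,   # barely predictive either way
    }

    def score_message(words, weights):
        # Unknown words contribute nothing to the score.
        return sum(weights.get(word, 0.0) for word in words)

    message = ["the", "meeting", "about", "viagra"]
    print(score_message(message, feature_weights))  # 1.51: the spammy word outweighs the rest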
                                                                           targets these messages. Spam sent not meeting these criteria,
3.3 The Adversaries

The application of these training algorithms to the spam problem makes the assumption that words previously appearing in good mails will continue to appear in good mails, and words previously appearing in spam will continue to appear in spam. This is what spammers exploit by identifying "good words" and adding them as "chaff" to spam messages to avoid detection [7].

The technique described herein is targeted specifically toward linear classifiers, but could be applied more broadly as well. With a quadratic classifier, for example, the spammer could add large groups of words instead of individual words, or find individual good words and then, through a series of tests, eliminate pairs of these good words that cause a borderline good message to get caught. The specific technique used to find good content is not as important as the result, where spammers add known good content to a spam message in order to avoid detection.

To find good words, spammers could start with a good message that does not get caught by the spam filter. It may be challenging to get messages with spammy content delivered, but spammers can consult their own inboxes to find legitimate messages which do not get filtered. The spammer can then slowly add known spammy content (e.g. the word "viagra") to the message until it is caught. With widely available products like SpamAssassin or free mail providers like Hotmail and Yahoo, spammers can easily determine whether or not a test message is getting caught by the filter. With a borderline message just barely getting caught, spammers can then add potential good words to the message and determine whether or not the message with the added words gets through. If the message with added content gets past the filter, the added word(s) can be regarded as good by the spam filter [7].

Spammers find these words the spam filter regards as good, and proceed to add them to a spam message until the amount of good content overpowers the spam content and the message gets past the filter [6].
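The probing loop just described can be simulated with the toy scorer from the previous sketch (a simplified take on the good word attack analyzed in [7], with invented weights and threshold): words whose addition lets a barely-caught test message slip through are treated as good, and are then added as chaff to a real spam message.

    # Simulated good word probing against a toy linear filter (illustrative only).
    THRESHOLD = 1.0

    def is_caught(words, weights, threshold=THRESHOLD):
        return sum(weights.get(word, 0.0) for word in words) >= threshold

    def probe_good_words(borderline_spam, candidates, weights):
        # Keep the candidate words whose addition gets the test message through.
        return [word for word in candidates
                if not is_caught(borderline_spam + [word], weights)]

    weights = {"viagra": 1.1, "free": 0.8, "meeting": -1.5, "agenda": -0.3}
    borderline = ["viagra"]                        # score 1.1: just barely caught
    chaff = probe_good_words(borderline, ["meeting", "agenda", "free"], weights)
    print(chaff)                                   # ['meeting', 'agenda']

    real_spam = ["viagra", "free"]                 # score 1.9: caught
    print(is_caught(real_spam, weights))           # True
    print(is_caught(real_spam + chaff, weights))   # False: the chaff overpowers the spam content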
3.4 Opportunity to Detect Spammers

By attempting to deceive the spam filter, spammers artificially alter the content of their message, adding enough good words that any spam content is overpowered and the message avoids detection. The message feature scores, when combined together, end up indicating the message is good. However, the artificial distribution of scores provides an opportunity to detect the difference between a well-crafted spam message and a regular non-spam message.

4 Metafeatures

4.1 Target: Adversaries

As discussed in 3.3, many spammers craft messages specifically to avoid detection by a content filter. Metafeatures specifically targets these messages. Spam not meeting these criteria, including naïve spam (spam not crafted to work around the filter) and gray mail (messages some recipients consider good and others consider spam), is not intended to be filtered better by metafeatures. The content filter itself should be effective on much of the spam not attempting to work around the filter, and reputation should be helpful on gray mail. Since metafeatures depends on a content filter and isn't expected to help significantly with gray mail, it should not be used alone but rather in conjunction with other technologies in order to effectively stop spam.

4.2 Conceptual Overview

In a traditional machine-learning-based spam filter (Figure 1), a set of training messages is first classified as spam or non-spam. Each message is broken down into component features, and training assigns numerical weights to a set of those features. When a new message is to be evaluated, the message is broken down into its component features and the corresponding weights are determined from the training output. These weights are then combined (e.g. by adding them together) to form a score for the message. Based on this score, an action such as moving the message to the junk folder may be taken for messages meeting a predefined threshold.
Figure 1: Traditional spam content filter

Metafeatures builds on this system and attempts to detect messages crafted to work around the filter by identifying messages with distributions of weights which do not match those of legitimate messages.

When using metafeatures (Figure 2), several steps are added to this process. After training results in weights being assigned to a set of features, metafeatures are computed for each training message. Metafeatures may include the sum of weights, the standard deviation of weights, the number of features, etc. These features characterize the spam filter's perspective on the message. The sum of weights is analogous to the output of a traditional spam filter, and the other statistical metafeatures describe the distribution of weights assigned to the message's features. After the metafeatures are computed, another training run assigns weights to these metafeatures. Since this is another training run, it is possible to add metafeatures that are not statistical descriptions of the first training run's output, including the original features, reputation data, etc. However, since the primary purpose of metafeatures is to detect spammers working around the filter, these alternate configurations and their relative performance are not discussed here, to keep the discussion simple.

When a message is received which needs to be evaluated, it is broken down into features and their weights are determined from the output of the first training run, as before. Then the metafeatures are calculated for the message. The metafeature weights are determined from the output of the second training run, and are then combined to arrive at a final score for the message.
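As a minimal sketch of this two-stage evaluation path (with invented base weights and metafeature weights standing in for the outputs of the two training runs; the actual features and models are not reproduced here), the distribution of a message's weights can flag a chaffed message whose summed weights alone look good:

    # Two-stage evaluation: base feature weights, then metafeatures over their distribution.
    import statistics

    def metafeature_vector(weights):
        """Summarize the distribution of one message's base feature weights."""
        return {
            "sum":   sum(weights),
            "max":   max(weights),
            "min":   min(weights),
            "mean":  statistics.mean(weights),
            "stdev": statistics.pstdev(weights),
            "count": len(weights),
        }

    def evaluate(message_words, base_weights, meta_weights, threshold):
        # Stage 1: look up base feature weights from the first training run.
        w = [base_weights.get(word, 0.0) for word in message_words]
        # Stage 2: compute metafeatures and combine them using the weights
        # assigned by the second training run.
        mf = metafeature_vector(w)
        final_score = sum(meta_weights[name] * value for name, value in mf.items())
        return final_score >= threshold

    base_weights = {"viagra": 1.1, "free": 0.8, "meeting": -1.5, "agenda": -0.3}
    meta_weights = {"sum": 1.0, "max": 0.5, "min": 0.0,
                    "mean": 0.0, "stdev": 1.5, "count": 0.0}   # invented for illustration
    chaffed = ["viagra", "free", "meeting", "agenda"]
    print(evaluate(chaffed, base_weights, meta_weights, threshold=1.0))  # True

Here the sum of weights is near zero, so a traditional filter would pass the message, but the large maximum weight and large standard deviation reveal strongly spammy content offset by chaff.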
Figure 2: Content filter with metafeatures

5 Experiment

This section describes the experimental results of applying metafeatures to a large set of classified messages. First the data and details of the experimental setup are introduced, followed by experimental results including a comparative ROC curve.

5.1 Experimental Setup

This section provides details on the system used to evaluate metafeatures, including the data used for training and evaluation and the specific metafeatures used.

5.1.1 Dataset

The Hotmail Feedback Loop is a set of messages collected by asking over 100,000 Hotmail volunteers daily to classify a message addressed to them as good or spam. For more details on this system, see [11]. The training set consisted of 1,800,000 messages received from March 2007 through May 20, 2007. The evaluation set consisted of 50,000 messages classified on May 21, 2007, after the last of the training messages had been received.

5.1.2 Feature Placement

The main purpose of metafeatures is to detect spammers altering their messages to avoid detection. This suggests that the most important metafeatures are those related to the statistical distribution of feature weights from the first training run. Only content features were used in the first training run, so that statistical metafeatures such as the standard deviation are computed over a clean set of content feature weights.
Since metafeatures includes two training runs, many other configurations are possible. Some features may be placed at the base feature level and others at the metafeature level. Some features may be trained on at both the base and metafeature levels, or placed exclusively at the metafeature level even if they do not involve calculations on the output of the first training run.

5.1.3 Example Metafeatures

After the first content filter training run, the following metafeatures are calculated. These may include, for example (a small computational sketch follows the list):

- Maximum feature weight
- Minimum feature weight
- Average feature weight
- Sum of feature weights
- Standard deviation of feature weights
- Number of features
- Percent of features with spammy weights
- Number of very spammy weights
- Number of very good weights
- Percent of out-of-vocabulary features
- Percent of words appearing in a dictionary
- Pairs of other metafeatures
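For illustration only (assumed definitions rather than the paper's implementation), several of these metafeatures could be computed from one message's list of base feature weights as follows; the "very spammy" and "very good" cutoffs, the out-of-vocabulary convention, and the dictionary inputs are all invented:

    # Hypothetical computation of several of the example metafeatures listed above.
    def example_metafeatures(weights, n_oov, dictionary_hits, n_words,
                             spammy_cutoff=1.0, good_cutoff=-1.0):
        """weights: base feature weights for the message's in-vocabulary features;
        n_oov: features not seen in training; dictionary_hits, n_words: word counts."""
        n = len(weights)
        return {
            "max_weight":        max(weights),
            "min_weight":        min(weights),
            "avg_weight":        sum(weights) / n,
            "sum_weights":       sum(weights),
            "num_features":      n,
            "pct_spammy":        sum(w > 0 for w in weights) / n,
            "num_very_spammy":   sum(w >= spammy_cutoff for w in weights),
            "num_very_good":     sum(w <= good_cutoff for w in weights),
            "pct_oov":           n_oov / (n + n_oov),
            "pct_in_dictionary": dictionary_hits / n_words,
        }

    print(example_metafeatures(
        weights=[1.1, 0.8, -1.5, -0.3], n_oov=2, dictionary_hits=5, n_words=6))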
5.2 Results

5.2.1 Quantitative

Training was performed on the training set and the output was applied to the evaluation set. This was contrasted against the regular content filter alone, without metafeatures, using the same training and evaluation sets. An ROC curve comparing the performance of the two filters is shown in Figure 3.

Figure 3: Metafeatures, Regular Comparative ROC

The filter with metafeatures clearly outperformed the regular content filter. At a false positive threshold similar to the one used on the regular content filter at Hotmail, metafeatures resulted in 38.2% less spam getting through to users' inboxes. It should be noted that this does not mean metafeatures could catch only 38% of all spam, but rather that this was the improvement on what was not already being caught by the base content filtering system.
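For reference only (not the paper's data or tooling), a comparative ROC curve of this kind could be produced from a held-out evaluation set with scikit-learn and matplotlib, assuming per-message spam labels and a score from each filter:

    # Sketch of plotting a comparative ROC curve for the two filters (placeholder inputs).
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve

    def plot_comparative_roc(y_true, base_scores, meta_scores):
        """y_true: 1 for spam, 0 for good; scores: higher means spammier."""
        for name, scores in [("content filter", base_scores),
                             ("content filter + metafeatures", meta_scores)]:
            fpr, tpr, _ = roc_curve(y_true, scores)
            plt.plot(fpr, tpr, label=name)
        plt.xlabel("False positive rate (good mail marked as spam)")
        plt.ylabel("True positive rate (spam caught)")
        plt.legend()
        plt.show()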

5.2.2 Qualitative

Qualitatively, examining the unique catches of the metafeature system reveals that the primary source of improvement was on messages with significant amounts of good-word chaff. There were no significant improvements on the filtering of gray mail or of spam that did not appear crafted in an attempt to work around the filter. This confirms that metafeatures worked as expected and performed well against adversarial attacks.

5.2.3 The Spammers' Response

If this were deployed in a live setting, rather than being run against a corpus of classified mail, spammers would attempt to work around metafeatures as well. However, this would be much more difficult than working around a traditional content filter, since the spammer cannot just search for good words to add and overpower any spammy content the filter detects. The manipulation itself is what allows the message to be detected as spam. Spam messages by definition have spammy content, so metafeatures makes it difficult for spammers to overcome this by simply adding a lot of good words to offset the spam being detected.

Additionally, with traditional content filters spammers know that some message parts have been detected as spammy and that adding enough good words will get the message delivered. When a message is being caught due to the standard deviation of weights or the upper quartile, it is much more difficult for the spammer to understand what is causing the message to be caught. Furthermore, it is more difficult for spammers to manipulate metafeatures than it is to manipulate the features themselves. A spammer can easily add features by adding new words to the message, but the number of metafeatures is fixed and they are calculated on the message as a whole, so manipulating the features is easier than manipulating the metafeatures.

6 Conclusions

Spam content filters are generally effective, but spammers are still able to craft messages specifically to work around them. In particular, they can add words the filter regards as good to a spam message until it avoids detection. While normal messages have many words which are not very predictive of good or spam messages, these specially crafted messages tend to have some very spammy words offset by many very good words. Metafeatures provides a way to detect spammers working around the filter in this manner by observing the abnormal distribution of feature weights.
7 Acknowledgements

Thanks to Geoff Hulten and Harry Katz for their comments on a draft of this paper.

8 References

[1] P. Bennett. Building reliable metaclassifiers for text learning. Ph.D. Thesis, CMU-CS-06-121, Carnegie Mellon University, May 2006.
[2] P. Bennett, S. Dumais, and E. Horvitz. The combination of text classifiers using reliability indicators. Information Retrieval, 8(1):67-100, 2005.
[3] P. Bennett, S. Dumais, and E. Horvitz. Probabilistic combination of text classifiers using reliability indicators: models and results. Proceedings of SIGIR '02, 2002.
[4] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004, Seattle, WA.
[5] P. Graham. A plan for spam. http://www.paulgraham.com/spam.html, 2002.
[6] J. Graham-Cumming. How to beat an adaptive spam filter. MIT Spam Conference, January 2004, Cambridge, MA.
[7] D. Lowd and C. Meek. Good word attacks on statistical spam filters. Second Conference on Email and Anti-Spam, July 2005, Palo Alto, CA.
[8] C. Perrin. The truth about e-mail spam. ZDNet, February 2008.
[9] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk email. AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin.
[10] SDA Asia Magazine. Spam is 90 percent of all email. February 2007.
[11] W. Yih, J. Goodman, and G. Hulten. Learning at low false positive rates. Third Conference on Email and Anti-Spam, July 2006, Mountain View, CA.
