Kanchana Ihalagedara
Rajitha Kithuldeniya
Supun Weerasekara
10/10/2015 Escape 2015 1
Supervised by Mr. Sampath Deegalla
Internet in Educational Institutes
Provided mainly for educational purposes.
What happens if the users' priority is not the intended purpose?
Network congestion
Wastage of resources
Negative effect on individual users' performance
Blocking Web Sites in Proxy Server
Squid ACLs - Text file of blacklists
SquidGuard - External databases
DansGuardian - Content filter
World Wide Web is Growing
- Manually blacklisting web sites is impractical at this scale
- Related products are not kept up to date with the growing web
Number of websites (from www.internetlivestats.com):
2013: 672,985,183
2014: 968,882,453
Increase: 295,897,270 (about 44%)
Dynamic automated method
- Automated web classification is required
- Machine learning is used in automated web classification
Overview of Our Solution
1. Copy client request
2. Check URL
3. Get web content
4. Classify web content
5. Update the blacklist
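The flow above can be sketched in Python as a single function. This is only an illustration under assumptions: the slides do not say what "Check URL" tests, so here it is taken to mean skipping hosts that are already blacklisted or already classified. The names `process_request`, `fetch`, `classify`, and the label strings are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def process_request(url, blacklist, known_hosts, fetch, classify):
    """Run one copied client request through the steps of the overview.

    fetch(url) -> page text and classify(text) -> label are supplied by
    the caller; both are hypothetical placeholders, not part of the slides.
    """
    host = host_of(url)
    # Check URL (assumed meaning): skip hosts already blacklisted or classified.
    if not host or host in blacklist or host in known_hosts:
        return None
    text = fetch(url)            # Get web content
    label = classify(text)       # Classify web content
    known_hosts.add(host)
    if label == "non-educational":
        blacklist.add(host)      # Update the blacklist
    return label
```

Because the function works on a copy of the request, the client's original request is not delayed by fetching or classification; only later requests are affected once the blacklist has been updated.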
Machine Learning in Web Classification
- Several studies on web classification can be found
- Frequently used algorithms
  - Naïve Bayes
  - Support vector machine
  - Nearest neighbor
- Classification requires a data set
  - Set of URLs labeled as educational or non-educational
Data Collection & Preprocessing
Preprocess Squid server log
Preprocess DMOZ data set
Create labeled URLs
Get web content
Create training data set
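Two of these steps can be sketched with the Python standard library. The sketch assumes the Squid log uses Squid's default native `access.log` layout, where the URL is the seventh whitespace-separated field, and that "web content" means the visible text of a page. The function names are hypothetical, and the slides do not describe the actual preprocessing.

```python
from html.parser import HTMLParser


def urls_from_squid_log(lines):
    """Yield the URL field from lines in Squid's native access.log format.

    Assumes the default layout (URL is the seventh field); CONNECT
    entries, which carry host:port rather than a URL, are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 7 and fields[5] != "CONNECT":
            yield fields[6]


class _TextExtractor(HTMLParser):
    """Collect text that lies outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Pairing each extracted text with its educational or non-educational label would then give one record of the training data set.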
Model Creation & Testing
- Four models were created with WEKA (small data set)
- Data set of 200 records
- 10-fold cross-validation for testing

Algorithm                  Accuracy (%)
PRISM                      74.5
C4.5 (J48 in WEKA)         83.0
Naïve Bayes                95.0
Support Vector Machines    95.5
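For readers unfamiliar with 10-fold cross-validation, the sketch below shows the idea in plain Python: the records are split into ten folds, each fold is held out once for testing, and the model is trained on the other nine. It is not the WEKA procedure used for the table above, which may differ in details such as stratifying folds by class. The `train` and `predict` callables are hypothetical placeholders.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Split range(n_records) into k shuffled, near-equal test folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, train, predict, k=10, seed=0):
    """Return the fraction of records predicted correctly across k folds.

    train(X, y) -> model and predict(model, x) -> label are supplied by
    the caller (hypothetical placeholders for any classifier).
    """
    correct = 0
    for test_idx in k_fold_indices(len(records), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        model = train([records[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, records[i]) == labels[i]
                       for i in test_idx)
    return correct / len(records)
```

With 200 records and k=10, each fold holds 20 records, so every model is trained on 180 records and tested on the remaining 20.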
Model Creation & Testing
- Three models were created using Python (larger data set)
- Data set of 4000 records
- Separate data set of 1000 records for testing

Algorithm                  Accuracy (%)
Multinomial Naïve Bayes    92.9
SVC                        77.5
Linear SVC                 98.9
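As an illustration of the first algorithm in the table, here is a minimal multinomial Naïve Bayes written from scratch, with accuracy measured on a separate test set. The slides do not state which Python library, features, or tokenisation were used, so the word-count features, the Laplace smoothing parameter `alpha`, and all function names are assumptions for this sketch.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenised documents (lists of words)."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Laplace smoothing: add alpha to every word count in the class.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_like": {w: math.log((word_counts[label][w] + alpha) / total)
                         for w in vocab},
        }
    return model


def predict_nb(model, tokens):
    """Return the label with the highest log posterior.

    Words never seen in training contribute nothing to any class.
    """
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"].get(w, 0.0) for w in tokens)
    return max(model, key=score)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)
```

Training on one data set and calling `accuracy` on a separate one mirrors the 4000/1000 arrangement described above; on 1000 test records, an accuracy of 98.9% corresponds to 989 correct predictions.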
Feature Selection in Linear SVC
[Figure: accuracy (%) plotted against number of features for Linear SVC; accuracy axis from 84 to 100]
Principal Component Analysis
Future Work
- Consider more content (metadata)
- Other languages (Sinhala)
- Image processing can be added
Thank You!

Feasibility of Using Machine Learning to Access Control_revDS
