Machine Learning in PHP Using PHP-ML

An attempt to use supervised machine learning to classify email attachment images as either logos or photos of damage. When emails are processed, any image attachments (png, jpg or gif) may be either logos or valid images that have to be kept for further processing. Features of the images have to be extracted and used to train a model that the PHP-ML library can use to make predictions.

Published in: Software

  1. Machine Learning in PHP
     Classifying email attachments as logos or photos using PHP-ML
     By Agbagbara Omokhoa
  2. Content
     • Introduction
     • Our Use Case
     • Naïve Solution
     • Feature Extraction
     • Visualize Solution
     • Training a Model
  3. Introduction
     • This presentation is about supervised machine learning only.
     • By now you have probably read a lot about machine learning; a lot has been written about it.
     • Two types of learning:
       • Supervised machine learning: learning from labelled examples, then applying the model to new data
       • Unsupervised machine learning: learning without labels
     • Most examples of supervised machine learning are about sentiment analysis of reviews, either from Twitter or of restaurants.
     • Most online courses and YouTube videos follow the same pattern.
     • Most examples use Python or R.
     • Is it possible to do machine learning in PHP?
  4. Introduction : Supervised Learning, Examples
     • Sentiment analysis: given a review, is the sentiment positive or negative? For example:
       • "The rice was awful" = negative
       • "The drinks were reasonably priced" = positive
     • Claim classification by loss category: given a description, under which loss category should an insurance claim be classified? For example:
       • "Roos Have Hit Vehicle Damaging Drivers Side" = Impact
       • "I was driving and suddenly a hail storm hit and hit the bonnet of my car. Roof doesn't look bad but bonnet has damage" = Hail Damage
       • "INS has had some dirty go through their vehicle, unsure where the fuel was put into the car, have just noticed smoke coming from the vehicle" = Mechanical Damage
  5. Our Use Case
     • We normally request that claimants and body smash repairers provide images of damages to support their claims.
     • Images are sent to an email address, where a PHP cron script parses the emails and attaches images or documents to the claim in an application.
     • The script sometimes attaches company logos to the claims as well.
     • Given images which have been parsed from an email, can these images be classified as either logos or claim photos?
     [Image: Logo Image]
  6. Naïve Solution
     • Get all the images which have been identified as logos.
     • Use PHP's hash_file('md5', $filename) to create an array with the hash values as the keys.
     • For each new file that is received, check if its hash_file value exists in the array.
     • Works quite well for all known logo files, but not for new logo files.
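The hash lookup above could be sketched roughly as follows. The function names `buildLogoHashIndex` and `isKnownLogo` are hypothetical (they do not come from the slides), and the temporary files merely stand in for real attachments.

```php
<?php
declare(strict_types=1);

/**
 * Build a lookup table keyed by the MD5 hash of each known logo file.
 * (Hypothetical helper; the slides only name hash_file.)
 *
 * @param string[] $logoFiles Paths of files already identified as logos.
 * @return array<string, string> Hash => file path.
 */
function buildLogoHashIndex(array $logoFiles): array
{
    $index = [];
    foreach ($logoFiles as $file) {
        $hash = hash_file('md5', $file);
        if ($hash !== false) {
            $index[$hash] = $file;
        }
    }
    return $index;
}

/** Return true when the file's bytes match a known logo exactly. */
function isKnownLogo(string $file, array $index): bool
{
    $hash = hash_file('md5', $file);
    return $hash !== false && isset($index[$hash]);
}

// Usage with throwaway files standing in for real attachments.
$dir = sys_get_temp_dir();
$knownLogo = tempnam($dir, 'logo');
$sameLogo  = tempnam($dir, 'copy');
$newImage  = tempnam($dir, 'new');
file_put_contents($knownLogo, 'logo-bytes');
file_put_contents($sameLogo, 'logo-bytes');
file_put_contents($newImage, 'other-bytes');

$index = buildLogoHashIndex([$knownLogo]);
var_dump(isKnownLogo($sameLogo, $index));
var_dump(isKnownLogo($newImage, $index));
```

Because the hash covers the exact bytes, a resized or re-encoded copy of a known logo would not match, which is the limitation the slide points out.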
  7. Feature Extraction
     • Machine learning requires that input be supplied to a model to produce an output. These inputs must be numeric representations of the problem.
     • What features can be extracted from image files?
       • File size
       • Bag of colours
     • There are other ways to represent the features of an image (search for "image descriptors").
  8. Feature Extraction : Image Description
     • Bag of colours, i.e. count the number of occurrences of each colour in an image.
     • Resize the image and convert it to grayscale first, since the colour has no real effect on determining the class of the image.
     [Image: Logo Image]
  9. Feature Extraction : Image Description
     • Create an array of 256 possible gray values.
     • For each pixel position, increment the bin for that pixel's gray value.
     • This results in a 256-dimension array; for best results, reduce the number of dimensions to 32.
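A minimal sketch of the 256-bin count, assuming the gray values have already been read from the resized grayscale image into a flat array. `grayHistogram` is a hypothetical name, and reading the pixels themselves is left out here.

```php
<?php
declare(strict_types=1);

/**
 * Count how often each gray value (0-255) occurs.
 * (Hypothetical helper; input is a flat list of gray values.)
 *
 * @param int[] $grayValues One gray value per pixel.
 * @return int[] 256 counts, indexed by gray value.
 */
function grayHistogram(array $grayValues): array
{
    $bins = array_fill(0, 256, 0);
    foreach ($grayValues as $gray) {
        // Clamp so that an out-of-range value cannot create extra bins.
        $bins[max(0, min(255, (int) $gray))]++;
    }
    return $bins;
}

// Usage: six pixels, two black, one mid-gray, three white.
$histogram = grayHistogram([0, 0, 128, 255, 255, 255]);
echo $histogram[0], ' ', $histogram[128], ' ', $histogram[255], PHP_EOL;
```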
  10. Feature Extraction : Image Description
     • Reduce the 256-feature array to a 32-feature array.
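The slides do not say how the reduction is done. One plausible approach, shown here purely as an assumption, is to sum each run of 8 adjacent bins so that 256 counts become 32. `reduceBins` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Reduce a histogram by summing groups of adjacent bins.
 * Assumption: adjacent-bin summing; the slides do not specify the method.
 *
 * @param int[] $histogram  Bin counts (e.g. 256 gray-value counts).
 * @param int   $targetBins Number of bins wanted (e.g. 32).
 * @return int[] Reduced histogram with $targetBins entries.
 */
function reduceBins(array $histogram, int $targetBins = 32): array
{
    $count = count($histogram);
    if ($count === 0 || $targetBins < 1 || $count % $targetBins !== 0) {
        throw new InvalidArgumentException(
            'Histogram size must be a non-zero multiple of the target size.'
        );
    }
    $width = intdiv($count, $targetBins); // 256 / 32 = 8 bins per group
    return array_map('array_sum', array_chunk($histogram, $width));
}

// Usage: a flat histogram with one hit per gray value.
$reduced = reduceBins(array_fill(0, 256, 1), 32);
echo count($reduced), ' bins, first bin = ', $reduced[0], PHP_EOL;
```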
  11. Feature Extraction : Image Description
     • Scale all feature values to between 0 and 1.
     • This produces smoother histograms.
     • The 32 feature values are then summed to create an extra feature, "pixel sum".
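A sketch of the scaling and the extra "pixel sum" feature. The slides do not state which scaling is used, so dividing by the largest bin is an assumption here. `scaleToUnitRange` and `pixelSum` are hypothetical names.

```php
<?php
declare(strict_types=1);

/**
 * Scale bin counts into the range 0..1.
 * Assumption: divide by the largest bin; the slides do not name the method.
 *
 * @param array<int, int|float> $bins
 * @return float[]
 */
function scaleToUnitRange(array $bins): array
{
    $max = $bins === [] ? 0 : max($bins);
    if ($max <= 0) {
        // An empty or all-zero histogram scales to all zeros.
        return array_fill(0, count($bins), 0.0);
    }
    return array_map(
        function ($value) use ($max) {
            return (float) $value / $max;
        },
        $bins
    );
}

/** Sum the scaled features to get the extra "pixel sum" feature. */
function pixelSum(array $scaled): float
{
    return (float) array_sum($scaled);
}

// Usage with a four-bin histogram for brevity (the slides use 32 bins).
$scaled = scaleToUnitRange([2, 4, 8, 0]);
echo implode(',', $scaled), ' sum=', pixelSum($scaled), PHP_EOL;
```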
  12. Feature Extraction : Output
     • Machine learning requires processing a lot of input data, so the input data needs to be structured.
     • Create a CSV file for each image with the headings "Grey","Hit"; this can be used to visualise the individual results (create a histogram).
     • Create a CSV file for all input files which belong to the same class, with the header:
       filename,pixel_0,pixel_1,pixel_2,pixel_3,pixel_4,pixel_5,pixel_6,pixel_7,pixel_8,pixel_9,pixel_10,pixel_11,pixel_12,pixel_13,pixel_14,pixel_15,pixel_16,pixel_17,pixel_18,pixel_19,pixel_20,pixel_21,pixel_22,pixel_23,pixel_24,pixel_25,pixel_26,pixel_27,pixel_28,pixel_29,pixel_30,pixel_31,pixel_sum,feature_class
     • Example command line:
       echo "Create Features for Logos"
       php image.creation.php training.logos 0 > "featureslogo.csv"
       echo "Create Features for Photos"
       php image.creation.php training.photos 1 > "featuresphotos.csv"
     • It is possible to create a scatter plot from the CSV feature file.
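As an illustration of the per-class CSV layout above, the sketch below builds the header and one row. `featureHeader` and `featureRow` are hypothetical helpers, and the row values are made up; this is not the code of image.creation.php.

```php
<?php
declare(strict_types=1);

/** Column names: filename, pixel_0..pixel_N-1, pixel_sum, feature_class. */
function featureHeader(int $bins = 32): array
{
    $columns = ['filename'];
    for ($i = 0; $i < $bins; $i++) {
        $columns[] = "pixel_$i";
    }
    $columns[] = 'pixel_sum';
    $columns[] = 'feature_class';
    return $columns;
}

/** One CSV row for one image; class 0 = logo, 1 = photo as in the commands. */
function featureRow(string $filename, array $features, int $class): array
{
    return array_merge([$filename], $features, [array_sum($features), $class]);
}

// Usage: write the header and one made-up logo row to an in-memory stream.
$header = featureHeader();
$row = featureRow('logo1.png', array_fill(0, 32, 0.5), 0);

$stream = fopen('php://memory', 'w+');
fputcsv($stream, $header, ',', '"', '\\');
fputcsv($stream, $row, ',', '"', '\\');
rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);
echo $csv;
```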
  13. Feature Extraction : Visualize Results
     • Using a graphing class to produce simple histograms.
     [Figures: histograms for Logos and Photos]
  14. Feature Extraction : Visualize Results
     • Visualize the pixel sum feature in Excel using scatter plots.
     [Figure: scatter plot of pixel sum; series: Logos, Claim Photos]
  15. Conclusion from Scatter Plot
     • By looking at the scatter plot we can assume that:
       • If pixel sum > 5, then photo
       • Else if pixel sum < 5, then logo
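The rule read off the scatter plot can be written as a one-line baseline. The slide leaves a pixel sum of exactly 5 undefined; treating it as a logo is an assumption made here. `classifyByPixelSum` and the two constants are hypothetical names, with class numbers following the 0 = logo, 1 = photo convention of the feature files.

```php
<?php
declare(strict_types=1);

const CLASS_LOGO = 0;
const CLASS_PHOTO = 1;

/**
 * Threshold rule from the scatter plot.
 * Assumption: a pixel sum of exactly the threshold counts as a logo.
 */
function classifyByPixelSum(float $pixelSum, float $threshold = 5.0): int
{
    return $pixelSum > $threshold ? CLASS_PHOTO : CLASS_LOGO;
}

// Usage with two made-up pixel sums.
echo classifyByPixelSum(12.4), ' ', classifyByPixelSum(1.8), PHP_EOL;
```

A fixed threshold like this gives a useful baseline against which a trained model's accuracy can be compared.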
  16. Training A Model (1)
     • Use PHP-ML
       • Requires PHP >= 7.0
       • https://php-ml.readthedocs.io/en/latest/
     • Load the different feature CSV files.
     • Extract the samples and the labels.
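A plain-PHP sketch of loading one feature file and separating samples from labels. It does not use PHP-ML's own dataset loaders; `loadFeatureCsv` is a hypothetical helper that assumes the column layout described earlier (filename first, feature_class last), and the file contents are made up.

```php
<?php
declare(strict_types=1);

/**
 * Read a feature CSV and split it into samples and labels.
 * Assumes: first row is a header, first column is the filename,
 * last column is the class, everything in between is numeric.
 *
 * @return array{0: float[][], 1: int[]} [samples, labels]
 */
function loadFeatureCsv(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $samples = [];
    $labels = [];
    $isHeader = true;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($isHeader) {
            $isHeader = false; // skip the heading row
            continue;
        }
        if (count($row) < 3) {
            continue; // skip blank or malformed lines
        }
        $labels[] = (int) array_pop($row); // feature_class
        array_shift($row);                 // drop the filename
        $samples[] = array_map('floatval', $row);
    }
    fclose($handle);
    return [$samples, $labels];
}

// Usage with a tiny made-up file (two pixel columns instead of 32).
$path = tempnam(sys_get_temp_dir(), 'feat');
file_put_contents(
    $path,
    "filename,pixel_0,pixel_1,pixel_sum,feature_class\n"
    . "logo1.png,0.1,0.2,0.3,0\n"
    . "photo1.jpg,0.8,1,1.8,1\n"
);
[$samples, $labels] = loadFeatureCsv($path);
print_r($labels);
```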
  17. Training A Model (2)
     • Merge the datasets for the different classes and create a cross-validation dataset.
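A plain-PHP sketch of merging the per-class data and holding out part of it for testing. PHP-ML ships its own splitting helpers; this hand-written version only illustrates the idea. `mergeAndSplit`, the 30% test ratio, and the seed are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Merge several [samples, labels] pairs, shuffle them together,
 * and hold out a fraction as a test set.
 * (Hypothetical helper; ratio and seed are illustrative defaults.)
 *
 * @param array $datasets List of [samples, labels] pairs, one per class.
 * @return array ['train' => [samples, labels], 'test' => [samples, labels]]
 */
function mergeAndSplit(array $datasets, float $testRatio = 0.3, int $seed = 42): array
{
    $samples = [];
    $labels = [];
    foreach ($datasets as [$classSamples, $classLabels]) {
        $samples = array_merge($samples, $classSamples);
        $labels = array_merge($labels, $classLabels);
    }

    // Shuffle indices, not the data, so samples stay paired with labels.
    $order = array_keys($samples);
    mt_srand($seed);
    shuffle($order);

    $testCount = (int) round(count($order) * $testRatio);
    $split = ['train' => [[], []], 'test' => [[], []]];
    foreach ($order as $position => $index) {
        $part = $position < $testCount ? 'test' : 'train';
        $split[$part][0][] = $samples[$index];
        $split[$part][1][] = $labels[$index];
    }
    return $split;
}

// Usage: six made-up logo samples (class 0) and four photo samples (class 1).
$logos = [[[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]], [0, 0, 0, 0, 0, 0]];
$photos = [[[5.1], [5.2], [5.3], [5.4]], [1, 1, 1, 1]];
$split = mergeAndSplit([$logos, $photos]);
echo count($split['train'][0]), ' train / ', count($split['test'][0]), ' test', PHP_EOL;
```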
  18. Training A Model (3)
     • Instantiate a classifier using one of the implementations from PHP-ML.
     • A few of the parameters need to be tweaked to get the best results.
     • Calculate the accuracy of the predictions.
     • Save the trained model to a file.
     • For best results, you can run the training process multiple times and save the different models.
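The slides do not show which PHP-ML classifier or parameters were used, so the sketch below swaps in a hand-written nearest-centroid classifier to illustrate the train, score, and save steps without depending on the library. The class `NearestCentroid`, the `accuracy` function, the data, and the use of `serialize` for persistence are all assumptions, not PHP-ML's API.

```php
<?php
declare(strict_types=1);

/** Stand-in classifier: predicts the class whose mean sample is closest. */
final class NearestCentroid
{
    /** @var array<int, float[]> Class label => mean feature vector. */
    private $centroids = [];

    public function train(array $samples, array $labels): void
    {
        $sums = [];
        $counts = [];
        foreach ($samples as $i => $sample) {
            $label = $labels[$i];
            if (!isset($sums[$label])) {
                $sums[$label] = array_fill(0, count($sample), 0.0);
                $counts[$label] = 0;
            }
            foreach ($sample as $j => $value) {
                $sums[$label][$j] += $value;
            }
            $counts[$label]++;
        }
        $this->centroids = [];
        foreach ($sums as $label => $sum) {
            $n = $counts[$label];
            $this->centroids[$label] = array_map(
                function ($v) use ($n) {
                    return $v / $n;
                },
                $sum
            );
        }
    }

    public function predict(array $sample): int
    {
        $best = null;
        $bestDistance = INF;
        foreach ($this->centroids as $label => $centroid) {
            $distance = 0.0; // squared Euclidean distance
            foreach ($centroid as $j => $c) {
                $distance += ($sample[$j] - $c) ** 2;
            }
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $label;
            }
        }
        if ($best === null) {
            throw new LogicException('Model has not been trained.');
        }
        return $best;
    }
}

/** Fraction of predictions that equal the actual labels. */
function accuracy(array $actual, array $predicted): float
{
    $total = count($actual);
    if ($total === 0) {
        return 0.0;
    }
    $hits = 0;
    foreach ($actual as $i => $label) {
        if ($predicted[$i] === $label) {
            $hits++;
        }
    }
    return $hits / $total;
}

// Usage with made-up two-feature samples (0 = logo, 1 = photo).
$model = new NearestCentroid();
$model->train([[0.1, 1.0], [0.2, 2.0], [0.8, 9.0], [0.9, 11.0]], [0, 0, 1, 1]);
$testSamples = [[0.3, 3.0], [0.7, 8.0]];
$predicted = array_map([$model, 'predict'], $testSamples);
$score = accuracy([0, 1], $predicted);

// Persist the trained model and load it back.
$modelFile = tempnam(sys_get_temp_dir(), 'model');
file_put_contents($modelFile, serialize($model));
$restored = unserialize(
    file_get_contents($modelFile),
    ['allowed_classes' => [NearestCentroid::class]]
);
echo 'accuracy=', $score, PHP_EOL;
```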
  19. Training A Model (4)
  20. Test Models / Validation
     • Load a features CSV file which contains known classifications.
     • Load a trained model.
     • For each input feature row, predict the class.
     • Compare the result to the actual result.
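The comparison step could look roughly like this. `validatePredictions` is a hypothetical helper that accepts any predictor as a callable; in the usage below a simple pixel-sum threshold stands in for a loaded model, and the samples are made up.

```php
<?php
declare(strict_types=1);

/**
 * Predict every sample and compare against the known class.
 * (Hypothetical helper; works with any callable predictor.)
 *
 * @param callable $predict Takes one sample, returns a class label.
 * @return array ['accuracy' => float, 'confusion' => [actual][predicted] => count]
 */
function validatePredictions(callable $predict, array $samples, array $labels): array
{
    $confusion = [];
    $correct = 0;
    foreach ($samples as $i => $sample) {
        $actual = $labels[$i];
        $predicted = $predict($sample);
        $confusion[$actual][$predicted] = ($confusion[$actual][$predicted] ?? 0) + 1;
        if ($predicted === $actual) {
            $correct++;
        }
    }
    $total = count($samples);
    return [
        'accuracy' => $total > 0 ? $correct / $total : 0.0,
        'confusion' => $confusion,
    ];
}

// Stand-in predictor: the last feature is the pixel sum, threshold at 5.
$predict = function (array $sample): int {
    return end($sample) > 5.0 ? 1 : 0;
};

// Made-up samples with known classes (0 = logo, 1 = photo).
$samples = [[0.2, 1.4], [0.9, 12.0], [0.5, 7.3], [0.1, 3.2]];
$labels = [0, 1, 0, 1];
$report = validatePredictions($predict, $samples, $labels);
print_r($report);
```

The confusion counts show which class is being misjudged, which the overall accuracy alone does not reveal.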
  21. Finally / What's Next
     • Source code can be found at https://github.com/deltastateonline/ml-php
     • Can we classify images as damage photos or images of quotes, so that if a quote is received the assessor gets an alert requesting immediate attention?
     [Images: Damage Photos, Quote Images]