1. Amir Sedighi discussed using machine learning and big data techniques for industrial project optimization at a summer ACM course in 2016.
2. During the course, students explored examples of upgrading systems using machine learning and installed TensorFlow for an introductory project.
3. Common characteristics of the projects included small codebases, development in Java, use of Maven for project management, and use of machine learning tools.
28. Supervised learning with Apache Mahout
An Overview of Big Data and Machine Learning Applications - Summer 1395 (2016) - ACM, University of Tehran
One off-the-shelf component:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/classifier/sgd/OnlineLogisticRegression.html
29. Supervised learning with Apache Mahout
LOGISTIC REGRESSION USING APACHE MAHOUT
Logistic regression is a supervised learning algorithm used to classify
input data into categories. With two possible categories we are using
binary (binomial) logistic regression; with more than two categories we
are using multinomial logistic regression. For binary logistic
regression, the algorithm finds the parameters of a mathematical function
that best fits the training data. This function is the sigmoid function,
whose values lie between 0 and 1. The classifier then uses the trained
model to return the probability that a new input belongs to one category
or the other.
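As a minimal sketch of the function described above (not taken from the slides), the sigmoid and the resulting class probability can be written in plain Java. The class name SigmoidDemo, the weights, and the input vector are illustrative assumptions; a real model would learn the weights from training data.

```java
// Sketch of the logistic (sigmoid) function used by binary logistic regression.
// All names and numbers here are illustrative assumptions.
final class SigmoidDemo {
    /** Maps a real-valued score z to a value between 0 and 1. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Estimated probability of category 1; x[0] is assumed to be the intercept input 1. */
    static double probability(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] w = {-1.0, 2.0};   // hypothetical weights, not trained
        double[] x = {1.0, 0.5};    // intercept input 1, then a single feature
        double p1 = probability(w, x);
        // The two category probabilities always sum to 1.
        System.out.printf("P(1) = %.3f, P(0) = %.3f%n", p1, 1.0 - p1);
    }
}
```

With these made-up weights the score z is 0, so both categories get probability 0.5; larger positive scores push the result toward 1 and larger negative scores toward 0.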
30. Supervised learning - Fraud detection
DETECT CAR MILEAGE FRAUD USING LOGISTIC REGRESSION
Model    Age (years)  Mileage (km)  Result
small    10           100000        0
small    10           200000        0
small    8            30000         1
small    3            10000         1
small    5            10000         1
medium   6            60000         0
medium   4            10000         1
medium   4            200000        0
medium   5            50000         1
family   2            60000         0
family   5            10000         1
family   4            200000        0
family   7            70000         1
family   1            20000         0
family   2            10000         1
sport    6            50000         1
sport    4            100000        0
sport    2            20000         1
sport    3            30000         1
sport    10           5000          1
sport    10           100000        1
Result: 1 = mileage manipulated, 0 = not manipulated.
37. Supervised learning - Clean code
Edit the LogisticRegression class file (com.technobium.LogisticRegression) and add the following code:
38. Supervised learning - Run
Compile and run the class with the following commands:
mvn compile
mvn exec:java -Dexec.mainClass="com.technobium.LogisticRegression"
39. Supervised learning - Results
RESULT
Pass: 0, Learning rate: 0.1759, Accuracy: 0.9615
Pass: 10, Learning rate: 0.0511, Accuracy: 0.9712
Pass: 20, Learning rate: 0.0303, Accuracy: 0.9712
------------- Testing -------------
Probability of not fraud (0) = 0.090
Probability of fraud (1) = 0.910
The test input is a family car that is 10 years old and has been driven
100000 kilometers. For this input, the algorithm tells us that there is a
91% chance that the mileage of the car was manipulated. The decision is
based on the data given as input during the training phase.
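One way to turn the reported probability into a yes/no decision is a fixed threshold. The sketch below assumes a threshold of 0.5 and a hypothetical helper named decide; neither appears on the slides.

```java
// Sketch: turning a fraud probability into a 0/1 decision.
// The 0.5 threshold and the names used here are assumptions.
final class FraudDecision {
    /** Returns 1 (fraud) when pFraud reaches the threshold, otherwise 0. */
    static int decide(double pFraud, double threshold) {
        return pFraud >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double pFraud = 0.910;            // probability of fraud reported on the slide
        double pNotFraud = 1.0 - pFraud;  // the two category probabilities sum to 1
        System.out.printf("Probability of not fraud (0) = %.3f%n", pNotFraud);
        System.out.printf("Probability of fraud (1) = %.3f%n", pFraud);
        System.out.println("Decision at threshold 0.5: " + decide(pFraud, 0.5));
    }
}
```

A stricter threshold (for example 0.9) would flag fewer cars and lower the risk of false accusations, at the cost of missing some manipulated ones.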
40. Supervised learning - How did it work?
HOW IT WORKS
To automate the decision we use the OnlineLogisticRegression
algorithm from Apache Mahout. The input of the algorithm is an
array of Observation objects. Each Observation contains a vector with
the car details (type, age, mileage) and the actual category according to
the input data (1 = manipulated, 0 = not manipulated). The first element of
the vector is the intercept term, which always has the value 1 and is
needed to obtain an accurate model. The intercept term plays the same
role in simple linear regression. The model is trained for 30 passes, and
on every 10th pass we check its quality against the same input data set.
If much more data were available, we would hold out a subset of it for
the quality check. The final step is to use the model to predict the
fraud probability for car data not present in the training data set.
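The training loop described above can be sketched in plain Java without Mahout. This is not Mahout's OnlineLogisticRegression: the feature encoding (one-hot car type, age divided by 10, mileage divided by 200000), the fixed learning rate, the plain accuracy check, and all class and method names are assumptions made for illustration, so its numbers will not match the slide output.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of online logistic regression (stochastic gradient
// descent). NOT Mahout's implementation; the encoding, the learning rate
// and all names are assumptions made for illustration.
final class MileageFraudSketch {
    static final List<String> TYPES = Arrays.asList("small", "medium", "family", "sport");

    /** One training example: feature vector plus the actual category (1 or 0). */
    static final class Observation {
        final double[] vector;
        final int actual;

        Observation(String type, int ageYears, int km, int actual) {
            this.vector = encode(type, ageYears, km);
            this.actual = actual;
        }
    }

    /** Layout: [intercept = 1, one-hot car type (4 slots), age / 10, km / 200000]. */
    static double[] encode(String type, int ageYears, int km) {
        int typeIndex = TYPES.indexOf(type);
        if (typeIndex < 0) {
            throw new IllegalArgumentException("Unknown car type: " + type);
        }
        double[] v = new double[7];
        v[0] = 1.0;               // intercept term
        v[1 + typeIndex] = 1.0;   // one-hot car type
        v[5] = ageYears / 10.0;   // at most 1 for the slide data
        v[6] = km / 200000.0;     // at most 1 for the slide data
        return v;
    }

    /** Sigmoid of the weighted sum: estimated probability of category 1 (fraud). */
    static double probabilityOfFraud(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) {
            z += w[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Fraction of observations classified correctly at a 0.5 threshold. */
    static double accuracy(double[] w, List<Observation> data) {
        int correct = 0;
        for (Observation o : data) {
            int predicted = probabilityOfFraud(w, o.vector) >= 0.5 ? 1 : 0;
            if (predicted == o.actual) {
                correct++;
            }
        }
        return (double) correct / data.size();
    }

    /** Runs the given number of passes and reports accuracy on every 10th pass. */
    static double[] train(List<Observation> data, int passes, double learningRate) {
        double[] w = new double[7];
        for (int pass = 0; pass < passes; pass++) {
            for (Observation o : data) {
                // Gradient step on the log-likelihood for a single observation.
                double error = o.actual - probabilityOfFraud(w, o.vector);
                for (int i = 0; i < w.length; i++) {
                    w[i] += learningRate * error * o.vector[i];
                }
            }
            if (pass % 10 == 0) {
                System.out.printf("Pass: %d, Accuracy: %.4f%n", pass, accuracy(w, data));
            }
        }
        return w;
    }

    /** The 21 rows of the training table on slide 30. */
    static List<Observation> trainingData() {
        return Arrays.asList(
            new Observation("small", 10, 100000, 0), new Observation("small", 10, 200000, 0),
            new Observation("small", 8, 30000, 1), new Observation("small", 3, 10000, 1),
            new Observation("small", 5, 10000, 1), new Observation("medium", 6, 60000, 0),
            new Observation("medium", 4, 10000, 1), new Observation("medium", 4, 200000, 0),
            new Observation("medium", 5, 50000, 1), new Observation("family", 2, 60000, 0),
            new Observation("family", 5, 10000, 1), new Observation("family", 4, 200000, 0),
            new Observation("family", 7, 70000, 1), new Observation("family", 1, 20000, 0),
            new Observation("family", 2, 10000, 1), new Observation("sport", 6, 50000, 1),
            new Observation("sport", 4, 100000, 0), new Observation("sport", 2, 20000, 1),
            new Observation("sport", 3, 30000, 1), new Observation("sport", 10, 5000, 1),
            new Observation("sport", 10, 100000, 1));
    }

    public static void main(String[] args) {
        double[] w = train(trainingData(), 30, 0.1);
        // Test input from the slides: family car, 10 years old, 100000 km.
        double p = probabilityOfFraud(w, encode("family", 10, 100000));
        System.out.printf("Probability of fraud (1) = %.3f%n", p);
    }
}
```

Calling train(trainingData(), 30, 0.1) mirrors the 30 passes and the every-10th-pass check described on the slide; the learning rate of 0.1 is a guess and may need tuning. As on the slide, quality is checked against the training data itself, which is only acceptable because the data set is so small.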
41. Supervised learning - Summary
CONCLUSION
Typical uses of logistic regression are fraud detection,
manufacturing error detection, weather prediction, mail filtering (spam
or ham), and case classification in medicine. Closely related to linear
regression, this classification algorithm is one of the most widely used
machine learning algorithms.
130. Installation, continued
Amirs-Mac-mini:tf-tutorial amirmini$ sudo easy_install --upgrade six
Searching for six
Reading https://pypi.python.org/simple/six/
Best match: six 1.10.0
Downloading https://pypi.python.org/packages/b3/b2/238e2590826bfdd113244a40d9d3eb26918bd798fc187e2360a8367068db/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55
Processing six-1.10.0.tar.gz
Writing /tmp/easy_install-aPWTOF/six-1.10.0/setup.cfg
Running six-1.10.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-aPWTOF/six-1.10.
egg-dist-tmp-kIrpiG
no previously-included directories found matching 'documentation/_build'
six: module references __path__
Adding six 1.10.0 to easy-install.pth file
Installed /Library/Python/2.7/site-packages/six-1.10.0-py2.7.egg
Processing dependencies for six
Finished processing dependencies for six
131. Installation, continued
Amirs-Mac-mini:tf-tutorial amirmini$ sudo pip install --upgrade virtualenv
The directory '/Users/amirmini/Library/Caches/pip/http' or its parent
directory is not owned by the current user and the cache has been disabled.
Please check the permissions and owner of that directory. If executing pip
with sudo, you may want sudo's -H flag.
The directory '/Users/amirmini/Library/Caches/pip' or its parent directory
is not owned by the current user and caching wheels has been disabled.
check the permissions and owner of that directory. If executing pip with
sudo, you may want sudo's -H flag.
Collecting virtualenv
Downloading virtualenv-15.0.3-py2.py3-none-any.whl (3.5MB)
100% |████████████████████████████████| 3.5MB 185kB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-15.0.3