What are artificial neural networks, and how can we use them?
Example scenarios and a workshop at the Istanbul Technical University Technology Transfer Office (ITUNOVA TTO). Talk given in 2016.
2. TOPICS
• TalkingData
• Introduction
• Datasets
• Sample submission
• Some analysis
• Basic solution
• To Do
• Predicting A Biological Response
• Introduction
• Datasets
• Solution
• Model
• Code
• To Do
3. ARTIFICIAL NEURAL NETWORKS PROBLEM
PROJECT
TALKINGDATA MOBILE USER DEMOGRAPHICS
A KAGGLE COMPETITION
NAME: YAKUP GÖRÜR
DATE: 13 DECEMBER 2016
4. TALKINGDATA MOBILE USER DEMOGRAPHICS
INTRODUCTION
• TalkingData, China’s largest third-party mobile data platform, is
seeking to leverage behavioral data from more than 70% of the 500
million mobile devices active daily in China to help its clients better
understand and interact with their audiences.
• In this competition, participants are challenged to build a model
predicting users’ demographic characteristics based on:
• Their app usage
• Geolocations
• Mobile device properties
6. • The data was obtained from the kaggle.com as a .csv file.
• Test Data:
• gender_age_test.csv
• Training Data:
• gender_age_train.csv
• events.csv
• phone_brand_device_model.csv
• app_events.csv
• app_labels.csv
• label_categories.csv
TALKINGDATA MOBILE USER DEMOGRAPHICS
9. events.csv and app_events.csv
• events.csv: when a user uses the TalkingData SDK, the event gets
logged in the events data.
• app_events.csv: each event corresponds to a list of apps in
app_events.
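The relationship between the two tables can be sketched with toy data (the column names come from the dataset; the values below are made up for illustration):

```python
import pandas as pd

# Toy rows with the same columns as events.csv / app_events.csv.
events = pd.DataFrame({
    'event_id': [1, 2],
    'device_id': ['d1', 'd2'],
})
app_events = pd.DataFrame({
    'event_id': [1, 1, 2],
    'app_id': [10, 11, 10],
    'is_installed': [1, 1, 1],
    'is_active': [1, 0, 1],
})

# Aggregate per event: how many apps were installed / active at that event.
agg = app_events.groupby('event_id')[['is_installed', 'is_active']].sum().reset_index()
agg = agg.rename(columns={'is_installed': 'installed', 'is_active': 'active'})

# Attach the per-event app counts back to the events table.
merged = events.merge(agg, how='left', on='event_id')
print(merged)
```

This one-to-many join (one event, many app rows) is exactly what the feature-construction code on the next slides exploits.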
16. import numpy as np
import pandas as pd

def map_column(table, f):
    # Helper used below (shown for completeness):
    # label-encode column f by mapping each distinct value to an integer.
    labels = sorted(table[f].unique())
    mappings = {label: i for i, label in enumerate(labels)}
    return table.replace({f: mappings})

def read_train_test():
    # App events
    print('Read app events...')
    ape = pd.read_csv("/Users/yakup/Downloads/TalkingData/app_events.csv")
    ape['installed'] = ape.groupby(['event_id'])['is_installed'].transform('sum')
    ape['active'] = ape.groupby(['event_id'])['is_active'].transform('sum')
    ape.drop(['is_installed', 'is_active'], axis=1, inplace=True)
    ape.drop_duplicates('event_id', keep='first', inplace=True)
    ape.drop(['app_id'], axis=1, inplace=True)
    # Events
    print('Read events...')
    events = pd.read_csv("/Users/yakup/Downloads/TalkingData/events.csv", dtype={'device_id': str})
    events['counts'] = events.groupby(['device_id'])['event_id'].transform('count')
    # The idea here is to count the number of installed apps per event using
    # the data from app_events.csv above, and likewise the number of active apps.
    events = pd.merge(events, ape, how='left', on='event_id')
    # The original events_small table kept only ['device_id', 'counts'];
    # this version adds the two extra features.
    events_small = events[['device_id', 'counts', 'installed', 'active']].drop_duplicates('device_id', keep='first')
    # Phone brand
    print('Read brands...')
    pbd = pd.read_csv("/Users/yakup/Downloads/TalkingData/phone_brand_device_model.csv", dtype={'device_id': str})
    pbd.drop_duplicates('device_id', keep='first', inplace=True)
    pbd = map_column(pbd, 'phone_brand')
    pbd = map_column(pbd, 'device_model')
    # Train
    print('Read train...')
    train = pd.read_csv("/Users/yakup/Downloads/TalkingData/gender_age_train.csv", dtype={'device_id': str})
    train = map_column(train, 'group')
    train = train.drop(['age', 'gender'], axis=1)
    train = pd.merge(train, pbd, how='left', on='device_id')
    train = pd.merge(train, events_small, how='left', on='device_id')
    train.fillna(-1, inplace=True)
    # Test
    print('Read test...')
    test = pd.read_csv("/Users/yakup/Downloads/TalkingData/gender_age_test.csv", dtype={'device_id': str})
    test = pd.merge(test, pbd, how='left', on='device_id')
    test = pd.merge(test, events_small, how='left', on='device_id')
    test.fillna(-1, inplace=True)
    # Features
    features = list(test.columns.values)
    features.remove('device_id')
    return train, test, features

Thanks to @ZFTurbo
XGBOOST SUBMISSION SAMPLE
using only users’ phone model and their apps and labels
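Kaggle scores these submissions with multi-class logarithmic loss over the predicted group probabilities. A minimal numpy sketch of that metric (the three-class toy data below is illustrative; the real task has 12 gender-age groups):

```python
import numpy as np

def multiclass_logloss(y_true, y_pred, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class indices, shape (n,)
    y_pred: predicted class probabilities, shape (n, n_classes)
    """
    p = np.clip(y_pred, eps, 1 - eps)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize after clipping
    return -np.mean(np.log(p[np.arange(len(y_true)), y_true]))

# Two samples, three classes (toy numbers, not competition data).
y_true = np.array([0, 2])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.1, 0.8]])
print(multiclass_logloss(y_true, y_pred))  # -(ln 0.7 + ln 0.8) / 2 ≈ 0.2899
```

Confident but wrong predictions are punished hard by this metric, which is why the clipping at `eps` matters in practice.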
18. TO DO
• Also use latitude/longitude
• Also use event hours by gender (female/male)
• Re-train the model and re-test
19. ARTIFICIAL NEURAL NETWORKS PROBLEM
PROJECT
PREDICTING A BIOLOGICAL RESPONSE
A KAGGLE COMPETITION
NAME: YAKUP GÖRÜR
DATE: 13 DECEMBER 2016
20. PREDICTING A BIOLOGICAL RESPONSE
INTRODUCTION
• The development of a new drug largely depends on trial and
error.
• It typically involves synthesizing thousands of compounds before
one finally becomes a drug.
• As a result, this process is extremely expensive and slow.
• Therefore, the ability to accurately predict the biological activity
of molecules, and to understand the rationale behind those
predictions, is of great value.
21. PREDICTING A BIOLOGICAL RESPONSE
COMPETITION AND DATA
• The objective of the competition is to build as good a model as
possible so that we can, as optimally as this data allows, relate
molecular information to an actual biological response.
• Purpose: predict the biological response of molecules from their
chemical properties.
• The data comes from the Kaggle competition “Predicting a
Biological Response”, held between March 16, 2012 and June 15,
2012 and re-enabled with new data in 2013.
23. • The data was obtained from the kaggle.com as a .csv file.
• train.csv
• test.csv
• svm_benchmark.csv
PREDICTING A BIOLOGICAL RESPONSE
24. PREDICTING A BIOLOGICAL RESPONSE
TRAIN DATA
• The first column contains experimental data describing an actual
biological response (Active/Inactive).
• The remaining columns represent molecular descriptors (D1
through D1776), e.g. size, shape, etc.
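Splitting that layout into a target vector and a descriptor matrix can be sketched with pandas on a toy frame (assuming the response column is named `Activity`, as in the Kaggle files; only 3 of the 1776 descriptors are shown):

```python
import pandas as pd

# Toy frame with the same layout as train.csv: first the biological
# response, then the molecular descriptor columns D1..D1776.
train = pd.DataFrame({
    'Activity': [1, 0, 1],   # active (1) / inactive (0)
    'D1': [0.00, 0.50, 0.25],
    'D2': [1.00, 0.00, 0.50],
    'D3': [0.10, 0.90, 0.30],
})

y = train['Activity'].values                # target vector
X = train.drop('Activity', axis=1).values   # descriptor matrix
print(X.shape, y.shape)
```

On the real file, `X` has 1776 columns, which is why the first weight matrix in the network slide has 1776 input rows.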
29. PREDICTING A BIOLOGICAL RESPONSE
MODEL
[Network diagram: Input → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output]
30. import numpy as np
import pandas as pd

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# convert output of sigmoid function to its derivative
def sigmoid_output_to_derivative(output):
    return output * (1 - output)

# traindatai / traindatao: descriptor and response data loaded from train.csv earlier
X = traindatai.values
y = traindatao
f = open('Error.txt', 'w')
alphas = [0.001, 0.01, 0.1, 1]  # example learning rates; the talk's exact list isn't shown
for alpha in alphas:
    print("\nTraining With Alpha: " + str(alpha))
    np.random.seed(1)
    # randomly initialize our weights with mean 0
    synapse_0 = 2 * np.random.random((1776, 11)) - 1
    synapse_1 = 2 * np.random.random((11, 4)) - 1
    synapse_2 = 2 * np.random.random((4, 2)) - 1
    synapse_3 = 2 * np.random.random((2, 1)) - 1
    for j in range(20000):
        # Feed forward through layers 0, 1, 2, 3, 4
        layer_0 = X
        layer_1 = sigmoid(np.dot(layer_0, synapse_0))
        layer_2 = sigmoid(np.dot(layer_1, synapse_1))
        layer_3 = sigmoid(np.dot(layer_2, synapse_2))
        layer_4 = sigmoid(np.dot(layer_3, synapse_3))
        # how much did we miss the target value?
        layer_4_error = layer_4 - y
        if (j % 1000) == 999:
            print("Error After: " + str(j) + " iterations: " + str(np.mean(np.abs(layer_4_error))))
        # in what direction is the target value?
        # were we really sure? if so, don't change too much
        layer_4_delta = layer_4_error * sigmoid_output_to_derivative(layer_4)
        # how much did each l3 value contribute to the l4 error (according to the weights)?
        layer_3_error = layer_4_delta.dot(synapse_3.T)
        layer_3_delta = layer_3_error * sigmoid_output_to_derivative(layer_3)
        # how much did each l2 value contribute to the l3 error?
        layer_2_error = layer_3_delta.dot(synapse_2.T)
        layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)
        # how much did each l1 value contribute to the l2 error?
        layer_1_error = layer_2_delta.dot(synapse_1.T)
        layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)
        # gradient-descent weight updates
        synapse_3 -= alpha * layer_3.T.dot(layer_4_delta)
        synapse_2 -= alpha * layer_2.T.dot(layer_3_delta)
        synapse_1 -= alpha * layer_1.T.dot(layer_2_delta)
        synapse_0 -= alpha * layer_0.T.dot(layer_1_delta)
f.close()
test_data = pd.read_csv('/Users/yakup/Downloads/Predicting a Biological Response/test.csv')  # Open file
x = test_data.values
layer_0 = x
https://github.com/ykpgrr/Artificial_Neural_Network
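The backpropagation above relies on the identity σ'(x) = σ(x)(1 − σ(x)), which lets the derivative be computed from a layer's output alone. A quick numerical check against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # Derivative expressed in terms of the sigmoid's output:
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    return output * (1 - output)

# Compare the identity with a central finite difference on a grid.
x = np.linspace(-4, 4, 9)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_output_to_derivative(sigmoid(x))
print(np.max(np.abs(numeric - analytic)))  # agreement to ~1e-10
```

This is why the training loop never recomputes σ'(x) from the inputs: the stored layer activations already contain everything needed.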
31. TO DO
• Use adaptive gradient methods
• Clean the dataset
• Try different ANN models
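As an illustration of the first to-do item, a minimal Adam-style adaptive-gradient update in numpy (a sketch on a toy quadratic, not the talk's code):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running estimates of the gradient's mean and variance
    # and scales each step by them, instead of using a fixed alpha.
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 starting from w = 5.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * w                           # gradient of w^2
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # close to the minimum at 0
```

In the network above, the same update would replace the plain `synapse -= alpha * ...` lines, with one `(m, v)` pair per weight matrix.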