Predictive modeling DBs
- 2. 1. Netflix Database
http://cms.uhd.edu/faculty/chenp/class/4319/project/netflixfiles.html
Netflix, Inc.: American provider of on-demand
Internet streaming media and flat-rate
DVD-by-mail rental
Training data set:
100,480,507 ratings
480,189 users
17,770 movies
Data set entry:
<user (ID), movie (ID), date of grade (yyyy-mm-dd), grade(1-5)>
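A minimal sketch of reading one such entry, assuming the four fields arrive as a single comma-separated line in the order shown above. The `Rating` type and the `parse_rating` helper are hypothetical names introduced for illustration; the actual file layout of the data set may differ.

```python
from datetime import date
from typing import NamedTuple


class Rating(NamedTuple):
    """One data set entry (hypothetical container, not part of the data set)."""
    user_id: int
    movie_id: int
    rated_on: date
    grade: int


def parse_rating(line: str) -> Rating:
    """Parse 'user, movie, yyyy-mm-dd, grade' (assumed comma-separated layout)."""
    user, movie, day, grade = (field.strip() for field in line.split(","))
    rating = Rating(int(user), int(movie), date.fromisoformat(day), int(grade))
    # Grades are defined on a 1-5 scale; reject anything else early.
    if not 1 <= rating.grade <= 5:
        raise ValueError(f"grade out of range: {rating.grade}")
    return rating


example = parse_rating("1488844, 1, 2005-09-06, 3")
```

Validating the grade at parse time keeps malformed rows out of any later similarity computation.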
The BellKor Solution:
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
The Big Chaos Solution:
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf
The Pragmatic Theory Solution:
http://www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf
2 Nilitis, LLC. © 2012
- 3. 1. Netflix Database
User-based collaborative filtering
- Look for users who share the same rating patterns
- Use the ratings from those users to calculate a prediction
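The two steps above can be sketched as follows. This is an illustrative toy, not the method of any Netflix Prize team: it assumes ratings are held in a nested dictionary, uses cosine similarity over co-rated movies as the measure of "same rating patterns", and predicts with a similarity-weighted average. The function names and the sample data are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity restricted to the movies both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict_user_based(ratings: dict, user, movie):
    """Similarity-weighted average of other users' grades for `movie`."""
    weighted, total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted += sim * other_ratings[movie]
        total += sim
    # None signals that no neighbour has rated the movie.
    return weighted / total if total else None


# Toy data: {user: {movie: grade}}; "ann" agrees with "bob" more than with "cat".
user_ratings = {
    "ann": {1: 5, 2: 1},
    "bob": {1: 5, 2: 1, 3: 4},
    "cat": {1: 1, 2: 5, 3: 2},
}
prediction = predict_user_based(user_ratings, "ann", 3)
```

Because "bob" shares the rating pattern of "ann", his grade for movie 3 dominates the prediction.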
Item-based collaborative filtering
- Build an item-item matrix describing the relationships between
pairs of items
- Using the matrix and the current user's ratings, infer the
user's preferences
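A minimal sketch of the item-based variant, under the same assumptions as any toy example: ratings live in a nested dictionary, the item-item matrix stores cosine similarities (with unrated entries treated as zero), and the prediction is a similarity-weighted average of the user's own grades. All names and data here are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarity_matrix(ratings: dict) -> dict:
    """Map {user: {movie: grade}} to {(movie_i, movie_j): cosine similarity}."""
    dot = defaultdict(float)
    norm = defaultdict(float)
    for user_ratings in ratings.values():
        for i, grade_i in user_ratings.items():
            norm[i] += grade_i * grade_i
            for j, grade_j in user_ratings.items():
                if i != j:
                    dot[(i, j)] += grade_i * grade_j
    return {(i, j): d / sqrt(norm[i] * norm[j]) for (i, j), d in dot.items()}


def predict_item_based(sim: dict, user_ratings: dict, movie):
    """Weight the user's own grades by each rated item's similarity to `movie`."""
    weighted, total = 0.0, 0.0
    for rated, grade in user_ratings.items():
        s = sim.get((movie, rated), 0.0)
        weighted += s * grade
        total += s
    # None signals that nothing the user rated is related to the movie.
    return weighted / total if total else None


# Toy data: {user: {movie: grade}}
toy_ratings = {
    "ann": {1: 5, 2: 1},
    "bob": {1: 5, 2: 1, 3: 4},
    "cat": {1: 1, 2: 5, 3: 2},
}
sim_matrix = item_similarity_matrix(toy_ratings)
item_prediction = predict_item_based(sim_matrix, toy_ratings["ann"], 3)
```

The matrix depends only on the items, so it can be computed once offline and reused for every user.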
…A note from the donor regarding Netflix data:
"Thank you for your interest in the Netflix Prize dataset. The dataset is no
longer available."
Robust De-anonymization of Large Sparse Datasets
http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
- 4. 2. EEG Database Data Set
http://archive.ics.uci.edu/ml/datasets/EEG+Database
This data comes from a large study examining EEG
correlates of genetic predisposition to alcoholism.
64 electrodes were placed on each subject's scalp
and sampled at 256 Hz for 1 second.
There were two groups of subjects: alcoholic and
control.
Each subject was exposed to either a single
stimulus (S1) or to two stimuli (S1 and S2).
122 subjects, each of whom completed 120 trials
in which different stimuli were shown.
EEG / ERP data available for free public download
http://sccn.ucsd.edu/~arno/fam2data/publicly_available_EEG_data.html
- 5. 2. EEG Database Data Set
[Figure: example plots of a control subject and an alcoholic subject; panels: Control, Alcoholic]
http://www.ingber.com/ - webpage of Lester Ingber
Open question: use Ingber’s Canonical Momentum Indicator, something else, or the raw data?
- 6. 3. Berlin Database of Emotional Speech
http://database.syntheticspeech.de/
6 basic emotions: anger, joy,
sadness, fear, disgust and boredom
+ neutral speech
Ten professional native German
actors (5 female and 5 male)
simulated these emotions,
producing 10 utterances (5 short
and 5 longer sentences)
The emotion was recognized by at least
80% of the listeners
- 7. 3. Berlin Database of Emotional Speech
Voice Emotion Recognition:
Audio Stream → Feature Extraction → Classifier → Emotion
Feature Extraction: “openEAR”
http://sourceforge.net/projects/openart/?source=dlp
Take settings from the openEAR “emobase” config files and articles
+ possibly add some feature selection steps (state of the art:
sequential feature selection)
Classifier: the state of the art is an SVM with a polynomial or RBF kernel
(libSVM is included in the openEAR package)
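The sequential feature selection step mentioned above could look roughly like the sketch below. It is a greedy forward search with a pluggable scoring function. To stay dependency-free, the demo scorer is leave-one-out 1-nearest-neighbour accuracy, which is only a stand-in assumption for the cross-validated SVM score the slide has in mind; all function names and the toy data are hypothetical.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily add the feature that gives the best score(subset) until k are chosen."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def loo_1nn_accuracy(X, y):
    """Build a scorer: leave-one-out 1-NN accuracy on a feature subset.

    Stand-in for an SVM cross-validation score (an assumption of this sketch).
    """
    def score(subset):
        hits = 0
        for i, xi in enumerate(X):
            # Nearest other sample, measured only on the chosen features.
            nearest = min(
                (j for j in range(len(X)) if j != i),
                key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in subset),
            )
            hits += y[nearest] == y[i]
        return hits / len(X)
    return score


# Toy data: feature 0 is noise, feature 1 separates the two classes.
X = [[0.9, 0.1], [0.1, 0.2], [0.5, 0.15], [0.2, 0.9], [0.8, 1.0], [0.4, 0.95]]
y = [0, 0, 0, 1, 1, 1]
chosen = sequential_forward_selection(2, loo_1nn_accuracy(X, y), k=1)
```

Swapping the scorer for a real SVM cross-validation routine leaves the search loop unchanged, which is the main reason to keep the two pieces separate.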
- 8. 4. Wikipedia page-to-page link database
http://haselgrove.id.au/wikipedia.htm
Total pages: 5,716,808
Total links: 130,160,392
Google PageRank technology:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5427
85% likelihood of choosing a random link
from the page
15% likelihood of jumping to a page
chosen at random from the entire web
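The 85% / 15% random-surfer rule above can be sketched as a power iteration. This is a small illustrative version, not Google's implementation: the graph is an in-memory dictionary, the iteration count is fixed, and pages without outgoing links are assumed to spread their rank evenly over all pages (one common convention). The `pagerank` function and the three-page graph are hypothetical.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """links: {page: [pages it links to]}; returns {page: rank}, ranks sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random jump: (1 - damping) of the total rank is spread over all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Follow a random link: split the damped rank among the targets.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (assumption): behave as if it linked to every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Toy graph: A links to B and C, B links to C, C links back to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
```

For a graph the size of the Wikipedia link database, the same update would normally run over a sparse matrix rather than Python dictionaries.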
- 9. 5. Detecting Malicious URLs
http://sysnet.ucsd.edu/projects/url/
about 2.4 million URLs
3.2 million features
Estimating covariance matrix for
high-dimensional data
Linear implementation of SVM
(LIBLINEAR)
- 10. 6. Pseudo Periodic Synthetic Time Series Data Set
http://archive.ics.uci.edu/ml/datasets/Pseudo+Periodic+Synthetic+Time+Series
+ Branch and Bound evaluation
An Indexing Scheme for Fast Similarity Search in Large Time Series Databases
http://www.cs.rutgers.edu/~pazzani/Publications/ssdb99.pdf
- 11. Other Datasets
Individual household electric power consumption Data Set
http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
Bank Marketing Data Set
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
Solar Flare Data Set
http://archive.ics.uci.edu/ml/datasets/Solar+Flare
Forest Fires Data Set
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
Arrhythmia Data Set
http://archive.ics.uci.edu/ml/datasets/Arrhythmia
Communities and Crime Data Set
http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized
Census Income Data Set
http://archive.ics.uci.edu/ml/datasets/Census+Income