Randomness
1. Data randomness, variation, coincidences, populations and
estimation and the use and abuse of statistics
www.linkedin.com/in/sureshsood
http://www.slideshare.net/ssood/randomness-28944785
4. September 9/11 Coincidences
• 911 is the emergency number
• The twin towers looked like the number 11, so perhaps all 9/11 things relate to 11
• 9 + 1 + 1 = 11, and the first flight to hit the twin towers was Flight 11
• Flight 11 had 92 people on board, and 9 + 2 = 11
• September 11 is the 254th day of the year (2 + 5 + 4 = 11, and 365 – 254 = 111)
• 11 letters each in “New York City”, “Afghanistan”, “the Pentagon” and “George W. Bush”
• New York was the 11th state admitted to the union
• 119 (1 + 1 + 9 = 11) used to be the area code for both Iraq and Iran
• Flight 77, which hit the Pentagon, had 65 people on board, and 6 + 5 = 11
• March 11 (2004) attack in Spain: there are exactly 911 days between this and the September 11 (2001) attack
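The calendar arithmetic behind the date coincidences is easy to check. The sketch below uses only Python's standard `datetime` module; the variable names are illustrative. Note that the two attack dates are 912 days apart, so "911 days between" holds only when neither end date is counted.

```python
from datetime import date

sept11 = date(2001, 9, 11)
madrid = date(2004, 3, 11)

# Position of September 11 within the (non-leap) year 2001.
day_of_year = sept11.timetuple().tm_yday

# Days remaining in 2001 after September 11.
days_left_in_year = (date(2001, 12, 31) - sept11).days

# Elapsed days from one attack to the other, and the days strictly in between.
elapsed_days = (madrid - sept11).days
days_strictly_between = elapsed_days - 1


def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


print(day_of_year, digit_sum(day_of_year), days_left_in_year)
print(elapsed_days, days_strictly_between)
```

With enough dates and enough ways to combine digits, some combination will land on 11; the code only confirms the arithmetic, not that the pattern means anything.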
5. The Strange Coincidence of the Girl from Petrovka
http://listverse.com/2007/11/12/top-15-amazing-coincidences/
6. Key Research Finding - Serendipity is essential
"I knew before coming to university that I wanted to do something different, I wanted to take advantage of all opportunities"
"I randomly received an email for the Young Australian Entrepreneurship competition event and decided to enter it"
"You make your luck happen"
"An executive from the association addressed my class for volunteers"
"You want to be lucky, not right"
"I found the idea of my business while doing an assignment"
(Sood and Marchand 2012)
8. Data Scientist Job Roles
(LinkedIn 16 September 2012)
Notes: Word count shown next to each word
Exclusion words: ability area bay com experience francisco job linkedin preferred san
9. Adapted from and source: paul.kennedy@uts.edu.au
Statistics as Hypothesis Driven Process
10. Leading Questions: Yes Prime Minister
(video:bit.ly/yes_stats)
The risk with data mining is the discovery of meaningless
patterns: given enough data and time, you can support almost
anything
Sir Humphrey Appleby demonstrates use of leading questions to
skew an opinion survey to support or oppose National Service
(Military Conscription)
Taken from the 1st Season of Yes Prime Minister - Episode 2, The
Ministerial Broadcast.
Yes Prime Minister is a British political satire/comedy that aired in the
1980s
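The data mining risk noted on this slide can be illustrated with a small simulation. The sketch below is not from the original deck: it generates purely random series and counts how many pairs look strongly correlated by chance alone. The number of series, their length, the 0.8 threshold and the seed are all arbitrary assumptions.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_strong_pairs(n_series=200, n_points=10, threshold=0.8, seed=1):
    """Count pairs of unrelated random series with |r| above the threshold."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    strong = sum(
        1 for a, b in combinations(series, 2) if abs(pearson(a, b)) > threshold
    )
    total = n_series * (n_series - 1) // 2
    return strong, total


strong, total = count_strong_pairs()
print(f"{strong} of {total} random pairs have |r| > 0.8")
```

Every one of these "relationships" is noise, which is why a hypothesis should come before the search rather than after it.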
13. ATLAS: The observed (full line) and expected (dashed line) 95%
CL combined upper limits on the SM Higgs boson production
cross section divided by the Standard Model expectation as a
function of mH in the full mass range considered in this analysis
(a) and in the low mass range (b). The dashed curves show the
median expected limit in the absence of a signal and the green
and yellow bands indicate the corresponding 68% and 95%
intervals.
14. Statistics of the Higgs Boson
Particle physics has an accepted definition for a “discovery”: a five-sigma level of certainty
The number of standard deviations, or sigmas, is a measure of how unlikely it is that an experimental result
is simply down to chance rather than a real effect
Similarly, tossing a coin and getting a number of heads in a row may just be chance, rather than a sign of a
“loaded” coin
The “three sigma” level represents about the same likelihood of tossing more than eight heads in a row
Five sigma, on the other hand, would correspond to tossing more than 20 in a row
About 68% of all data lie within one standard deviation of the center (~ 1 in 3 lie outside)
About 95.5% of the data lie within two standard deviations (~ 1 in 22 lie outside)
About 99.7% lie within three standard deviations (~ 1 in 370 lie outside)
Four standard deviation events occur 1 in 15,787 times
Five standard deviation events occur 1 in every 1,744,278 times
So a five sigma effect, which two experiments now have, means that such a thing would be observed by
chance with a probability of 1/1,744,278 = 5.7 × 10^-7.
This is so unlikely that this is the criterion for accepting an effect as real in particle physics, when it is
corroborated by another experiment as in this case.
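The "1 in N" figures on this slide can be reproduced from the normal distribution. The sketch below assumes a two-sided tail (a deviation in either direction), which is the convention that matches the numbers quoted here; a one-sided tail would give odds twice as long. Function names are illustrative.

```python
import math


def two_sided_tail(k):
    """Probability that a normal variable lies more than k sigma from its mean."""
    return math.erfc(k / math.sqrt(2))


def heads_in_a_row(n):
    """Probability that a fair coin gives n heads in n tosses."""
    return 0.5 ** n


for k in range(1, 6):
    p = two_sided_tail(k)
    print(f"{k} sigma: {1 - p:.5%} inside, about 1 in {1 / p:,.0f} outside")

# Coin-toss comparisons in the spirit of the slide.
print(f"8 heads in a row:  {heads_in_a_row(8):.2e}")
print(f"21 heads in a row: {heads_in_a_row(21):.2e}")
```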
15. 24104 Emerging Marketing Issues and Social Media
Assessment item 1: Project (Group)
Objective(s): This addresses Subject Learning Objective/s 1-4 Weighting: 30%
Due: The group report is due by start of lecture in Week 14.
Length: The final report must be long enough to document:
1. The acquisition of the social data and supporting process
2. Visualisation of the network data and key measures
3. Description of models built from social data
4. Conclusion highlighting any useful insights
16. 24104 Emerging Marketing Issues and Social Media
Task: Groups of students (4-5) participate in a practical project to data mine social media data.
Completion of this task requires the group to provide a report documenting the experience in acquiring and
discovering the social data using visualisation, setting up the data mining environment, describing the
findings with regard to the models built from the data and concluding insights. The approach to mine the
data is in 2 stages:
1. Visualise a social network of data freely available to the group e.g. LinkedIn, Twitter, YouTube,
Facebook, email, Flickr.
Identify and describe key network measures
2. Mine the data to build models from the social data
This project uses the sophisticated REVOLUTION R ENTERPRISE software as a platform for data mining.
The software is free for academic use. The Rattle (R Analytical Tool To Learn Easily) package provides a
graphical user interface specifically for data mining using R and overcomes the need to use heavy
programming.
The following resources help to bootstrap the project and amuse the group project members:
Kaggle – Kaggle.com/competitions
AnalyticsBridge A social network for analytics professionals - analyticbridge.com
The R Inferno: “If you are using R and you think you’re in hell, this is a map for you”
http://www.burns-stat.com/documents/books/the-r-inferno/
Furnas, Alexander (2012), Everything You Wanted to Know About Data Mining but Were Afraid to Ask, The
Atlantic, 3 April http://www.theatlantic.com/technology/archive/2012/04/everything-you-wanted-to-know-about-data-mining-but-were-afraid-to-ask/255388/
17. How?
Train of Thought Analysis
• A bottom-up approach
• Perceptual process of discovery to uncover structure
• Distinguish patterns, structure, relationships and anomalies
• Reveals indirect links
• Knowledge is colour coded
• Marketing analyst can spot irregularities
• Not sure why, but where does this lead?
• Harnesses the power of the human mind
Data
Information
Knowledge
18. How to Find a Killer using Visualisation
• In the 1990s, Ivan Milat killed 7 backpackers, making him Australia's most notorious serial killer
• Everyone in Australia was a suspect
• Enormous volumes of data from multiple sources:
  RTA vehicle records
  Gym memberships
  Gun licensing records
  Internal police records
• Police applied visualisation techniques (NetMap) to the data
• Reduced the suspect list from 18 million to 230
• Further analysis with the use of additional information reduced this to 32
25. Elaboration of Trip to Paris Blog Story (Means-End & Heider)
Woodside,Sood & Miller 2008 When Consumers and Brands Talk Psychology & Marketing
1. Gayle
2. Paige
3. Paris
4. "The occasion was my cousin Paige's 16th"
5. "I am a Canadian and get by in French."
6. "All I can say is WOW! We rented a 2 bedroom, 1 ½ bath apartment (two showers), "Merlot" from ParisPerfect http://www.parisperfect.com/ and boy was it ever perfect!"
7. "We had a full view of the Eiffel from our charming little terrace. ....We were within walking distance to two metro stops (Pont d'Alma or Ecole Militaire)"
8. "We were walkable to many good bistros, cafes and bakeries and only a few blocks from the wonderful market street Rue Cler."
9. "I bought a Paris Pratique pocket-sized book at a Metro station. This handy guide has detailed maps of each arrondissement, as well as the metro lines, the bus lines, the RER and the SNCF (trains). I'll never be without this again."
10. "Six months before our trip, I gave Paige a couple of good guide books on Paris and suggested she let me know what her interests were since after all, this was to be her trip."
11. Sites
• The Marais
• Notre Dame
• L'Arc de Triomphe - 248 steps up and 248 steps down...
• Champs-Élysées
• Jacquemart Museum
• Louvre Lite
• Musee D'Orsay
• Les Invalides, Napoleon's Tomb and the Napoleon Museum
• Sacre Coeur
• Montmartre
• Rodin Museum
• Pompidou Museum
• Train to Vernon, bike to Giverny with Fat Tire Bike Tours
• http://www.fattirebiketoursparis.com/
• Eiffel Tower
12. Unforgettable Memories
"This trip had so many memories, but here are a few choice highlights........On our very first night, knowing that the Eiffel Tower light show started at 10:00 p.m.... she [Paige] dropped her camera…down 6 flights…we were stunned…Spanish Family below standing below [with pieces of the camera]"
13. "The father stretched out his cupped hands which held all of the pieces they were able to recover, including the memory stick and he very solemnly said, "El muerto..."."
14. "They had decided to come to Paris to find the Harley Davidson store so they could buy Harley Paris t-shirts."
15. "Michael Osman is an American artist living in Paris." "He supplements his income by being a tour guide." "I found out about him on Fodors" "So I engaged Michael for two days."
16. "On our trip to Giverny, we met a young woman from Brisbane, Australia who was traveling on her own and we invited her to join us. Three of us enjoyed delicious and innovative soufflés, while Paige had the rack of lamb. We shared two dessert soufflés, one chocolate and the other cherry/almond. Yum"
17. "I wanted Paige to get a feel for shopping experiences that she would not have at home (aka the ubiquitous mall)."
18. "We went on Fat Tire's day trip to Monet's gardens and house in Giverny, about an hour outside Paris."
19. "I know Paige will treasure the memory of this girl's trip for many years to come."
27. Linguistic Inquiry and Word Count (LIWC)
Text Analysis : The Psychological Power of Words
LIWC dimension                “I love Paris” (Paige’s Story)   Personal texts   Formal texts
Self-references (I, me, my)    6.12                             11.4             4.2
Social words                  10.55                              9.5             8.0
Positive emotions              3.04                              2.7             2.6
Negative emotions              0.54                              2.6             1.6
Overall cognitive words        4.12                              7.8             5.4
Articles (a, an, the)          7.74                              5.0             7.2
Big words (> 6 letters)       18.40                             13.1            19.6
Pennebaker, J. W., Francis ME, Booth RJ. (2001). Linguistic Inquiry and Word Count (LIWC):
LIWC2001. Mahwah: Lawrence Erlbaum Associates.
31. Which Pattern is Random ?
http://www.wired.com/wiredscience/2012/12/what-does-randomness-look-like/
32. Ceiling of the Waitomo cave in New Zealand.
http://www.waitomo.com/SiteCollectionImages/glowworms/Waitomo-Glowworm-Caves-New-Zealand-boat-group.jpg
33. Which Pattern is Random ?
Pattern 1:
THHHTHTTTTHTTHTTTHHTHTTHT
HHHTHTHHTHTTHHTTTTHTTTHTH
TTHHTTTTTTTTHTHHHHHTHTHTH
THTHTHHHHHTHHTTTTTHTTHHTH
Pattern 2:
HTTHTTHTHHTTHTHTHTTHHTHTT
HTTHHHTTHTTHTHTHTHHTTHTTH
THTHTHTHHHTTHTHTHTHHTHTTT
HTHHTHTHTHTHHTTHTHTHTTHHT
http://www.wired.com/wiredscience/2012/12/what-does-randomness-look-like/
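One way to tell the two patterns apart is to measure the longest run of identical tosses. The sketch below assumes the slide shows two separate 100-toss sequences of 25 tosses per printed row; the function names are illustrative. It also computes the exact probability that 100 fair tosses contain no run of four or more, which can be compared with the rough figure in the editor's notes.

```python
# The two 100-toss sequences from the slide, 25 tosses per printed row.
first = (
    "THHHTHTTTTHTTHTTTHHTHTTHT"
    "HHHTHTHHTHTTHHTTTTHTTTHTH"
    "TTHHTTTTTTTTHTHHHHHTHTHTH"
    "THTHTHHHHHTHHTTTTTHTTHHTH"
)
second = (
    "HTTHTTHTHHTTHTHTHTTHHTHTT"
    "HTTHHHTTHTTHTHTHTHHTTHTTH"
    "THTHTHTHHHTTHTHTHTHHTHTTT"
    "HTHHTHTHTHTHHTTHTHTHTTHHT"
)


def longest_run(tosses):
    """Length of the longest streak of identical consecutive symbols."""
    best = current = 1
    for prev, nxt in zip(tosses, tosses[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best


def prob_all_runs_shorter_than(k, n):
    """Exact probability that n fair tosses contain no run of length k or more (k >= 2)."""
    # counts[r] = number of sequences so far that end in a run of exactly r tosses.
    counts = [0] * k
    counts[1] = 2  # a single toss: "H" or "T"
    for _ in range(n - 1):
        new = [0] * k
        new[1] = sum(counts)          # the next toss differs, starting a new run
        for r in range(1, k - 1):
            new[r + 1] = counts[r]    # the next toss matches, extending the run
        counts = new
    return sum(counts) / 2 ** n


print(longest_run(first), longest_run(second))
print(prob_all_runs_shorter_than(4, 100))
```

A long streak is what chance produces; its absence is the suspicious sign.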
35. Newcomb Discovery (1881)
• American mathematician/astronomer Simon Newcomb noticed that the
first few pages of a logarithmic table, corresponding to the lower
leading digits (typically those below 5), were comparatively dirtier than
the later pages corresponding to the higher leading digits (typically
those above 5)
• Newcomb attributed the greater wear to users looking up numbers that
start with digit 1 more often than numbers starting with, say, digit 5
• This means the probability distribution of a user accessing any of the pages
at any given time is skewed in favour of the earlier pages corresponding
to the lower leading digits!
• This contrasts directly with the naive expectation that the probability
of randomly picking any digit between one and nine is the same for
each digit: 1/9, or roughly 11.11%
36. Number    Leading (first) digit
    350       3
    42057     4
    0.64      6
If the leading (first) digit is d, then the frequency of
occurrence (probability) of the leading digit is
log10(1 + 1/d)
Leading digit (d)    Probability of occurrence
1                    30%
2                    18%
3                    12%
4                    10%
5                    8%
6                    7%
7                    6%
8                    5%
9                    < 5%
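The formula and table on this slide can be reproduced in a few lines. The sketch below computes log10(1 + 1/d) for each leading digit, extracts the leading digit of the slide's three example numbers, and compares the prediction with the powers of two, a sequence known to follow the law closely. The choice of the first 1000 powers of two is an assumption for illustration; the function names are not from the deck.

```python
import math


def benford_probability(d):
    """Expected frequency of leading digit d (1 to 9) under Benford's Law."""
    return math.log10(1 + 1 / d)


def leading_digit(x):
    """First significant digit of a non-zero number, e.g. 0.64 -> 6."""
    # Scientific notation puts the first significant digit first: "6.400000e-01".
    return int(f"{abs(x):e}"[0])


expected = {d: benford_probability(d) for d in range(1, 10)}

# Observed leading digits of 2^1 ... 2^1000.
counts = {d: 0 for d in range(1, 10)}
for n in range(1, 1001):
    counts[int(str(2 ** n)[0])] += 1

for d in range(1, 10):
    print(f"{d}: expected {expected[d]:.1%}, powers of two {counts[d] / 1000:.1%}")

print([leading_digit(x) for x in (350, 42057, 0.64)])
```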
37. Benford Stumbles Over Newcomb Finding
• In 1938, more than half a century after Newcomb, Frank
Benford was going through a large collection of numerical
data from disparate sources when he stumbled upon a similar
finding
• Benford used a huge volume of data to support his finding
empirically, including areas of rivers, street addresses of “American
Men of Science” and numbers appearing in front-page
newspaper stories. He went on to publish his findings in a
number of papers including the 1938 “The Law of Anomalous
Numbers”. Thus the principle came to be known as
“Benford’s Law”
38. Benford Utility
• Human choices are not random; invented numbers are unlikely to follow Benford’s Law
• Only works with natural numbers (those numbers that are not ordered in a particular
numbering scheme)
• When people invent numbers, their digit patterns (which have been artificially added to a list
of true numbers) will cause the data set to appear unnatural
  – See Durtschi, Hillison and Pacini (2004), The Effective Use of Benford’s Law to Assist in Detecting Fraud in
  Accounting Data
• Does not work with the lottery!
• Formally proven in 1996
• Corpus of over 650 papers available at
  – http://www.benfordonline.net/list/chronological
• A Benford’s Law plug-in is available for Kirix Strata; alternatively, use the R package “BenfordTests” or visualise in Tableau
39. Smartphone, Google Glass or Apple Watch will
Know What You Want before You Do
“…from 2014 your phone [glasses or watch] will
anticipate your needs, do the research, tell you
what you want to know – sometimes
before the question even occurs to you…”
Chapman, Jake (2013), The Wired World in 2014
40. Useful References Informing our Thinking
There is a potential 93% average predictability in user mobility, an exceptionally high
value rooted in the inherent regularity of human behavior. Yet it is not the 93%
predictability that we find the most surprising. Rather, it is the lack of variability in
predictability across the population.
Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for
Pervasive Systems. Proceedings of the 9th International Conference on Pervasive
Computing (Pervasive'11)
Daily and weekly routines => Few significant places every day => Regularity in human
activities => Regularity leads to predictability
41. Useful References Informing our Thinking
De Domenico, M., Lima, A., Musolesi, M. (2012) Interdependence and Predictability of Human
Mobility and Social Interactions. Proceedings of the Nokia Mobile Data Challenge
Workshop.
We have shown that it is possible to exploit the correlation between movement data and
social interactions in order to improve the accuracy of forecasting of the future geographic
position of a user. In particular, mobility correlation, measured by means of mutual
information, and the presence of social ties can be used to improve movement forecasting
by exploiting mobility data of friends. Moreover, this correlation can be used as indicator of
potential existence of physical or distant social interactions and vice versa.
Sadilek, A and Krumm, J. (2012) Far Out: Predicting Long-Term Human Mobility
Where are you going to be 285 days from now at 2pm? …we show that it is possible to
predict location of a wide variety of hundreds of subjects even years into the future and
with high accuracy.
42. Caution!
“Children never put off till
tomorrow what will keep
them from going to bed
tonight”
ADVERTISING AGE
Editor's Notes
Social graph in the following order: you, your social network friends, friends-of-friends, your followers, and the overall community.
Wall Street feed – simple way to navigate social network of friends social gestures and your – efficient, increased engagement, increases importance of attention info c.f. banking – remember fuss around news feed
Google Open Social Attention Streams (already included in Plaxo Pulse) - MySpace Friends Updates - Netvibes Activities - LinkedIn Network Updates
High social engagement vs traditional media (radio, tv, print, outdoor) with low engagement. This is about dialogue, interactivity, informality, people + technology & niche NOT Tradigital for mass using push, automation & technology only. Social Media Marketing practice centres around networks, communities, blogs and microblogging. Traditional business functions can be socialised e.g. legal, supply chain, R&D, HR…
Social Strategy (Media) - through sharing; engaging; building relationships and influencing
• increase our reach, influence and relevance
• create ambassadors to support and promote what we do
• personalise interactions
• encourage and grow communities through a critical mass of active cultural and scientific participants
• maximise revenue
• change our work models from one-to-one communication to many-to-many communication
• move from providing information to creating shared meaning with audiences
These are all interesting, but do they mean anything? What do you think about patterns of this type and their relation to the coincidences?
In 1974, Hopkins starred in 'The Girl from Petrovka', based on a book by George Feifer. Not long after signing on to the film, Hopkins went to London to try to track down a copy of the book. After canvassing several bookshops, he could not find one. Frustrated, Hopkins entered Leicester Square train station to board a train when he spotted a copy of 'The Girl from Petrovka' discarded on a nearby bench. Naturally he took the book. Two years later, during filming in Vienna, the author George Feifer visited the set. During a conversation with Hopkins, Feifer mentioned that he didn't even have a copy of his own book: he had lent his last copy, with his own annotations, to a friend, and the book had been lost somewhere in London. Hopkins fetched his copy with notes in the margins. Feifer confirmed this was indeed the same book.
Serendipity is an essential pre-requisite to any entrepreneurial activities by all student entrepreneurs (Type 1 and 2) and archetypal entrepreneurs:
"You make your luck happen" (#101)
"I found the idea of my business while doing an assignment" (#105)
"I randomly received an email for the Young Australian Entrepreneurship competition event and decided to enter it" (#107)
"You want to be lucky, not right" (#105)
"I knew before coming to university that I wanted to do something different, I wanted to take advantage of all opportunities" (#102)
"An executive from the association addressed my class for volunteers" (#106)
Combined results of searches for the standard model Higgs boson in pp collisions at √s = 7 TeV, The CMS Collaboration, CERN-PH-EP/2012-023 2012/02/08
Diana – max links (degree centrality), most connected – connector or hub – number of nodes connected – high influence in spreading info or a virus
Heather – best location, powerful figure as broker who determines what flows and what doesn’t – single point of failure – high betweenness = high influence – position of node as gatekeeper to exploit structural holes (gaps in network)
Fernando & Garth – shortest paths = closeness – the bigger the number the less central
Eigenvector = importance of node in network ~ Google PageRank is a similar measure – being connected to the well connected is a popularity and power measure
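Two of the measures in this note, degree and closeness, can be sketched with a breadth-first search. The five-node graph below is a hypothetical toy example, not the network from the slides, and closeness is reported as the sum of shortest-path distances, so a bigger number means a less central node, as the note says.

```python
from collections import deque

# Hypothetical toy network (undirected); not the network shown in the deck.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)

# Degree centrality: the number of direct links each node has.
degree = {node: len(adj) for node, adj in neighbours.items()}


def distance_sum(start):
    """Sum of shortest-path lengths from start to every other node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sum(dist.values())


farness = {node: distance_sum(node) for node in neighbours}

print("degree: ", degree)
print("farness:", farness)
```

In this toy graph, C is both the best connected and the closest to everyone else, while E sits on the edge of the network.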
The first student’s data has clusters – long runs of up to eight tails in a row. This might look surprising, but it’s actually what you’d expect from random coin tosses (I should know – I did a hundred coin tosses to get that data!) The second student’s data is suspiciously lacking in clusters. In fact, in a hundred coin tosses, they didn’t get a single run of four or more heads or tails in a row. This has about a 0.1% chance of ever happening, suggesting that the student fudged the data (and indeed I did).
The Poisson distribution. It tells you the odds that a large number of infrequent events result in a specific outcome
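As a minimal sketch of what the Poisson distribution computes: given an average rate of events per interval, it gives the probability of seeing exactly k events. The rate of 2 events per interval used below is an arbitrary assumption for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count per interval is `rate`."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


rate = 2.0  # assumed average number of events per interval
for k in range(6):
    print(f"P({k} events) = {poisson_pmf(k, rate):.4f}")
```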