11. 100 YEARS OF INDIA’S WEATHER
[Figure: axis labels run by decade from 1901 to 2001 and by month from Jan to Dec]
12. Most discussions of decision-making
assume that only senior executives
make decisions or that only senior
executives’ decisions matter. This is a
dangerous mistake…
Peter F Drucker
Data generation and analysis are not sufficient.
Consuming it as a team and acting in cohesion is.
13. THERE ARE MANY WAYS TO AID DATA CONSUMPTION
SHOW me what is happening with the data
Allow me to EXPLORE and figure it out
EXPLAIN to me why it’s happening
Just EXPOSE the data to me
[Axes: Creator effort and Consumer effort, each labelled from low effort to high effort]
16. EDUCATION
PREDICTING MARKS
What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?
22. DETECTING FRAUD
“We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.”
ENERGY UTILITY
23. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.
Why would this happen? This clearly shows collusion of some form with the customers.
This happens with specific customers, not randomly. Here are such customers’ meter readings:
Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
   217    219    200    200    200    200    200    200    200    350    200    200
   250    200    200    200    201    200    200    200    250    200    200    150
   250    150    150    200    200    200    200    200    200    200    200    150
   150    200    200    200    200    200    200    200    200    200    200     50
   200    200    200    150    180    150     50    100     50     70    100    100
   100    100    100    100    100    100    100    100    100    100    110    100
   100    150    123    123     50    100     50    100    100    100    100    100
     0    111    100    100    100    100    100    100    100    100     50     50
     0    100     27    100     50    100    100    100    100    100     70    100
     1      1      1    100     99     50    100    100    100    100    100    100
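One way to surface such customers is to measure how often each customer's monthly reading sits exactly on a slab boundary. The sketch below is only an illustration: the slab boundaries, the 75% threshold, and the customer IDs are assumptions (the slides do not list the actual tariff slabs), and the first two rows of sample data are copied from the readings table above.

```python
# Assumed slab boundaries (hypothetical; the slides do not state the real tariff slabs).
SLAB_BOUNDARIES = {50, 100, 150, 200}


def boundary_share(readings, boundaries=SLAB_BOUNDARIES):
    """Fraction of a customer's monthly readings that sit exactly on a slab boundary."""
    if not readings:
        return 0.0
    hits = sum(1 for r in readings if r in boundaries)
    return hits / len(readings)


def flag_customers(readings_by_customer, threshold=0.75):
    """Return customers whose boundary share meets or exceeds the (assumed) threshold."""
    return sorted(
        customer
        for customer, readings in readings_by_customer.items()
        if boundary_share(readings) >= threshold
    )


# Customer IDs are made up. cust-A and cust-B reuse rows from the table;
# cust-C is an invented control with no boundary readings.
sample = {
    "cust-A": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 110, 100],
    "cust-B": [217, 219, 200, 200, 200, 200, 200, 200, 200, 350, 200, 200],
    "cust-C": [101, 87, 133, 96, 142, 77, 120, 91, 108, 99, 64, 115],
}
flagged = flag_customers(sample)
print(flagged)
```

A share-based rule like this is deliberately crude; it only ranks customers for a closer look and is not proof of anything on its own.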
If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections and time, with some explainable anomalies.
Section   Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
Section 1    70%    97%   136%    65%   110%   116%   121%   107%   114%    88%    74%   109%
Section 2    66%    92%    66%    87%    70%    64%    63%    50%    58%    38%    41%    54%
Section 3    90%    46%    47%    43%    28%    31%    50%    32%    19%    38%     8%    34%
Section 4    44%    24%    36%    39%    21%    18%    24%    49%    56%    44%    31%    14%
Section 5     4%    63%   -27%    20%    41%    82%    26%    34%    43%     2%    37%    15%
Section 6    18%    23%    30%    21%    28%    33%    39%    41%    39%    18%     0%    33%
Section 7    36%    51%    33%    33%    27%    35%    10%    39%    12%     5%    15%    14%
Section 8    22%    21%    28%    12%    24%    27%    10%    31%    13%    11%    22%    17%
Section 9    19%    35%    14%     9%    16%    32%    37%    12%     9%     5%    -3%    11%
Chart annotation, placed over the Section 2 and Section 3 rows: New section manager arrives … and is transferred out.
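The "extent of fraud" measure could be computed roughly as follows. The slides do not say how the expected number of 100-unit readings is estimated, so this sketch assumes it is the average count of the neighbouring readings (95 to 105, excluding 100). The function name, the `window` parameter, and the demo data are all hypothetical.

```python
from collections import Counter


def extent_of_fraud(readings, boundary=100, window=5):
    """Percentage excess of readings at `boundary` over the assumed expected count.

    The expected count is taken as the mean count of readings within
    `window` units on either side of the boundary (an assumption).
    """
    counts = Counter(readings)
    neighbours = [counts[boundary + d] for d in range(-window, window + 1) if d != 0]
    expected = sum(neighbours) / len(neighbours)
    if expected == 0:
        return None  # no neighbouring readings, so no baseline to compare against
    return (counts[boundary] - expected) / expected * 100


# Invented demo data: every reading from 95 to 105 appears twice,
# and the 100-unit reading appears two extra times.
demo = list(range(95, 106)) * 2 + [100, 100]
print(extent_of_fraud(demo))
```

Running the same function per section and per month would give a grid shaped like the table above.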
26. PARLIAMENT DECISIONS
UPA's best cabinet performance was last
Friday, with a record 23 decisions taken in a
single day, including some long pending key
reform measures.
The only other such times were
Feb 23, 2008 (28 decisions) &
Dec 26, 2008 (23 decisions).
Nearly two-thirds of decisions
are taken on Thursday sessions,
which is also visible on the
calendar alongside.
* CCEA: Cabinet Committee on Economic Affairs
** CCI: Cabinet Committee on Infrastructure
Day  Decisions  Share
Mon         63     5%
Tue         56     4%
Wed        105     8%
Thu        854    65%
Fri        223    17%
Sat          6     0%
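The weekday shares can be reproduced from the counts alone. This is a minimal sketch using the figures in the table; the function name is an illustrative choice.

```python
# Decision counts per weekday, copied from the slide.
decisions_by_weekday = {"Mon": 63, "Tue": 56, "Wed": 105, "Thu": 854, "Fri": 223, "Sat": 6}


def weekday_shares(counts):
    """Return each weekday's share of all decisions, rounded to a whole percent."""
    total = sum(counts.values())
    return {day: round(100 * n / total) for day, n in counts.items()}


shares = weekday_shares(decisions_by_weekday)
print(shares)
```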
27. RESTAURANT FOUND AN UNUSUAL DIP IN SALES
A restaurant chain had data for every
single transaction made over a few
years. Plotting this as a time series
showed them nothing unusual.
However, the same data on a calendar
map reveals a very different story.
Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).
It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to a shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
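A calendar map is essentially a grid with one row per week and one column per weekday. The sketch below shows one possible way to build that grid from raw transactions using only the standard library. The function names and the sample sales figures are invented for illustration and are not the restaurant's data.

```python
from collections import defaultdict
from datetime import date, timedelta


def calendar_grid(transactions):
    """Sum sales into rows keyed by (ISO year, ISO week) with 7 weekday columns (0 = Monday)."""
    grid = defaultdict(lambda: [0.0] * 7)
    for day, amount in transactions:
        iso_year, iso_week, iso_weekday = day.isocalendar()
        grid[(iso_year, iso_week)][iso_weekday - 1] += amount
    return dict(grid)


def weekday_averages(grid):
    """Average sales per weekday across all weeks in the grid."""
    weeks = len(grid)
    return [sum(row[i] for row in grid.values()) / weeks for i in range(7)]


# Invented sample: two full weeks starting Monday 1 Jan 2024,
# with 100 in sales every day except Wednesdays, which get 60.
start = date(2024, 1, 1)
transactions = []
for offset in range(14):
    day = start + timedelta(days=offset)
    transactions.append((day, 60.0 if day.weekday() == 2 else 100.0))

grid = calendar_grid(transactions)
averages = weekday_averages(grid)
print(averages)
```

Building one grid per point-of-sale terminal would allow the kind of side-by-side comparison the slide describes.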
28. BANK FOUND ALL LOANS BEFORE 20TH POOR
The personal finance division of a bank, focusing on retail loans, drove its sales through a branch sales team. A study of the non-performing assets of loans generated over the course of one year shows a strange pattern.
Every loan disbursed after the 20th of the month, i.e. from the 21st to the end of the month, shows consistently lower non-performing assets (i.e. better quality) than any loan disbursed on or before the 20th.
The bank mapped this back to their incentive scheme. The sales team’s commission is based only on loans disbursed until the 20th. Hence new loans are squeezed into this period without regard for their quality.
This representation, known as a calendar map, can show some interesting patterns, particularly weekday-based patterns, as the next example will show.
Analytics can detect something that you’re specifically looking for. It takes a visual to detect what we don’t know to look for.
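Once the pattern has been spotted visually, the loan comparison can be checked with a simple split on the day of disbursal. The sketch below assumes each loan is a pair of (day of month disbursed, whether it became non-performing). The record layout, function name, and sample loans are hypothetical.

```python
def npa_rate_by_window(loans, cutoff_day=20):
    """Share of non-performing loans, split by disbursal on/before vs after the cutoff day."""
    buckets = {"on_or_before_cutoff": [0, 0], "after_cutoff": [0, 0]}  # [loans, NPAs]
    for disbursed_day, is_npa in loans:
        key = "on_or_before_cutoff" if disbursed_day <= cutoff_day else "after_cutoff"
        buckets[key][0] += 1
        buckets[key][1] += int(is_npa)
    return {key: (npa / n if n else None) for key, (n, npa) in buckets.items()}


# Invented sample loans: (day of month disbursed, became non-performing?)
loans = [
    (5, True), (12, False), (19, True), (20, False),
    (22, False), (25, False), (28, True), (30, False),
]
rates = npa_rate_by_window(loans)
print(rates)
```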
29. WHEN TO INVEST
This visual shows the returns from buying Cipla’s stock in any given month and selling it in another.
The colour of each cell is the return (red is low, green is high) if you had invested in the stock in a given month and sold it in another. For example, this mild red is the slightly negative return if you had bought Cipla stock in Mar 2011 (the row) and sold it in Jun 2011 (the column).
[Colour legend: returns from -50% to +50%]
Profits Made: Over the last 6 years, you would have beaten 10% inflation about 82% of the time and lost out about 18% of the time. So, mostly, you would have made money on Cipla, with an average return of 14.9%.
Highest Returns: An average return of 14.1% has been observed when held for a period of one year, with a maximum of 79.6% if sold in Dec 2009 after being held for a year, and a maximum of 486.9% if sold at the end of Nov 2007 after holding for a month. The highest stock price was Rs 414 in Nov/Dec 2012.
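The grid behind such a visual is a buy-month by sell-month matrix of returns. A minimal sketch, assuming simple (not annualised) percentage returns and a chronological list of monthly prices, is given below. The prices are invented and are not Cipla's.

```python
def returns_matrix(prices):
    """Map (buy_month, sell_month) to the simple percentage return.

    `prices` is a chronological list of (month_label, price) pairs;
    only sell months later than the buy month are included.
    """
    matrix = {}
    for i, (buy_month, buy_price) in enumerate(prices):
        for sell_month, sell_price in prices[i + 1:]:
            matrix[(buy_month, sell_month)] = (sell_price - buy_price) / buy_price * 100
    return matrix


# Invented prices, for illustration only.
prices = [("2011-03", 100.0), ("2011-04", 110.0), ("2011-05", 99.0)]
matrix = returns_matrix(prices)
for (buy, sell), pct in matrix.items():
    print(f"buy {buy}, sell {sell}: {pct:+.1f}%")
```

Each entry corresponds to one cell of the visual; colouring the cells from red to green is left to whichever plotting tool is in use.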
30. Movies on the IMDb
[Figure: axes labelled MORE VOTES and BETTER RATED. Legend: Many unwatched movies; Few unwatched movies; Mix of watched & unwatched; Few watched movies; Many watched movies]
Labelled movies: The Shawshank Redemption, The Godfather, The Dark Knight, Titanic, The Phantom Menace, Twilight, New Moon, Wild Wild West, Transformers, The Good, The Bad, The Ugly, 12 Angry Men, 7 Samurai, Rang De Basanti, Taare Zameen Par, Yojimbo, 3 Idiots
https://gramener.com/imdb/
32. MLA attendance at the Assembly
Karnataka, 2008-2012
[Figure legend: < 50, < 75, < 95, < 100, = 100]
33. SHOW me what is happening with the data
Allow me to EXPLORE and figure it out
EXPLAIN to me why it’s happening
Just EXPOSE the data to me
… to inform and to entertain
35. PERFORMANCE: GIRLS VS BOYS
Subject      Girls higher by  Girls  Boys
Physics                    0    119   119
Chemistry                  1    123   122
English                    4    130   126
Computers                  6    137   131
Biology                    6    129   123
Mathematics               11    123   112
Language                  11    152   141
Accounting                12    138   126
Commerce                  13    127   114
Economics                 16    142   126
37. Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
Chart annotations:
June-borns score the lowest.
The marks shoot up for Aug-borns … and peak for Sep-borns.
120 marks out of 1,200 explainable by month of birth.
An identical pattern was observed in 2009 and 2010 … and across districts, gender, subjects, and class X & XII.
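The underlying calculation is a group-by on birth month followed by a comparison of the highest and lowest averages. The sketch below illustrates that step. The student records are invented and do not come from the Tamil Nadu results.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def mean_marks_by_birth_month(students):
    """Average total marks per birth month, in calendar order."""
    totals = defaultdict(lambda: [0, 0])  # month -> [sum of marks, number of students]
    for month, marks in students:
        totals[month][0] += marks
        totals[month][1] += 1
    return {m: totals[m][0] / totals[m][1] for m in MONTHS if m in totals}


def month_gap(means):
    """Difference between the best and worst monthly averages."""
    return max(means.values()) - min(means.values())


# Invented records: (birth month, total marks out of 1,200)
students = [("Jun", 700), ("Jun", 720), ("Sep", 820), ("Sep", 840), ("Jan", 760)]
means = mean_marks_by_birth_month(students)
print(means, month_gap(means))
```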
“It’s simply that in Canada the eligibility
cutoff for age-class hockey is January 1. A
boy who turns ten on January 2, then,
could be playing alongside someone who
doesn’t turn ten until the end of the year—
and at that age, in preadolescence, a
twelve-month gap in age represents an
enormous difference in physical maturity.”
-- Malcolm Gladwell, Outliers
39. Lok Sabha (2004 onwards)
Win %: The number of winning candidates as a % of candidates in the age group
Candidates: The number of candidates in each age group (chart axis from 0 to 2,500)
Age group  25-30 30-35 35-40 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75-80 80-85 85-90
Win %         1%    2%    4%    6%    9%   11%   14%   11%   16%   18%   22%   22%   33%
40. Assembly elections (2004 onwards)
Win %: The number of winning candidates as a % of candidates in the age group
Candidates: The number of candidates in each age group (chart axis from 0 to 14,000)
Age group  25-30 30-35 35-40 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75-80 80-85 85-90
Win %         2%    4%    6%    9%   12%   15%   17%   15%   16%   18%   18%   20%   27%
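Win % per age group, as defined on these two slides, is the number of winners divided by the number of candidates in each five-year band. The sketch below is one possible implementation. How an age exactly on a band edge (for example 30) is assigned is an assumption here, and the candidate records are invented.

```python
def age_band(age, width=5):
    """Label the five-year band for an age, e.g. 52 -> '50-55'.

    An age on a band edge goes to the upper band (30 -> '30-35'); this is an assumption.
    """
    low = (age // width) * width
    return f"{low}-{low + width}"


def win_percentage_by_band(candidates):
    """Winners as a percentage of candidates, per age band."""
    tallies = {}  # band -> (candidates, winners)
    for age, won in candidates:
        band = age_band(age)
        total, wins = tallies.get(band, (0, 0))
        tallies[band] = (total + 1, wins + int(won))
    return {band: 100 * wins / total for band, (total, wins) in sorted(tallies.items())}


# Invented records: (age at election, won?)
candidates = [
    (27, False), (29, False),
    (51, False), (52, True), (53, False), (54, False),
    (86, True), (88, False), (89, False),
]
win_pct = win_percentage_by_band(candidates)
print(win_pct)
```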
41. More contestants did not reduce the winner margin
Karnataka, Assembly Elections 2008
[Figure: winner margin (axis from 0% to 60%) against # contestants (axis from 0 to 18)]
43. VISUALISING THE MAHABHARATA
How does the Mahabharata, one of the largest epics with 1.8 million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract
analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between
characters?
How can closeness of characters be analysed &
visualized?
44. What topics did parties focus on during questions?
Karnataka, 2008-2012
[Figure: one labelled cell per department. Legend: BJP focus, JD(S) focus, INC focus]
Departments shown: Housing, Adult Education, P.W.D., Administrative Reforms, Minor Irrigation, Small Industries, Social Welfare, Agricultural Marketing, Agriculture, Animal Husbandry, Cooperative, Excise, Finance, Fisheries, Fisheries & Inland Water Transport, Food & Civil Supplies, Forest, Fuel, Haz & Wakf, Health and Family Welfare, Higher Education, Home, Horticulture, Information & Technology, Kannada & Culture, Labour, Law & Human Rights, Major & Medium Industries, Medical Education, Medium and Large Industries, Mines & Geology, Muzrai, Parliamentary Affairs and Human Rights, Planning, Planning and Statistics, Primary and Secondary Education, Primary Education, Prison, Public Library, Revenue, Rural Development and Panchayat Raj, Rural Water Supply, Rural Water Supply and Sanitation, Sericulture, Small Scale Industries, Sugar, Textile, Tourism, Transport, Transportation, Urban Development, Water Resources, Woman & Child Development, Youth and Sports, Youth Service & Sports
45. What topics did the young & old focus on during questions?
Karnataka, 2008-2012
[Figure: one labelled cell per department, on a scale from Young to Old]
Departments shown: Social Welfare, P.W.D., Health and Family Welfare, Revenue, Rural Development and Panchayat Raj, Animal Husbandry, Rural Water Supply and Sanitation, Planning and Statistics, Sugar, Urban Development, Water Resources, Minor Irrigation, Fuel, Parliamentary Affairs and Human Rights, Housing, Agriculture, Primary Education, Primary and Secondary Education, Woman & Child Development, Prison, Higher Education, Home, Cooperative, Forest, Administrative Reforms, Labour, Food & Civil Supplies, Tourism, Finance, Transportation, Horticulture, Muzrai, Haz & Wakf, Transport, Medical Education, Medium and Large Industries, Excise, Major & Medium Industries, Kannada & Culture, Textile, Fisheries, Adult Education, Mines & Geology, Small Industries, Youth and Sports, Agricultural Marketing, Rural Water Supply, Fisheries & Inland Water Transport, Small Scale Industries, Youth Service & Sports, Sericulture, Law & Human Rights, Planning, Information & Technology, Public Library
46. PARLIAMENT DECISIONS
PRE-2009 vs 2009 AND AFTER
[Figure: words from the decisions, compared between pre-2009 and 2009 and after]
Words shown: promotion, scheme, revised, project, approved, development, agreement, amendment, establishment, central, act, section, limited, bill, laning, plan, government, new, ltd, approval, phase, sector, state, setting, investment, pradesh, policy, four, year, programme, amendments, fund, indian, extension, institute, commission, nhdp, technology, proposal, iii, implementation, equity, assistance, cooperation, transfer, infrastructure, additional, corporation, international, mou, cabinet, company, public, construction, services, continuation, approves, education, states, financial, revision, sponsored, port, mission, centrally, basis, signing, protection, management, capital, bank, two, projects, research, upgradation, rural, special, land, delhi, employees, existing, committee, relief, convention, six, crore, payment, power, health, cost, package, institutions, acquisition, control, restructuring, air, grant, field, university, scheduled
Decisions to increase the number of lanes on highways grew significantly post-2009, especially as part of the CCI (Cabinet Committee on Infrastructure) decisions.
Decisions related to intervention, assistance and relief were almost entirely concentrated in pre-2009.
The number of international agreements has declined dramatically between pre-2009 and post-2009.
A significant rise in the number of decisions related to the States is seen post-2009, in contrast with the focus on “Central” pre-2009.
47. SHOW me what is happening with the data
Allow me to EXPLORE and figure it out
EXPLAIN to me why it’s happening
Just EXPOSE the data to me
… to connect the dots for your readers
55. SOCIAL MEDIA IN AUTOMATED RECRUITING
[Figure: follower networks for Bangalore and Singapore. Legend: 1 follower, 100 followers; A follows B (or) B follows A]
Most followed in Bangalore: Sudar, Yahoo!; Anand C, Consultant; Kiran, Hasgeek; Anand S, Gramener
Most followed in Singapore: Mugunth, Steinlogic; Honcheng, buUuk; Sau Sheong, HP Labs; Lim Chee Aung
56. DIRECTORSHIPS AT THE TATAS
Every person who was a Director at the Tata
Group is shown here as an orange circle. The size of
the circle is based on the number of directorship
positions held over their lifetime.
Every company in the Tata Group is
shown here as a blue circle. The size of the
circle is based on the number of directors the
company has had over time.
Every directorship relation is shown
by a line. If a person has held a
directorship position at a company, the two
are connected by a line.
The group appears to be divided into
two clusters based on the network of
directorship roles.
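The circle sizes described above are simple counts over a list of (person, company) directorship records. The sketch below shows that counting step under the assumption that each record is one directorship position. The names are placeholders, not actual Tata Group data.

```python
from collections import Counter


def node_sizes(directorships):
    """Count directorships per person and directors per company.

    `directorships` is a list of (person, company) pairs, one per position held.
    """
    person_size = Counter(person for person, _ in directorships)
    company_size = Counter(company for _, company in directorships)
    return person_size, company_size


# Placeholder records for illustration only.
directorships = [
    ("Director A", "Company X"),
    ("Director A", "Company Y"),
    ("Director B", "Company X"),
    ("Director C", "Company Y"),
    ("Director C", "Company Z"),
    ("Director C", "Company X"),
]
person_size, company_size = node_sizes(directorships)
print(person_size.most_common())
print(company_size.most_common())
```

Each pair also defines one line in the network, so the same list can feed both the node sizes and the edges.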
Prominent leaders bridge the groups.
Some directors are mainly associated with the first group of companies.
Some directors are mainly associated with the second group of companies.
[Figure labels: First group of companies; Second group of companies]
Companies labelled: Tata Teleservices, Tata Consultancy Services, Tata Business Support Services, Tata Global Beverages, Tata Infotech (merged), Tata Toyo Radiator, Honeywell Automation India, Tata Communications, A G C Networks, Tata Technologies, Tata Projects, Tata Power, Tata Finance, Idea Cellular, Tata Motors, Tata Sons, Tata Steel, Tayo Rolls, Tata Securities, Tata Coffee, Tata Investment Corp
Directors labelled: A J Engineer, H H Malgham, H K Sethna, Keshub Mahindra, Ravi Kant, Russi Mody, Sujit Gupta, A S Bam, Amal Ganguli, D B Engineer, D N Ghosh, M N Bhagwat, N N Kampani, U M Rao, B Muthuraman, Ishaat Hussain, J J Irani, N A Palkhivala, N A Soonawala, R Gopalakrishnan, Ratan Tata, S Ramadorai, S Ramakrishnan
Similar network patterns have helped our clients:
• locate terrorists (who called each other but no one outside their network)
• de-duplicate customers (who share the same address and date of birth)
• analyse competitor strengths (based on the cluster of keywords in their patents)
57. SHOW me what is happening with the data
Allow me to EXPLORE and figure it out
EXPLAIN to me why it’s happening
Just EXPOSE the data to me
… to allow your users to tell stories
58. VISUALISATION IS IMPERATIVE FOR
DATA → INSIGHTS → ACTION
Spot the unusual Communicate patterns Simplify decisions
59. A data analytics and visualisation company
gramener.com
for more examples
We handle terabyte-size data via non-traditional analytics and visualise it in real-time.
Editor's Notes
Let’s take a small test. We’ll show a table of numbers on the screen, and ask 3 questions about those numbers. You have 30 seconds to answer these. You can just write down the answers or remember them – there’s no need to say the answer out aloud.
Your timer starts now.
What answers did you get?
How many numbers were above 100?
How many were below 10?
Which quadrant had the highest total?
[Typically, there will be a lot of variance in these answers]
So there’s considerable variation in the answers you get.
Now, let’s do the same exercise again, but with some extremely simple highlighting. It’s the same questions. You have 30 seconds. This time, you can say the answer out aloud if you like.
Your time starts now.
Cue 1: “Of late, enabling these interactions involves a lot of big data… and consuming this data is hard…”
Cue 2: A few seconds after George Bush
Gramener is a data analytics and visualisation company. We can process data at both small and large scale. We analyse the data to find the non-intuitive insights hidden in it, and present them in real time as a visual story that makes those insights obvious.