Talk at Bournemouth University 16th September.
The main part of this talk is on the post-hoc analysis of REF data for computing, the apparent bias by sub-area, institution and gender, and the implications of this for policy in UK computing.
In addition I briefly review a number of other areas of my research where data is central.
http://alandix.com/ref2014/2015/09/16/ref-talk-at-bournemouth/
3. today I am not talking about …
• intelligent internet interfaces
• visualisation and sampling
• situated displays, eCampus,
small device – large display interactions
• fun and games, virtual crackers,
artistic performance, slow time
• creativity and Bad Ideas
• modelling dreams and regret
and the emergence of self
…
4. … or even lots of lights
http://www.hcibook.com/alan/projects/firefly/
5. I am talking about ...
REF data analysis
long tail of small data
7. REF 2014
Research Excellence Framework
approx 5 yearly research assessment in the UK
not just about the UK …
lots of countries thinking of doing something similar
... and looking to REF as example
9. REF panels
4 main panels, 36 sub-panels, ~200K outputs
sub-panel 11: computer science and informatics
I was on this panel
but NO confidential data here
everything public domain
10. REF profiles
every output graded: 4* / 3* / 2* / 1*
individual grades confidential and destroyed
each ‘Unit of Assessment’ (dept) given a profile
http://results.ref.ac.uk/Results/ByUoa/11/Outputs
11. sub-area profiles
N.B. computing only
each output given ACM code
originally to enable allocation to panelists
… but, also used to create sub-area profiles …
12. sub-area profiles
From Morris Sloman’s slides & panel report
theoretical areas
30-40% 4*
applied/human areas
10-20% 4*
13. data not information
sub-panel report warning:
"These data should be treated with circumspection …"
however already affecting institutional policy
hiring, internal investment
… and may influence research council policy
14. possible reasons for variation …
1. best applied work is weak
– including HCI :-/
2. long tail
– weak researchers choose applied areas
3. latent bias
– despite panel’s efforts to be fair
can bibliometrics disentangle these?
15. metrics and assessment
citation metrics known to be good
post-hoc correlates of sophisticated measures
… but not for individuals and small cohorts
and danger of gaming and policy distortion
suitable for verifying large-scale patterns
(and HEFCE using them for this)
16. data used for analysis
all in public domain
(virtually) complete list of outputs:
– excluding a few confidential ones
– for each: name, doi, ACM topic area, Scopus citations
Google Scholar citations for each
– gathered after REF (not used in assessment)
UoA and sub-area profiles
17. metrics used
Scopus (late 2013 census)
– with/without 2012/13 outputs, as these have few citations
‘Normalised Scopus’
– using ‘contextual data’, corrects for
different citation patterns between areas
– places output in top 1%, 5%, 10% of its area worldwide
Google Scholar (late 2014 census)
– with/without 2012/13; zero treated as zero/missing
seven variants – all give similar results
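A minimal sketch of the 'Normalised Scopus' idea above: given per-area citation thresholds for the worldwide top 1%, 5% and 10%, place an output in a band. The area names and threshold numbers are invented for illustration; the actual contextual data is not reproduced here.

```python
# Hypothetical thresholds: minimum citations needed to reach the worldwide
# top 1%, 5% and 10% of an area. All numbers are made up for illustration.
THRESHOLDS = {
    "area_a": {1: 60, 5: 25, 10: 12},
    "area_b": {1: 150, 5: 70, 10: 40},
}

def percentile_band(area, citations):
    """Return the tightest band (1, 5 or 10) the output reaches, else None."""
    for band in (1, 5, 10):
        if citations >= THRESHOLDS[area][band]:
            return band
    return None

# The same raw citation count lands in different bands in different areas.
print(percentile_band("area_a", 30))  # 5
print(percentile_band("area_b", 30))  # None
```

Comparing bands rather than raw counts is what lets areas with very different citation habits be set side by side.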
18. results … massive differences
[chart: % citations in top quartile, % REF 4*, ratio; winners and losers marked]
23. for example,
HCI research (web similar) …
on average …
• HCI/CSCW paper needs to be in top 0.5%
worldwide to get 4*
• logic/algorithms paper just needs to be in top 5%
10 fold difference
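The '10 fold' figure is simply the ratio of the two percentile thresholds quoted above; the variable names are only for illustration.

```python
hci_threshold = 0.5    # HCI/CSCW: top 0.5% worldwide (from the slide)
logic_threshold = 5.0  # logic/algorithms: top 5% worldwide (from the slide)

fold_difference = logic_threshold / hci_threshold
print(fold_difference)  # 10.0
```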
24. and just as you thought it was all over …
… institutional effects
look at +/- 25% REF compared with citations
N.B. use high-end weighted measure as money is
focused (4:1:0:0)
of 35 losers, 25 are post-1992 universities
of 17 winners, 16 are pre-1992 universities
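A rough sketch of the comparison above, assuming the 4:1:0:0 weighting is applied to a profile of percentages and that a citation-based prediction of the same profile is available. How that prediction was derived is not stated on the slide, so the profiles and the 'neutral' label here are purely hypothetical.

```python
# Weights from the slide: money is focused on 4* (and a little on 3*).
WEIGHTS = {"4*": 4, "3*": 1, "2*": 0, "1*": 0}

def weighted_score(profile):
    """profile maps grade -> percentage of outputs at that grade."""
    return sum(WEIGHTS[grade] * pct for grade, pct in profile.items())

def classify(actual, predicted, margin=0.25):
    """'winner' if actual is more than 25% above the prediction, 'loser' if more than 25% below."""
    ratio = actual / predicted
    if ratio > 1 + margin:
        return "winner"
    if ratio < 1 - margin:
        return "loser"
    return "neutral"

# Invented profiles for one institution (percentages of outputs per grade).
ref_profile = {"4*": 20, "3*": 50, "2*": 25, "1*": 5}
citation_profile = {"4*": 30, "3*": 60, "2*": 10, "1*": 0}

ref_score = weighted_score(ref_profile)            # 4*20 + 1*50 = 130
citation_score = weighted_score(citation_profile)  # 4*30 + 1*60 = 180
print(classify(ref_score, citation_score))         # loser
```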
25. an example …
XXXXXXX – a new university
YYYYYYYY – an old university
World Rankings
REF
26. and Gender?
Female authors in main panel B were significantly less likely
to achieve a 4* output than male authors with the same
metrics ratings. When considered in the UOA models,
women were significantly less likely to have 4*
outputs than men whilst controlling for metric
scores in the following UOAs: Psychology, Psychiatry
and Neuroscience; Computer Science and
Informatics; Architecture, Built Environment and Planning;
Economics and Econometrics.
The Metric Tide (HEFCE, 2015)
27. implicit bias?
HEFCE analysis:
male staff in computing are 1/3 more likely to get
a 4* than female
areas and types of institutions disadvantaged by REF
often those with more women
… implications for future recruitment?
28. future for research assessment?
• pure metrics?
• metrics as part (e.g. older outputs)
• metrics as under-girding (burden of proof)
• human process – metrics for in-process feedback
31. Big Data
everyone is talking about it
Twitter, Google, Facebook, NSA,
universities, … and funding
Big Data does it with MapReduce
Semantic Data does it with RDF
32. the long tail
size of
data set
a few very large data sets
e.g. Twitter, streams,
Open Govt., OS,
geonames, dbpedia
the small data of ordinary life:
from local bus timetables
to squash club league tables
33. stories of small data …
Walking Wales
Learning analytics
Open Data Islands and Communities
Musicology
35. Alan Walks Wales
1058 miles (1700km)
3 million footfalls
3 ½ months
April-July 2013
focus on IT at the margins
one thousand miles of poetry, technology and community
36. vision
personal
encircling, encompassing, pilgrimage, homecoming,
practical
IT for the walker & IT for local communities
philosophical
reflections on walking and space, locality and identity
research
personal agenda and living lab
lots of
data
37. data
location
GPX ... batteries ... sporadic signals ....
bio-sensing
ECG (heart), EDA (skin) and accelerometers
audio and images
in the moment
text
after the event
implicit
explicit
The largest ECG trace
in the public domain
38. challenges (1)
location
GPX – merging and mending
bio-sensing
ECG & EDA – special formats & volume
audio and images
volume, transcription and annotation
text
semantic markup, synchronising sources
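One possible reading of 'GPX – merging and mending', sketched with the standard library only: combine logs from several devices into one time-ordered track, drop duplicate timestamps, and flag the gaps that would need mending. The sample points, the 'first device wins' rule and the 5-minute gap limit are assumptions for illustration, not the method actually used for the walk data.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"gpx": "http://www.topografix.com/GPX/1/1"}  # GPX 1.1 namespace

def read_points(gpx_text):
    """Return (time, lat, lon) tuples for every timestamped track point."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", NS):
        t = pt.find("gpx:time", NS)
        if t is None or not t.text:
            continue  # untimed points cannot be ordered
        when = datetime.strptime(t.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
        points.append((when, float(pt.get("lat")), float(pt.get("lon"))))
    return points

def merge_tracks(*gpx_texts):
    """Merge several logs into one time-ordered track without duplicate timestamps."""
    merged = {}
    for text in gpx_texts:
        for when, lat, lon in read_points(text):
            merged.setdefault(when, (lat, lon))  # assumption: first device wins
    return [(when, *merged[when]) for when in sorted(merged)]

def find_gaps(track, max_gap_s=300):
    """Return (start, end) time pairs where consecutive points are too far apart."""
    return [(a[0], b[0]) for a, b in zip(track, track[1:])
            if (b[0] - a[0]).total_seconds() > max_gap_s]

def make_gpx(points):
    """Build a tiny GPX document from (time_string, lat, lon) tuples (test helper)."""
    body = "".join(f'<trkpt lat="{lat}" lon="{lon}"><time>{t}</time></trkpt>'
                   for t, lat, lon in points)
    return ('<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">'
            f"<trk><trkseg>{body}</trkseg></trk></gpx>")

# Invented sample logs from two devices with one overlapping timestamp.
device_a = make_gpx([("2013-04-18T09:00:00Z", 51.50, -3.20),
                     ("2013-04-18T09:01:00Z", 51.51, -3.21)])
device_b = make_gpx([("2013-04-18T09:01:00Z", 51.51, -3.21),
                     ("2013-04-18T09:30:00Z", 51.55, -3.30)])

track = merge_tracks(device_a, device_b)
gaps = find_gaps(track)
print(len(track), len(gaps))  # 3 1
```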
39. challenges (2)
documentation
methodology of creation, data formats
for other people to use!
meta-data
for machines to use
PR
telling the world about it!
academic culture
we do not value data!
40. an offer
multiple synchronisable data streams
largest public domain ECG trace
post-hoc analysis
simulate real use
please use it!
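To give a feel for what 'synchronisable' could mean in practice, here is a small sketch that lines up several timestamped streams against a reference stream by taking the latest sample at or before each reference time. The stream names and values are invented, and the published data may call for a different alignment rule.

```python
from bisect import bisect_right

def latest_at(stream, t):
    """stream: list of (timestamp_seconds, value) sorted by time.
    Return the latest value at or before t, or None if there is none."""
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    return stream[i - 1][1] if i else None

def synchronise(reference, *others):
    """For each reference sample, attach the latest value from every other stream."""
    return [(t, v, *(latest_at(s, t) for s in others)) for t, v in reference]

# Invented streams: a regular sensor reading, sporadic locations, occasional notes.
sensor = [(0, 72), (10, 75), (20, 90)]
location = [(5, "A"), (18, "B")]
notes = [(12, "steep hill")]

for row in synchronise(sensor, location, notes):
    print(row)
# (0, 72, None, None)
# (10, 75, 'A', None)
# (20, 90, 'B', 'steep hill')
```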
43. time frames for learning analytics
days and hours
email, during lectures and labs, student meetings, gaps
week
preparing for teaching, exercises
months/mid-semester
reporting points, staff meetings, cohort/student progress
end of semester/term/year
exams, exam boards, course review
start of semester/term/year
preparing for new courses or re-runs, rollover!
years
new courses, professional development, appraisal, promotion
48. island data flows
from community to world
Community
groups and individuals
rest of
the world
1
• visibility and control
• identity and empowerment
• level of detail
• local knowledge
49. island data flows
from world to community
Community
groups and individuals
rest of
the world
2
• making the most of open data
• local decision making
• lobbying and negotiation
50. island data flows
within the community
Community
groups and individuals
3
• gossip is not enough!
• sparse, dispersed population
• social cohesion and economic benefits
51. island data flows
between communities
Community
groups and individuals
other
communities
4
• sharing best practice
• brand presence
• interlinked data
52. benefits to …
the community
empowerment and control
availability of information
communication within and between communities
the world
improved quality of data
level of detail of data
local knowledge and understanding
54. In Concert
Concert ephemera
1750–1800 Calendar of London Concerts
1815–1895 Concert Life in London
1894–1944 Concert Programme Exchange (BL)
External sources
MusicBrainz
MBz id as connect into Linked Data, BBC, etc.
Authoritative sources (future)
e.g. British Library BNB, Concert Programmes metadata
57. concert database
classic digital humanities?
original sources
selected sources
systematic sample
transcription & extraction (medium expertise)
interpretation (high expertise)
digitised sources
authoritative data
analysis & use (high expertise)
academic publication
large digital archive (e.g. BBC)
possibly create linkage
61. big bang to incremental
digitised sources
authoritative data
academic publication
...
62. big bang to incremental
problem focused augmentation
transform cost-benefit
digital archive
academic publications
...
partial enhancement & interpretation
64. => reflection and requirements
digital symbiosis
suggestion and confirmation
provenance and authority
spreadsheet as user interface
semantics through interaction
66. themes and take-aways ...
data in context
heterogeneity and linking
value and values
ethics and empowerment
… and please use my data