1. All about the UK Data Service
Louise Corti
UK Data Service
UK Data Archive
University of Essex
UKSG 37th Annual Conference,
Harrogate
14-16 April 2014
2. What is the UK Data Service?
• a comprehensive resource funded by the
Economic and Social Research Council
(ESRC)
• a single point of access to a wide range
of secondary social science data
• support, training and guidance
throughout the data life cycle
• listen to our recorded webinars at
http://ukdataservice.ac.uk/news-and-events/videos.aspx
4. What does the UK Data Service do?
• put together a collection of the most valuable data and
enhance these over time
• preserve data in the long term for future research
purposes
• make the data and documentation available for reuse
• provide data management advice for data creators
• provide support for users of the service
• provide information about how data are used
• provide easy access through the website
5. Who is it for?
• academic researchers and students
• government analysts
• charities and foundations
• business consultants
• independent research centres
• think tanks
• citizen scientists, where skills enable analysis
6. Our data portfolio
• UK Surveys: large-scale government funded surveys
• Longitudinal: major UK surveys following individuals over time
• International: multi-nation aggregate databanks and survey data
• Qualitative: range of multimedia qualitative data sources
• Census: census data 1971 – 2011
• Business: microdata and administrative data
8. UK survey series
• high quality repeated cross-sectional surveys
• individual or household level data
• cover many topics including health, work, crime, social
attitudes, family expenditure, living costs, housing etc.
• Labour Force Survey
• British Crime Survey
• Health Survey for England
• British Social Attitudes
• Annual Population Survey
….
10. Longitudinal studies
• British Household Panel Survey and Understanding
Society
• Understanding Society (2009-)
• English Longitudinal Study of Ageing
• Families and Children Study
• Growing Up in Scotland
• Longitudinal Study of Young People in England
11. International macrodata
• time series data aggregated to
country/region
• International governmental
organisations (IMF, OECD, IEA, World
Bank)
• wide range of socio-economic topics
• regularly updated
• currently limited to UK HE/FE
institutions
• World Bank data are open access
12. UK census data
• 1971-2011 census data
• baseline for other statistics
• detailed combinations of characteristics
• small geographies
• Census outputs
• aggregate data
• boundary data
• flow data
• microdata
• aggregate data is open access
• some restricted to UK HE/FE
14. Business data
• Collected through a wide range of surveys and
administrative sources:
• productivity
• innovation
• workforce skills
• earnings
• international trade
• foreign direct investment
• research and development
• business demography
• industrial relations
• Largely collected using the sampling frame of the
Inter-Departmental Business Register
15. Qualitative data
Qualitative data in a number of different formats: interview
transcripts, visual data, focus groups, essays, diaries, online
data, observation notes, documents, audio data,
open-ended survey questions, case notes etc.
Examples of sociology data collections:
• Family Life and Work Experience before 1918, Middle and Upper
Class Families in the Early 20th Century, 1870-1977 (SN 5404)
• Gender Difference, Anxiety and the Fear of Crime, 1995 (SN 4581)
• Mothers Alone: Poverty and the Fatherless Family, 1955-1966 (SN
5072)
• Affluent Worker in the Class Structure, 1961-1962 (SN 6512)
16. Where are the data from?
• official agencies - mainly central government
• international statistical time series
• individual academics - research grants
• market research agencies
• public records/historical sources
• access to international data via links with other data
archives worldwide
18. Some statistics about our Service
Data for research and teaching purposes, used in all
sectors and by many different disciplines
• 6,000 datasets in the collection
• 400 new datasets and new editions added
within last 12 months
• 25,000 registered users
• 60,000 downloads worldwide per annum
• 4,000+ user support queries per annum
19. How to search for data?
discover.ukdataservice.ac.uk/
22. Data access
• web access to data and metadata
• data are freely available for use by all. Charges may
apply for commercial use
• data available under 3 access levels: open, safeguarded
and controlled
• data supplied in a variety of formats
• statistical package formats (e.g. SPSS, Stata)
• databases and spreadsheets
• word processed documents, PDF documents etc.
• some data also available via online data browsing
23. Accessing data – step by step
• register with us via UK Federation using local credentials
(we also issue Federation accounts)
• agree to an End User Licence (EUL)
• appropriate data usage
• full citation of data and informing us of re-use
• select data from the Discover Data Catalogue using
‘Download/Order’ button
• where data are safeguarded - specify a project for which
the data are to be used
• download data to local machine in preferred format
24. Open data collections
Census - Open Government Licence
• InFUSE - 2011 and 2001 Census aggregate statistics
Survey data - Open Government Licence
• Nesstar - cutdown teaching datasets
Qualitative datasets – CC BY-NC 4.0
• QualiBank - life story interviews, essays, WWII reports
Aggregate global indicators – bespoke open data licence
• .STAT - World Bank Millennium Development Goals
25. Online instant data browsing
Nesstar social surveys
.stat aggregate global indicators
InFUSE aggregate census data
QualiBank qualitative data
26. Online analysis using Nesstar
• browse detailed information (metadata) and
data online
• do simple data analysis and visualisation on microdata
• bookmark analysis
• download the appropriate subset of data in one
of a number of formats (e.g. SPSS, Excel)
32. .stat: UN COMTRADE, 2008
[Bar chart: French snail imports by country, trade value in US$ thousands (axis 0 to 9,000). Countries shown: Greece, Romania, Turkey, Poland, Belgium, Hungary, Czech Rep., Indonesia, Lithuania, Bosnia Herzegovina, Cyprus, Bulgaria, Italy, Madagascar, Syria, United Kingdom. Graph: Celia Russell]
35. What do users do with the data?
• Comparative research, restudy or follow-up study
• Re-analysis/secondary analysis
• Research design and methodological advancement
• Replication of published statistics
• Teaching and learning
36. Evidence of access and re-use
User access information
• collect user information and ‘projects’ upon registration
• collate data and documentation download statistics
• users can share project information for others to see
• report data access stats on demand
Usage information
• email all users every 6 months after registration about activity
• manually add references to all research outputs to the data record
• reporting rate of publications is poor!
• prior to DOIs, we scanned the literature for dataset
mentions: very manual and unreliable, as data were poorly cited
37. Impactful case studies of use
• Identify and seek out case studies of re-use: research or
teaching.
• Very successful!
• 140 case studies in our database
• can help provide impact stories for data owners/producers
and users
• and can inspire others!
• some are harvested by ESRC for their website
• often include ongoing work – no need to wait for
publications
41. Making our data citable
• Use APA citation style for data
• DataCite DOIs for our collections (over 6,000)
• Robust version control methodology using jump page
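As a rough illustration of the points above, the sketch below assembles an APA-style data citation that ends in a DOI link. The field layout, the `format_data_citation` helper, the creator, the year and the DOI are hypothetical placeholders chosen for illustration; only the study title and study number come from the qualitative collections listed earlier. It is a sketch under those assumptions, not the Service's official citation template.

```python
from dataclasses import dataclass


@dataclass
class DataCollection:
    """Minimal metadata needed to cite a data collection (illustrative fields)."""
    creator: str
    year: int
    title: str
    publisher: str
    study_number: int
    doi: str  # bare DOI, without a resolver prefix


def format_data_citation(dc: DataCollection) -> str:
    """Return an APA-style citation string ending in a resolvable DOI link."""
    return (
        f"{dc.creator} ({dc.year}). {dc.title} [data collection]. "
        f"{dc.publisher}. SN: {dc.study_number}. https://doi.org/{dc.doi}"
    )


example = DataCollection(
    creator="Example, A.",  # hypothetical creator
    year=2010,  # hypothetical publication year
    title="Affluent Worker in the Class Structure, 1961-1962",
    publisher="UK Data Service",
    study_number=6512,
    doi="10.0000/EXAMPLE-SN-6512-1",  # hypothetical DOI
)
print(format_data_citation(example))
```

Keeping the DOI as a separate field means a new edition of a collection can be cited by changing one value while the rest of the record stays the same.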
43. Citation: raising awareness in the social sciences
• ESRC funding for short-term project on citation
• Advocacy for best practice
• Audiences
• professional organisations
• academic publishers and journal editors
• researchers and postgraduates
• Key activities
• data citation principles for social sciences
• outreach and personal communications
• Some way to go!
45. Expert advice on managing and sharing
• Supporting the ESRC Data Policy since 1995
• Advise and support ESRC grant applicants and award
holders
• Write guidance for applicants and Data Management
Planning (DMP) reviewers
• Provide detailed training
• Provide self-deposit repository environment
46. Data sharing – a shared responsibility
• Funders: provide policies, mandates and some infrastructure
funding
• Researchers: create, manage and use data
• Departments/centres: provide local support and some
infrastructure
• Institutions: provide a supporting framework
• grant-application and funding support
• research integrity framework
• IT and data storage facilities
• Data management guidance and training
• Clarify roles and responsibilities early on
47. Our managing and sharing data resources
• Online best practice guidance: ukdataservice.ac.uk/manage-data
• Managing and Sharing Research Data – a Guide to Good Practice:
www.uk.sagepub.com/books/9781446267264 (SAGE Publications)
• Helpdesk for all queries: ukdataservice.ac.uk/help/get-in-touch.aspx
• Training programme
48. UK Data Archive - digital data preservation experts
• certified to ISO 27001 for Information Security
• Data Seal of Approval (DSA) accredited
• undertake long-term data curation and preservation
• deeply involved in international preservation planning
and accreditation activities
www.data-archive.ac.uk/curate
50. User support and resources
• help desk, individual user support
• promotional events/ workshops
• webinars
• case studies
• teaching data and resources
• user guides/ thematic guides
• online data analysis
• advice on creating and managing data
51. Keep connected
• Subscribe to UK Data Service list:
www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKDATASERVICE
• Follow UK Data Service on Twitter: @UKDataService
• Facebook
• YouTube: www.youtube.com/user/UKDATASERVICE