The aim of this work is twofold: first, to investigate how a model-based approach could be implemented in official statistics to ensure the reliability of data on social conditions across different breakdowns; and second, to discuss the advantages, disadvantages, and potential of small area estimation techniques and tools in the production of official statistics.
To analyse how well the models fit different types of data, several runs were conducted using small area estimation techniques (such as empirical Bayes and hierarchical Bayes, among others) already available in R (packages sae, hbsae, etc.) to obtain area-level and unit-level at-risk-of-poverty estimates and the mean squared errors of those estimates.
Reliability of estimates in socio-demographic groups with small samples
1. Reliability of estimates in socio-demographic groups with small samples
D. Buono
Statistical Office of the European Union
19 August 2016, SAE, Maastricht
All opinions expressed are those of the author
2. Facts and figures about Eurostat
• About 800 people with 28 different nationalities
• Small central methodology team
• TS, Econometrics, SDC, research & EA
• Plus domain methodologists networking
• A statistical office but not an independent authority: a Directorate-General of the European Commission
• Subsidiarity principle!
3. Eurostat core business
• Euro-zone (19) & EU (28) aggregates
• harmonization, best practices, guidelines, trainings & international cooperation
4. Why interested in SAE?
• European regional policies
• Member States, the primary data providers, differ greatly in size
• According to the EU 2011 Population Census there are 79,652,380 residents in DE and 512,353 in LU!
• Some dilemmas:
• How big is a small area?
• Can SAE help with data breakdown demand by users?
5. Outline
• Reliability of indicators
• At-risk-of-poverty indicators
• SAE techniques for Official Statistics
• Application for 2 EU countries
• Learnings and open questions
• ADS and EU research funds for SAE expertise
6. Some Notation
• U – finite population of size N
• D – number of socio-demographic groups in the target population
• s – sample of size n
• s_d – sub-sample from domain d, of size n_d
• r – set of non-sampled elements, of size N - n
• r_d – set of non-sampled elements from domain d, of size N_d - n_d
• y – target variable
• X – vector of auxiliary information
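As a sketch of how this notation connects to a poverty target: write s_d, r_d and N_d for the domain sub-sample, the non-sampled domain elements and the domain population size listed above. Assume a poverty threshold z and the domain population U_d = s_d ∪ r_d; both symbols are introduced here for illustration and are not on the slides. The domain poverty rate then splits into an observed part and an unobserved part:

```latex
F_d \;=\; \frac{1}{N_d} \sum_{j \in U_d} \mathbf{1}(y_j < z)
    \;=\; \frac{1}{N_d} \left[ \sum_{j \in s_d} \mathbf{1}(y_j < z)
          \;+\; \sum_{j \in r_d} \mathbf{1}(y_j < z) \right]
```

The first sum is known from the sample; the second runs over the N_d - n_d non-sampled units, which is where a unit-level model would use the auxiliary information X to predict y.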
11. Packages and functions used
• sae.R
• Functions: direct, ebBHF, pbmseBHF
• hbsae.R
• Functions: fSAE, fSAE.Area
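To make concrete what a direct estimate is, here is a minimal sketch in plain Python rather than R. It is not the implementation of the sae function `direct`; it only shows a design-weighted (Hájek-type) domain mean computed from the sampled units of each domain. The function name `direct_domain_means` and the toy data are hypothetical.

```python
from collections import defaultdict


def direct_domain_means(y, domain, weight):
    """Design-weighted mean of y per domain: sum(w * y) / sum(w).

    Only the sampled units of each domain contribute, which is why
    the estimate becomes unstable when a domain has few observations.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for y_i, d_i, w_i in zip(y, domain, weight):
        numerator[d_i] += w_i * y_i
        denominator[d_i] += w_i
    return {d: numerator[d] / denominator[d] for d in numerator}


# Hypothetical toy sample: y = 1 if the person is below the poverty line.
y = [1, 0, 0, 1, 0, 1]
domain = ["A", "A", "A", "B", "B", "B"]
weight = [10.0, 20.0, 10.0, 5.0, 5.0, 10.0]  # assumed design weights

estimates = direct_domain_means(y, domain, weight)
print(estimates)
```

With three respondents per domain, a single changed answer moves each estimate substantially, which is the reliability problem that the model-based functions are meant to address.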
12. Application: Target and data
• Target: Calculate direct and indirect at-risk-of-poverty rate estimates by socio-demographic breakdowns
• Data sources: Survey on Income and Living Conditions (EU-SILC) and Census data of some EU countries in 2011
• Sample: divided in 18 disjoint socio-demographic groups of small and large sizes
• Auxiliary variables: unit level information on economic activity status and highest level of education attained
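The target indicator can be sketched as follows. The slides do not spell out the definition; this example assumes the standard Eurostat one, in which a person is at risk of poverty when their equivalised disposable income is below 60% of the national median. The incomes, weights and group labels are invented for illustration, and the helper `weighted_median` is hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, w in sorted(zip(values, weights)):
        cumulative += w
        if cumulative >= half:
            return value
    raise ValueError("empty input")


# Hypothetical equivalised incomes, design weights and group labels.
income = [8000.0, 13000.0, 20000.0, 30000.0, 50000.0]
weight = [1.0, 1.0, 1.0, 1.0, 1.0]
group = ["g1", "g1", "g2", "g2", "g2"]

# Assumed definition: threshold = 60% of the weighted median income.
threshold = 0.6 * weighted_median(income, weight)
poor = [1 if x < threshold else 0 for x in income]

# Weighted at-risk-of-poverty rate within each socio-demographic group.
rate = {}
for g in sorted(set(group)):
    members = [i for i, g_i in enumerate(group) if g_i == g]
    total_weight = sum(weight[i] for i in members)
    rate[g] = sum(weight[i] * poor[i] for i in members) / total_weight

print(threshold, rate)
```

The threshold is computed once on the whole sample and then applied within each group, so the group rates remain comparable with one another.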
17. Learnings and future work
• Applying model-based SAE techniques could increase the reliability of estimates
• Enlarging the set of auxiliary variables
• Further investigation is needed to identify the most appropriate estimator (call for harmonization?)
• Extension to additional countries and socio-demographic groups
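Why a model-based estimate can be more reliable than a direct one can be illustrated with a textbook area-level shrinkage (composite) estimator. This is a sketch and not the estimator used in the application: it assumes the sampling variance and the model variance are known, so the reported MSE is only the leading term and ignores the uncertainty from estimating model parameters. All numbers and the function name `shrinkage_estimate` are hypothetical.

```python
def shrinkage_estimate(direct, synthetic, sampling_var, model_var):
    """Combine a direct estimate with a model-based (synthetic) one.

    gamma is the weight on the direct estimate: it approaches 1 when the
    direct estimate is precise and 0 when its sampling variance is large.
    """
    gamma = model_var / (model_var + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    leading_mse = gamma * sampling_var  # never larger than sampling_var
    return estimate, leading_mse, gamma


# Small group: noisy direct estimate, so the model prediction dominates.
small = shrinkage_estimate(direct=0.30, synthetic=0.20,
                           sampling_var=0.004, model_var=0.001)

# Large group: precise direct estimate, so it is left almost unchanged.
large = shrinkage_estimate(direct=0.30, synthetic=0.20,
                           sampling_var=0.0001, model_var=0.001)

print(small)
print(large)
```

In the small group the leading MSE term falls well below the sampling variance of the direct estimate, while in the large group the gain is modest. This matches the intuition that SAE helps most where samples are small.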
18. Open questions on SAE
• EB vs. HB dichotomy calls for harmonised practices in
Official Statistics?
• From design-based to model-based to algorithm-based: is there a link between SAE and statistical learning?
• Reversing the approach: starting from the data rather
than from the goal?
• How about the use of SAE for data protection?
19. Advertisement
CESS2016, Conference of European Statistics Stakeholders
Budapest, 20–21 Oct 16 (by ESTAT, ECB & HCSO), free!
• Session B3: Official statistics on cross-border phenomena
• Session C9: Small area estimation and weighting
NTTS2017, New Techniques and Technologies for Statistics
Brussels, 14–16 March 17 (by ESTAT), free!
• abstract by 28 Oct 16, track C includes SAE
20. Research funds under Horizon 2020
TOPIC : Towards a new growth strategy in Europe - Improved
economic and social measurement, data and official statistics
Opening: 4 October 2016
Closing: 2 February 2017
For more info and to submit a proposal, see the Horizon 2020 call page.
"Disaggregation of statistics - geographically, or by other domains
(e.g. identifying vulnerable population groups) - to provide greater
insights and providing evidence allowing more focused policy
decisions should be covered. At the same time data protection
concerns should be addressed. Small Area Estimation expertise
could cover the geographical/domain disaggregation aspect"