Presentation by ISTAC and the ULL at the 2011 International Conference on Small Area Estimation (SAE 2011), held in Germany at the University of Trier as a satellite conference of the International Statistical Institute (ISI).
Link: http://www.uni-trier.de/index.php?id=30789
1. STATISTICAL INFRASTRUCTURE
EVALUATION OF DUAL-FRAME
ESTIMATORS
Application to the Survey on Equipment and Use of Information and
Communication Technologies in Households (ICT-H) in Canary Islands
EVALUATION OF DUAL-FRAME ESTIMATORS
2. STATISTICAL INFRASTRUCTURE
Enrique González-Dávila
University of La Laguna
Alberto González-Yanes
Canary Islands Statistical Institute, ISTAC
3. BACKGROUND:
• The Canary Islands consist of seven islands with differing characteristics,
grouped into two provinces (Las Palmas and Santa Cruz de Tenerife). The
islands are classified as NUTS level 3 units, so Eurostat requires data
for each of them.
• Most of the surveys conducted by the National Statistics Institute of Spain
(INE) provide information at the Autonomous Community level.
• Additional information is needed at the county and island levels (27 counties).
• ISTAC's main objective is to provide an information system on labour
markets, new technologies, etc. in the Canary Islands that meets the
needs of users at the lowest possible cost and with maximum efficiency.
4. BACKGROUND:
• The Canary Islands Statistical Institute (ISTAC) has been a member of the
working group on small areas of the National Statistics Institute (INE) since
its inception in April 2004.
• In the Canary Islands in particular, a working group was established between
ISTAC and the University of La Laguna, which signed the "CANAREA" project
with the aim of:
– Responding to the requirements of the working group on small areas of the
National Statistics Institute, and
– Incorporating technical and methodological developments into the statistical
practice of ISTAC.
• In the first study we evaluated the use of small area estimates in the Labour
Force Survey (LFS). We proposed design-based adaptive estimators with
features similar to composite estimators, where all the information came
from the survey itself (3,510 dwellings with approximately 8,000 persons).
That work was presented at SAE 2009.
5. BACKGROUND:
• A brief overview of the work and methodology for the final small area
estimates for the LFS in the Canary Islands can be found in the ISTAC
publication.
• Having worked on various aspects of small area estimation for the Labour
Force Survey, we decided to adapt this methodology, if possible, to a new
survey: Equipment and Use of Information and Communication Technologies in
Households (ICT-H) in the Canary Islands.
• The ICT-H survey is conducted every year by the National Statistics
Institute of Spain (INE) and provides information at the Autonomous
Community level.
• Additionally, ISTAC conducted a similar survey in 2006 and 2010 in order to
provide information at the island level (ICT-H Canary).
6. BACKGROUND:
ICT-H Survey (INE): description of the survey
• Provides information on equipment and use of new technologies in households
at the Autonomous Community level.
• Follows a stratified three-stage random sampling design in each province,
with census sections as primary sampling units, dwellings as secondary units
and people as tertiary units.
• 120 sections are sampled (8 dwellings per section, approximately 808
dwellings).
• A quarter of the dwellings is renewed each year (rotating panel since 2004).
• It uses direct ratio estimators with calibrated weights, w_j.
• All households are surveyed by personal interview (CAPI) at the first visit;
in subsequent visits, those that have a fixed telephone are surveyed by
telephone interview (CATI).
• The variables of interest include, among others: availability of fixed
telephone, mobile phone, only fixed telephone, desktop computer, portable
computer, internet, etc.
7. BACKGROUND:
ICT-H Canary Survey (ISTAC): description of the survey
• Similar to the ICT-H survey (INE), but with a lighter questionnaire; it
provides information at the island level (carried out only in 2006 and
2010).
• Follows a stratified three-stage random sampling design.
• 180 sections are sampled (20 dwellings per section, approximately 3,500
dwellings).
• It uses direct ratio estimators with calibrated weights, w_j.
• 70% of the survey is conducted by telephone interview (CATI) and
30% by personal interview.
8. BACKGROUND: questions
1. Can we use the INE survey to provide information at the lower, island level?
2. Is it necessary to use an auxiliary survey to provide information for the
islands or counties?
3. If we use an auxiliary survey similar to the one conducted by ISTAC: how
does the use of telephone interviews affect the direct estimation of the
target variables?
4. Assuming that the direct estimates of the target variables will be biased,
is it possible to keep a high percentage of telephone interviews (low
cost) and still avoid or correct this bias?
5. How does the telephone interview affect the dual-frame estimation of
target variables that may coincide with, have different degrees of association
with, or be independent of the availability of a phone?
9. QUESTION 1 and 2: Information at the island or county levels (only INE
survey)
Initially we took the ICT-H survey (INE) for 2006 and tried to calculate estimates for the
islands using design-based estimators (direct, synthetic and composite), comparing
the results with the estimates provided by the ICT-H Canary survey 2006 (ISTAC).
These initial assessments suggest the need to borrow strength from other
sources of information (auxiliary surveys, administrative records, etc.).
In particular, given the existence of the ICT-H Canary survey, we opted for
the use of an auxiliary survey.
10. QUESTION 3: How does the telephone interview affect the estimates?
Conducting a survey is very expensive and the use of telephone interviews
reduces the cost. However, in this survey most of the target variables are
related to the availability of a phone at home. To assess the effect, we:
• Build an artificial population of households in the Canary Islands, starting
from the 2001 Housing Census of that community, and generate the variables of
interest under different conditions.
• Simulate the drawing of surveys with the same methodology as the ICT-H
Canary survey, while varying the share of in-person interviews.
• Obtain, over 500 simulations and for each estimator considered, the
relative mean square errors and biases.
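The two summary measures can be sketched in Python as follows. This is a minimal illustration, assuming that "relative" means divided by the true population value and that the error is reported as a root mean square error; the function name and the toy binomial draws are hypothetical stand-ins for the output of the survey simulator.

```python
import numpy as np

def relative_bias_and_rmse(estimates, true_value):
    """Relative bias and relative RMSE of a set of simulated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    mse = np.mean((estimates - true_value) ** 2)
    return bias / true_value, np.sqrt(mse) / true_value

# Toy stand-in for 500 simulated surveys of about 3,500 dwellings each
# (hypothetical: simple binomial draws, not the stratified design).
rng = np.random.default_rng(0)
true_p = 0.45
simulated = rng.binomial(n=3500, p=true_p, size=500) / 3500
rel_bias, rel_rmse = relative_bias_and_rmse(simulated, true_p)
print(f"relative bias = {rel_bias:.4f}, relative RMSE = {rel_rmse:.4f}")
```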
11. QUESTION 3: How does the telephone interview affect the estimates?
The target variables considered are generated from Bernoulli distributions with
different parameters.
• p1: probability of availability of fixed telephone.
• p2: probability of availability of desktop computer.
• We assume that the availability of a computer is independent of the fixed
telephone. This allows us to evaluate the performance for a variable that is
independent of the type of interview.
• p3: probability of only fixed telephone for those with fixed telephone.
• The variable “only fixed telephone” enables us to evaluate the performance for a
variable contained entirely within the telephone frame.
• p4: probability of availability of internet connection for those with fixed
telephone.
• p5: probability of availability of internet connection for those without fixed
telephone.
• The variable “internet” allows us to evaluate variables that are closely related to
the availability of fixed telephone but are not fully contained within the telephone
frame.
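A minimal Python sketch of how such an artificial population could be generated is shown below. The function name, the population size and the values of p2 and p3 are illustrative assumptions; p1, p4 and p5 use the example values quoted in the internet slide (75%, 50%, 30%). The real simulator starts from the 2001 Housing Census, which is not reproduced here.

```python
import numpy as np

def generate_population(n_households, p1, p2, p3, p4, p5, seed=0):
    """Draw the Bernoulli target variables for each household (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    phone = rng.random(n_households) < p1            # fixed telephone
    computer = rng.random(n_households) < p2         # independent of phone
    # "Only fixed telephone" can occur only among households with a fixed phone.
    only_fixed = phone & (rng.random(n_households) < p3)
    # Internet depends on phone: p4 with a fixed phone, p5 without.
    internet = rng.random(n_households) < np.where(phone, p4, p5)
    return {"phone": phone, "computer": computer,
            "only_fixed": only_fixed, "internet": internet}

pop = generate_population(200_000, p1=0.75, p2=0.40, p3=0.20, p4=0.50, p5=0.30)
for name, values in pop.items():
    print(f"{name}: {values.mean():.3f}")
```

With these parameters the overall internet rate should be close to p1·p4 + (1 − p1)·p5 = 0.45.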
12. QUESTION 3: How does the telephone interview affect the estimates?
[Figure with four panels: Computer, Telephone, Only fixed telephone, Internet]
13. QUESTION 4: Dual-Frame Estimators
Dual-frame methodology is applicable to many scenarios with different
configurations, but we only consider the case of two frames with one
totally contained in the other. Let A and B be these frames, in particular:
• A: all dwellings.
• B: households with fixed telephone.
Independent samples of sizes n_a and n_b are drawn from each frame,
with w_i^A and w_i^B the inverses of the inclusion probabilities in each
frame. We assume that the survey is conducted by in-person interview in frame A
and by telephone interview in frame B.
This creates two mutually exclusive domains, a and ab: units of A that are
not in B, and units that are in both frames at the same time, respectively.
14. QUESTION 4: Dual-Frame Estimators
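Hartley estimator:
Hartley (1972) proposed dual-frame estimators that use a weighted average of the
estimators in the overlap domain ab:
$$\hat{Y}_H(\theta) = \hat{Y}_a + \theta\,\hat{Y}_{ab}^{A} + (1-\theta)\,\hat{Y}_{ab}^{B}$$
where $\hat{Y}_a$ is the estimate of Y for units in domain a, $\hat{Y}_{ab}^{A}$ is the estimate of Y in
domain ab from the sample of frame A, $\hat{Y}_{ab}^{B}$ is the one from frame B, and 0 ≤ θ ≤ 1.
Hartley proposed choosing θ to minimize the variance of $\hat{Y}_H(\theta)$, which for
independent samples and the nested frames considered here gives:
$$\theta_{opt} = \frac{V(\hat{Y}_{ab}^{B}) - \mathrm{Cov}(\hat{Y}_a, \hat{Y}_{ab}^{A})}{V(\hat{Y}_{ab}^{A}) + V(\hat{Y}_{ab}^{B})}$$
The calculation of the variances and covariance can be complex and depends on the
type of sampling performed.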
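A minimal Python sketch of this combination, for the nested case with only domains a and ab, might look as follows. The function names are hypothetical, and the variances and covariance are assumed to be supplied by the caller, since their estimation depends on the sampling design.

```python
def hartley_theta_opt(var_ab_A, var_ab_B, cov_a_abA):
    """Variance-minimizing theta for nested frames (no domain b).

    Follows from minimizing
    V(Y_a) + theta^2 V(Y_ab^A) + (1 - theta)^2 V(Y_ab^B) + 2 theta Cov(Y_a, Y_ab^A),
    assuming the samples from A and B are independent.
    """
    theta = (var_ab_B - cov_a_abA) / (var_ab_A + var_ab_B)
    return min(1.0, max(0.0, theta))  # keep theta inside [0, 1]

def hartley_estimate(Y_a, Y_ab_A, Y_ab_B, theta):
    """Domain-a total plus a weighted average of the two overlap totals."""
    return Y_a + theta * Y_ab_A + (1.0 - theta) * Y_ab_B

# Illustrative numbers only.
theta = hartley_theta_opt(var_ab_A=1.0, var_ab_B=3.0, cov_a_abA=0.0)
print(theta, hartley_estimate(10.0, 30.0, 34.0, theta))
```

The less precise overlap estimate receives the smaller weight: here the frame-B estimate has three times the variance, so θ moves towards the frame-A estimate.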
15. QUESTION 4: Dual-Frame Estimators
Rao (1983) estimator:
Rao (1983) proposed maximum likelihood estimators using the scale-load approach.
The estimator uses the mean (or percentage) of the distinct units in
domain ab, where d is the number of units present in both samples.
Its calculation is very simple but is highly influenced by the sample size. In
relatively small domains the accuracy of the estimated percentages in a and ab is
quite low. In large domains the results are usually quite adequate.
16. QUESTION 4: Dual-Frame Estimators
Skinner and Rao (1996) estimator (PML Estimator):
Skinner and Rao (1996) proposed a pseudo-maximum likelihood (PML) estimator
that is consistent under complex designs. It relies on a PML estimate of the size
of domain ab, obtained as the smallest root of a quadratic equation.
The estimator is consistent for Y for any values of n_a and n_b, in particular for
the sample sizes of frames A and B, respectively. The optimal choice suggested
by Skinner and Rao was to take the n_a/n_b that minimizes the variance of the
estimated size of domain ab, but this requires estimating the variances of the two
frame-specific estimates of that size. We use the definition with
the sample sizes.
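As a hedged illustration, the Python sketch below follows the form of the PML estimator as it is usually stated in the multiple-frame literature, specialised to the nested case used here (frame B inside frame A, so there is no domain b). The exact expression on the original slide is not available in this text, so the quadratic coefficients, function names and toy numbers should be read as assumptions. N_A and N_B are the frame sizes, n_A and n_B the sample sizes, and N_ab_A, N_ab_B the estimates of the overlap size from each sample.

```python
import math

def n_ab_pml(N_A, N_B, n_A, n_B, N_ab_A, N_ab_B):
    """Smallest root of the quadratic defining the PML overlap size (assumed form)."""
    a = n_A + n_B
    b = -(n_A * N_B + n_B * N_A + n_A * N_ab_A + n_B * N_ab_B)
    c = n_A * N_ab_A * N_B + n_B * N_ab_B * N_A
    disc = b * b - 4 * a * c
    return (-b - math.sqrt(disc)) / (2 * a)

def pml_estimate(N_A, N_B, n_A, n_B, N_a_hat, Y_a_hat,
                 N_ab_A, Y_ab_A, N_ab_B, Y_ab_B):
    """PML total for nested frames: domain-a part plus pooled overlap part."""
    N_ab = n_ab_pml(N_A, N_B, n_A, n_B, N_ab_A, N_ab_B)
    mean_a = Y_a_hat / N_a_hat
    # Overlap mean pools both samples, weighted by their sample sizes.
    mean_ab = (n_A * Y_ab_A + n_B * Y_ab_B) / (n_A * N_ab_A + n_B * N_ab_B)
    return (N_A - N_ab) * mean_a + N_ab * mean_ab

# Toy numbers: 1,000 dwellings, 750 with fixed phone, 30% in-person interviews.
total = pml_estimate(N_A=1000, N_B=750, n_A=300, n_B=700,
                     N_a_hat=260, Y_a_hat=78,
                     N_ab_A=740, Y_ab_A=370, N_ab_B=750, Y_ab_B=375)
print(total)
```

In this toy case the domain means are 0.30 and 0.50, so the estimated total corresponds to 45% of the 1,000 dwellings.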
17. QUESTION 4: Dual-Frame Estimators
Skinner and Rao (1996) estimator (PML Estimator):
The calculation is relatively simple (as defined here), and the behaviour of
this estimator has been quite adequate in all the tests applied.
Sharon Lohr (in Recent Developments in Multiple Frame Surveys, 2007) indicates:
“Skinner and Rao (1996), Rao and Skinner (1999), Lohr and Rao (2006) found that
the PML estimator has small mean squared error and works well in a wide
variety of survey designs”.
Additionally, one can define a new weight variable and work with the usual
structure of a direct estimator, as is common in statistical institutes.
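One way to build such a weight variable is sketched below in Python. This is an assumption-laden illustration for the nested-frame case, not the formula from the original slide: the function name, argument names and toy data are hypothetical, and the PML overlap size N_ab_pml is taken as an input. Once the adjusted weights exist, any total is simply the weighted sum of the observed values.

```python
import numpy as np

def pml_adjusted_weights(w, from_frame_A, in_frame_B, N_A, n_A, n_B, N_ab_pml):
    """Rescale design weights so that sum(w_new * y) gives a PML-type total."""
    w = np.asarray(w, dtype=float)
    from_A = np.asarray(from_frame_A, dtype=bool)
    in_B = np.asarray(in_frame_B, dtype=bool)
    dom_a = from_A & ~in_B      # sampled from A, no fixed phone
    ab_from_A = from_A & in_B   # sampled from A, has fixed phone
    ab_from_B = ~from_A         # sampled from B (all have fixed phone)
    N_a_hat = w[dom_a].sum()
    pooled = n_A * w[ab_from_A].sum() + n_B * w[ab_from_B].sum()
    w_new = np.empty_like(w)
    w_new[dom_a] = w[dom_a] * (N_A - N_ab_pml) / N_a_hat
    w_new[ab_from_A] = w[ab_from_A] * N_ab_pml * n_A / pooled
    w_new[ab_from_B] = w[ab_from_B] * N_ab_pml * n_B / pooled
    return w_new

# Toy data: three units sampled from frame A, two from frame B.
w_new = pml_adjusted_weights(
    w=[10, 10, 10, 5, 5],
    from_frame_A=[True, True, True, False, False],
    in_frame_B=[False, True, True, True, True],
    N_A=30, n_A=3, n_B=2, N_ab_pml=20)
y = np.array([0, 1, 0, 1, 1])
print(w_new, float(np.sum(w_new * y)))
```

By construction the adjusted weights add up to the size of frame A, which makes them usable in the same way as ordinary calibrated weights.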
18. QUESTION 5: How does the telephone interview affect Dual-Frame
Estimators?
These three estimators were tested on the simulator described earlier (500
simulations). For these tests, frame B was taken to consist of all
households that had a fixed phone. In practice, frame B will consist only of homes
listed in the phone book and will not necessarily coincide with all the homes
that have a fixed telephone.
The results were similar for all three estimators, so we chose the
PML estimator for comparison with the direct estimator under different
configurations.
19. QUESTION 5: How does the telephone interview affect Dual-Frame
Estimators?
Telephone (variable coincident with frame B)
p1: probability of availability of fixed telephone (size of frame B)
The PML estimator remains unbiased even with a very low percentage of
in-person interviews when the variable to estimate is highly correlated with
the frame.
20. QUESTION 5: How does the telephone interview affect Dual-Frame
Estimators?
Computer (variable independent of frame B)
p2: probability of availability of desktop computer
Both the PML and the direct estimator remain unbiased. However, when the
percentage of in-person interviews is very small (less than 20%), the variance
of the PML estimator increases.
21. QUESTION 5: How does the telephone interview affect Dual-Frame
Estimators?
Only fixed telephone (variable fully contained in frame B)
p3: probability of only fixed telephone in households with a fixed phone
The PML estimator remains unbiased and fairly stable. As frame B grows
(larger p1), the RMSE of the two estimators becomes similar, with more
variance for the PML estimator against more bias for the direct estimator.
22. QUESTION 5: How does the telephone interview affect Dual-Frame
Estimators?
Internet (variable related to frame B)
p4: probability of availability of internet connection in
households with fixed telephone.
p5: probability of availability of internet connection in
households without fixed telephone.
Example: p1 = 75%, p4 = 50%, p5 = 30% gives 45% internet availability overall.
[Diagram: households without fixed phone (25%), of which p5 = 30% have internet; households with fixed phone (p1 = 75%), of which p4 = 50% have internet]
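The 45% overall figure follows from the law of total probability applied to the two groups of households (with and without fixed phone):

```latex
P(\text{internet}) = p_1\,p_4 + (1 - p_1)\,p_5
                   = 0.75 \times 0.50 + 0.25 \times 0.30
                   = 0.375 + 0.075 = 0.45
```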
The PML estimator remains unbiased and fairly stable. As p4 decreases (the
variable becomes more independent of the phone), the RMSE of the two
estimators becomes more similar. When the percentage of in-person
interviews is very small (less than 20%), the variance of the PML estimator
increases, and more severely so when p4 is smaller (the variable is more
independent of the phone).
23. CONCLUSIONS:
• Dual-frame estimators, including the PML estimator, avoid the bias of the
direct estimates from surveys that combine in-person and telephone
interviews.
• The PML estimator remains unbiased for all percentages of in-person
interviews, but its variability increases as that percentage decreases. In
particular, for rates below 20% this increase can be severe, especially for
variables that are independent of, or only weakly related to, the
availability of a telephone.
• In general, the RMSE values of the PML estimator are lower than those
obtained with the direct estimator: it has more variability but is unbiased.
• Tests on real data from the ICT-H 2010 survey (ISTAC) at the Autonomous
Community level show a significant correction with respect to the values
published by the ICT-H 2010 survey (INE).