# Accuracy and reliability

## Contact info

Labour and Income, Social Statistics, Martin Faris Sawaed Nielsen

+45 39 17 34 98

The Labor Force Survey (LFS) has a relatively large sample and there are continuous improvements in enumeration methods. This provides reliable statistics for the population's connection to the labor market, although there is uncertainty linked to the selection of the sample and the structure of the non-response.

In Q1 2016, the response rate was exceptionally low, which increased the uncertainty of the figures. In addition, web interviewing (CAWI) was introduced as a new data collection method. Together, these two factors caused breaks in the time series, which have been corrected in the main series.

### Overall accuracy

As with all surveys, the LFS is subject to sampling error (see Sampling error). In the 1st quarter of 2016 the response rate was 45 pct., which is significantly lower than in the period 2007-2015.

As is the case with all survey-based statistics, there is some uncertainty. This is due to the way the sample is selected and to the structure of the non-response. Non-response occurs when an interview is not completed with a selected person. Non-response increases the uncertainty of the survey because not all persons are equally likely to be interviewed: some groups are more likely to be non-respondents, which affects the representativeness of the survey. This is largely handled through weighting and calibration using register-based auxiliary information, whereby persons who are typically underrepresented in surveys receive a higher weight, while persons who are overrepresented receive a lower weight, which adjusts their numbers downwards. One example of non-response bias is educational level: persons with a higher educational level are more likely to participate than persons with a lower educational level. Young persons aged 15-24, persons with another ethnic background and unemployed persons are also underrepresented.
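The basic idea of letting underrepresented groups count more can be illustrated with simple post-stratification. This is a minimal sketch with invented cells and counts, not the LFS method: the actual LFS uses a more advanced calibration on many register variables, which this code does not reproduce. In the hypothetical example, persons with a lower educational level make up 60 pct. of the population but only 300 of 700 respondents (about 43 pct.), so each of them is given a larger weight.

```python
# Minimal post-stratification sketch. The cells and counts are invented for
# illustration; the actual LFS calibration uses many more register variables.
POPULATION = {"lower_education": 600_000, "higher_education": 400_000}  # hypothetical register totals
RESPONDENTS = {"lower_education": 300, "higher_education": 400}          # hypothetical interviews


def poststratification_weights(population, respondents):
    """Weight per respondent in a cell = population count / respondents in that cell."""
    return {cell: population[cell] / respondents[cell] for cell in population}


weights = poststratification_weights(POPULATION, RESPONDENTS)
for cell, w in weights.items():
    # Underrepresented cells (few respondents relative to population) get larger weights.
    print(f"{cell}: each respondent represents {w:,.0f} persons")

# Weighted respondent counts reproduce the register totals.
weighted_totals = {cell: weights[cell] * RESPONDENTS[cell] for cell in RESPONDENTS}
print(weighted_totals)
```

Each respondent with a lower educational level ends up representing 2,000 persons, against 1,000 for those with a higher educational level, so the weighted sample matches the register distribution.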

The so-called Gi-weights are the calibration factors, i.e. the part of the weight that adjusts for non-response. The mean of the Gi-weights for the Danish LFS is 0.99. This is very close to 1, which indicates that the calibration works as intended. The standard deviation of the Gi-weights is 0.36. Assuming that the Gi-weights are approximately normally distributed, about 95 pct. of them lie within two standard deviations of the mean, i.e. between 0.27 and 1.71. A Gi-weight of 1.71 adjusts a respondent's weight upwards by 71 pct., and a Gi-weight of 0.27 adjusts it downwards by 73 pct. This means that about 95 pct. of the respondents get a weight that adjusts them downwards by at most 73 pct. or upwards by at most 71 pct. For example, young unemployed persons with a low educational level and another ethnic background receive a high weight and are adjusted upwards considerably. Conversely, employed persons with a Danish background, aged 35-44 and with a higher level of education are adjusted downwards, since they are overrepresented.
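The range quoted for the Gi-weights is the mean plus or minus two standard deviations. A short check of that arithmetic, using only the mean and standard deviation stated in the text, is sketched below. The normal distribution is an assumption made for illustration; the actual distribution of Gi-weights is not necessarily normal.

```python
from statistics import NormalDist

MEAN_G = 0.99  # mean of the Gi-weights, as stated in the text
SD_G = 0.36    # standard deviation of the Gi-weights, as stated in the text

# Mean +/- 2 standard deviations.
low = MEAN_G - 2 * SD_G
high = MEAN_G + 2 * SD_G
print(f"mean +/- 2 SD: {low:.2f} to {high:.2f}")  # 0.27 to 1.71

# Share of weights inside that range IF they were normally distributed (assumption).
dist = NormalDist(mu=MEAN_G, sigma=SD_G)
share = dist.cdf(high) - dist.cdf(low)
print(f"share within range under normality: {share:.3f}")
```

Under the normality assumption about 95.4 pct. of the weights fall in the range; using 1.96 instead of 2 standard deviations would give the slightly narrower range 0.28 to 1.70.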

Outside this range, the maximum Gi-weight is 3.87, an upward adjustment of 287 pct., and the minimum is 0.0879, a downward adjustment of about 91 pct. Such extreme weights apply either to persons from groups that are strongly underrepresented among the respondents, which are adjusted strongly upwards, or to persons from groups that are strongly overrepresented, which are adjusted strongly downwards.
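A Gi-weight g changes a respondent's weight by (g - 1) × 100 pct. The small helper below (a hypothetical name, used only for illustration) makes that conversion explicit for the Gi-weights quoted in this section.

```python
def adjustment_pct(g):
    """Percentage change applied by a Gi-weight g: 1.71 -> +71 pct., 0.27 -> -73 pct."""
    return (g - 1.0) * 100.0


# Ends of the two-SD range, followed by the maximum and minimum Gi-weights.
for g in (1.71, 0.27, 3.87, 0.0879):
    print(f"Gi = {g}: {adjustment_pct(g):+.1f} pct.")
```

A weight of 3.87 is therefore an upward adjustment of 287 pct. (the respondent counts 3.87 times as much), while 0.0879 is a downward adjustment of about 91 pct.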

Even though the auxiliary information corrects for much of the bias, systematic bias cannot be ruled out. However, as long as it is stable, such a bias would only affect the level and not the development over time. Because of the low response rate in the 1st quarter of 2016, the systematic bias may have changed, affecting the data and thus the comparability with earlier periods. This could contribute to the break in the time series in the 1st quarter of 2016.

The earliest data from the Danish LFS, from 1994-1999, are of lower quality than data from 2000 onwards, partly due to the lack of a personal identification number.

### Sampling error

Sampling errors are a particular concern for estimates based on few observations. Published results are therefore always rounded to the nearest 1,000 persons. Furthermore, some results are based on annual averages, which increases the number of interview responses and thus yields more reliable results.

In addition, some results are published together with their sampling errors, presented as confidence intervals in the form +/- sampling error. The sampling error is calculated as 1.96 × standard error, where 1.96 is the 97.5th percentile of the standard normal distribution, so that the two-sided interval has 95 pct. coverage. The sampling error depends on the sample size: it is roughly inversely proportional to the square root of the sample size, so quadrupling the sample size approximately halves the sampling error. Therefore, in many cases it is an advantage to use data from the last four quarters instead of only the current quarter.
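The square-root relationship between sample size and sampling error can be seen in a simple case. The sketch below assumes simple random sampling and an estimated share p (the share of 0.05 and the sample sizes are hypothetical); the LFS itself uses a stratified design with weighting, so its published standard errors are computed differently.

```python
import math
from statistics import NormalDist

# 1.96 is the 97.5th percentile of the standard normal distribution.
Z_95 = NormalDist().inv_cdf(0.975)


def sampling_error(p, n):
    """Half-width (1.96 * standard error) for an estimated share p.

    Assumes simple random sampling of n persons (a simplification of the LFS design).
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    return Z_95 * standard_error


e_small = sampling_error(0.05, 10_000)
e_large = sampling_error(0.05, 40_000)  # four times the sample size
print(f"z = {Z_95:.2f}")
print(f"ratio of sampling errors: {e_small / e_large:.2f}")  # 2.00
```

Quadrupling n divides the standard error by the square root of 4, which is why pooling four quarters roughly halves the sampling error.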

This enables users to assess whether, for example, a change in the level of employment merely reflects sampling error or is a significant decrease or increase. To describe the sampling error for small or large groups in a survey, confidence intervals are often used rather than standard errors or variances. In the Danish Labour Force Survey it has been decided to use 95 pct. confidence intervals. This means that if the survey were repeated 100 times, the interval would contain the true value in about 95 of the 100 repetitions and miss it in about 5.
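The repeated-survey interpretation of a 95 pct. confidence interval can be checked by simulation. This is a sketch under simplifying assumptions: a hypothetical true share of 0.3, simple random samples of 400 persons, and an unweighted interval of +/- 1.96 standard errors, none of which reflects the actual LFS design.

```python
import math
import random


def coverage(p_true=0.3, n=400, reps=2000, seed=1):
    """Share of repeated samples whose 95 pct. interval contains the true share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one hypothetical survey: each of n persons has the trait with probability p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p_true <= p_hat + half_width:
            hits += 1
    return hits / reps


print(f"coverage: {coverage():.3f}")  # close to 0.95
```

The simulated coverage lands near 0.95: the true value is fixed, and it is the interval that varies from one repetition to the next.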

Another measure of sampling error is the coefficient of variation (CV), which expresses the standard deviation of the estimate as a share of the estimate. The response rate was unusually low in the 1st quarter of 2016, which increases the sampling error. However, the research protection was also removed in the 1st quarter of 2016, so persons who previously had research protection could now be interviewed, which could be expected to reduce the sampling error. In general, the CVs were marginally higher than normal in the 1st quarter of 2016: 0.0041 for employment and 0.0351 for unemployment, with confidence intervals of +/- 22,000 for employment and +/- 13,000 for unemployment. By comparison, in the 4th quarter of 2015, which had a normal non-response rate, the CVs were 0.0038 for employment and 0.0328 for unemployment, giving confidence intervals of +/- 20,000 and +/- 11,000, respectively.
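The CV and the confidence interval are linked through the standard error: half-width = 1.96 × CV × estimate. The sketch below uses that relation to back-calculate the rough level implied by the Q1 2016 figures quoted above. The helper names are hypothetical, and because both inputs are rounded, the implied levels are approximations and not official LFS figures.

```python
Z = 1.96  # multiplier for a 95 pct. interval


def half_width(cv, estimate):
    """Interval half-width: 1.96 * standard error, where standard error = CV * estimate."""
    return Z * cv * estimate


def implied_estimate(cv, half_width_value):
    """Rough level implied by a CV and a +/- half-width (inverse of half_width)."""
    return half_width_value / (Z * cv)


# Q1 2016 figures quoted in the text; both inputs are rounded, so results are approximate.
employment_level = implied_estimate(0.0041, 22_000)
unemployment_level = implied_estimate(0.0351, 13_000)
print(f"employment: about {employment_level:,.0f}")
print(f"unemployment: about {unemployment_level:,.0f}")

# Round trip: feeding the implied level back in reproduces the half-width.
print(round(half_width(0.0041, employment_level)))  # 22000
```

The same relation explains why the unemployment interval is relatively much wider: its CV is about eight times larger than the CV for employment.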

Because of sampling errors, figures below 4,000 weighted persons are not published for quarterly data, and figures below 2,000 weighted persons are not published for annual data.
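The publication rules (rounding to the nearest 1,000 and suppression below 4,000 weighted persons quarterly or 2,000 annually) can be sketched as a single function. The function name is hypothetical, and two details are assumptions not stated in the text: the threshold is applied to the unrounded weighted count, and exact halves are rounded up.

```python
def publishable_figure(weighted_count, annual=False):
    """Return the figure as published, or None if it falls below the threshold.

    Hypothetical helper. Assumptions: the threshold is applied to the unrounded
    weighted count, and halves are rounded up to the next 1,000.
    """
    threshold = 2_000 if annual else 4_000
    if weighted_count < threshold:
        return None  # suppressed: too few weighted persons to publish
    # Round half up to the nearest 1,000 (weighted counts are non-negative).
    return int((weighted_count + 500) // 1000) * 1000


print(publishable_figure(3_999.6))             # None (quarterly threshold 4,000)
print(publishable_figure(12_500))              # 13000
print(publishable_figure(2_500, annual=True))  # 3000
```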

**Sampling error - indicators**

Coefficient of variation (CV): The coefficient of variation is the standard deviation of the data divided by the mean. The standard deviation measures how far the values typically lie from the mean, so the CV expresses the variability relative to the size of the estimate. It is thus a way of presenting statistical precision.
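The definition above translates directly into code. This sketch applies it to a small list of invented values using the sample standard deviation; it illustrates the formula only and is not how the LFS computes its published CVs for weighted estimates.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined if the mean is zero)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Invented values with mean 10; the CV is about 0.16.
print(f"{coefficient_of_variation([10, 12, 8, 10]):.3f}")
```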

The Labour Force Survey publishes coefficients of variation for the employed and the unemployed.

### Non-sampling error

Every quarter a sample is drawn from the population register.

From 1 January 2021, the LFS has been adapted to a new EU framework regulation. Until 2020, the LFS was collected at the individual level for persons aged 15-74. From 2021, the population also includes persons aged 75-89. As a result, the quarterly sample increased from 34,320 to 36,020 persons, and the weighting scheme was changed to include the 75-89 age group.

The research protection was removed in the LFS in 2016. Previously, around 13 pct. of the sample had research protection and therefore could not be contacted. Since its removal, a larger share of the sample can be contacted, which in itself reduces the uncertainty.

Until 2016, a sample of 40,532 persons was selected from the Population Register each quarter. From the 1st quarter of 2016, the sample was gradually reduced to 34,320 persons aged 15-74, with the reduction fully implemented in the 2nd quarter of 2017. As in other sample surveys, the results are subject to sampling errors. These are related to the sample selection and the patterns of non-response. Non-response occurs when an interview with a selected person is not carried out. Non-response increases the inaccuracy because not all selected persons are equally likely to be interviewed: interviews are completed less often in some sections of the population than in others, which affects the representativeness of the results.
Non-response in the Danish LFS is relatively high. It is handled by an advanced weighting scheme drawing on auxiliary information from registers (see our paper on the theory behind the weighting scheme). Users should be aware of four revisions of the weighting method: in 2003, 2007, 2011 and 2015. In connection with the 2011 revision, data going back to 2007 were revised. The present weighting method was implemented in Q3 2015 and now includes a panel-based weighting component. The new method led to only marginal changes in the data, and the data were therefore not revised back in time. This latest revision has been used in analyses of level changes caused by the weighting method.

See more about weighting methods [here](https://www.dst.dk/en/Statistik/dokumentation/metode/aku-arbejdskraftundersoegelsen).

Even though the weighting scheme handles bias, some bias remains for a few subgroups; for example, the employment rate of persons with another ethnic background is known to be overestimated. Some variables are hard to collect through surveys because respondents are not necessarily aware of their objective position, especially regarding their occupation and industry. Due to the low response rate in the 1st quarter of 2016, the non-response pattern changed, which increases the uncertainty.

**Non-response**

Based on the unweighted quarterly sample, response and non-response rates are calculated each quarter. Non-response consists of persons who could not be contacted, persons who were too ill or disabled to participate, and persons who refused to participate.
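The quarterly calculation amounts to dividing outcome counts by the size of the unweighted sample. The sketch below uses invented counts and hypothetical category names for the three non-response reasons listed above; it is an illustration of the arithmetic, not the production calculation.

```python
# Hypothetical outcome counts for one unweighted quarterly sample (invented numbers).
OUTCOMES = {
    "interviewed": 520,
    "no_contact": 260,
    "ill_or_disabled": 40,
    "refused": 180,
}
NON_RESPONSE = ("no_contact", "ill_or_disabled", "refused")


def response_summary(outcomes):
    """Return (response rate, non-response rate, share by non-response reason), unweighted."""
    total = sum(outcomes.values())
    response_rate = outcomes["interviewed"] / total
    by_reason = {reason: outcomes[reason] / total for reason in NON_RESPONSE}
    non_response_rate = sum(outcomes[reason] for reason in NON_RESPONSE) / total
    return response_rate, non_response_rate, by_reason


rate, non_rate, reasons = response_summary(OUTCOMES)
print(f"response rate: {rate:.0%}, non-response rate: {non_rate:.0%}")
print(reasons)
```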

### Quality management

Statistics Denmark follows the recommendations on organisation and management of quality given in the Code of Practice for European Statistics (CoP) and the implementation guidelines given in the Quality Assurance Framework of the European Statistical System (QAF). A Working Group on Quality and a central quality assurance function have been established to carry out continuous control of products and processes.

### Quality assurance

Statistics Denmark follows the principles in the Code of Practice for European Statistics (CoP) and uses the Quality Assurance Framework of the European Statistical System (QAF) for the implementation of the principles. This involves continuous decentralized and central control of products and processes based on documentation following international standards. The central quality assurance function reports to the Working Group on Quality. Reports include suggestions for improvement that are assessed, decided and subsequently implemented.

### Quality assessment

The unusually low response rates in the 1st, 2nd and 3rd quarters of 2016 make the quality of these quarters lower than usual.

As is the case with all survey-based statistics, there is uncertainty. This is due to the way the sample is selected and to the structure of the non-response. Non-response occurs when an interview is not completed with a selected person. Non-response increases the uncertainty of the survey because not all persons are equally likely to be interviewed: some groups are more likely to be non-respondents, which affects the representativeness of the survey. This is the case for groups such as unemployed persons, persons with a shorter education, ethnic minorities and young persons aged 15-24. This is largely handled through weighting and calibration using register-based auxiliary information, whereby persons who are typically underrepresented in surveys receive a higher weight.

Even though the auxiliary information corrects for much of the bias, systematic bias cannot be ruled out. However, this would only affect the level and not the development over time.

The Danish LFS is collected from individuals rather than households, whereas household-based collection is the most common method in other European countries. As a result, Denmark has a much lower share of so-called proxy interviews, i.e. interviews in which one household member answers the survey on behalf of another. Proxy interviewing is therefore not a significant quality issue in Denmark: the overall proxy share is around 5-6 pct., although for persons aged 15-24 it is considerably higher, around 10-15 pct.

The earliest data from the Danish LFS, from 1994-1999, are of lower quality than data from 2000 onwards, partly due to the lack of a personal identification number.

### Data revision - policy

Statistics Denmark revises published figures in accordance with the Revision Policy for Statistics Denmark. The common procedures and principles of the Revision Policy are for some statistics supplemented by a specific revision practice.

### Data revision practice

On 26 November, the Labour Force Survey figures for Q4 2020 and Q2 2021 were revised because of irregularities by an interviewer at Statistics Denmark's external data provider. With the revision, the employment rate for 15-64-year-olds changes from 74.7 to 74.8 per cent for Q4 2020 and from 75.4 to 75.6 per cent for Q2 2021. The LFS unemployment rate is unchanged for both quarters, while the share of 15-64-year-olds outside the labour force is revised from 20.6 to 20.5 per cent for Q4 2020 and from 20.8 to 20.6 per cent for Q2 2021.

Only final figures are published.