(function(doc, html, url) { var widget = doc.createElement("div"); widget.innerHTML = html; var script = doc.currentScript; // e = a.currentScript; if (!script) { var scripts = doc.scripts; for (var i = 0; i < scripts.length; ++i) { script = scripts[i]; if (script.src && script.src.indexOf(url) != -1) break; } } script.parentElement.replaceChild(widget, script); }(document, '

The trade-off between the risk of disclosure and data utility in SDC: a survey of accidents at work

What is it about?

A key problem in Statistical Disclosure Control (SDC) is finding an optimal trade-off between minimizing the risk of unit identification and maximizing the utility of the data to be disseminated (that is, minimizing the information loss caused by applying SDC methods). In practice, this is usually achieved by defining how much risk is acceptable for any given unit and then modifying the data set so that the risk falls below this preset threshold while utility is kept as high as possible. Moreover, variables from statistical surveys differ not only in their measurement scale but also in the role they play in the SDC process. All these aspects should therefore be taken into account when looking for this trade-off. In the paper we present a way of assessing whether an optimal trade-off has been achieved. Two main aspects of measuring the risk of disclosure are discussed. The first is internal risk, i.e. the risk of disclosing confidential information solely on the basis of the disseminated microdata after the application of SDC (no attempt is made to combine the data with external information). The second is external risk, which arises when the user has access to an alternative data set containing information that can be linked with the statistical data in order to identify a unit. We show that it is possible to measure external risk and information loss while accounting for the measurement scale of variables. In our empirical study we used data from an annual survey of accidents at work for 2017. We compared complex information loss and the risk of disclosure in the original data files and in those subjected to SDC using methods implemented in the new working version of the sdcMicro R package. 
We present the underlying assumptions and results of the SDC process, highlighting the benefits and drawbacks of the tools used in the study, which was conducted in 2020 and 2021 in the Centre for Small Area Estimation at the Statistical Office in Poznań.

Why is it important?

Assessing disclosure risk and the information loss caused by suppression or perturbation applied to protect statistical confidentiality is crucial for ensuring both adequate safety and high utility of the microdata released to various users (including scientists) for studies and analyses. We propose a complex measure of information loss (one that takes all measurement scales of variables into account and is based on comparing data files before and after the whole statistical disclosure control process), investigate the possibility of assessing disclosure risk in a similar way (based on simulation), and show how these solutions can be applied in practice to output data from a given statistical survey.

The following have contributed to this page:
Andrzej Młodak
' ,"url"));