
Linking of geographic data

Register data from Statistics Denmark can be supplemented with geographic data. This may include information about geographic areas (such as clusters and polygons) or distance information between a person’s residence and various institutions, such as hospitals, general practitioners, workplaces, or educational institutions.

Denmark’s Data Portal can provide guidance and solve tasks within, among others, the following areas:

  • Grid cells
  • Clusters
  • Polygons/preparation of maps
  • Distance calculations, including road distances
  • Catchment area analyses/population formations
  • Questions regarding statistical disclosure control
  • Validation of users’ geographic classifications

If you have a task that falls outside these categories, you are still welcome to contact us. We are happy to discuss whether and how the task can be solved.

When such geographic data are linked to register data, they must be documented in DDV App and comply with Statistics Denmark’s specific requirements for statistical disclosure control of geographic data.

Examples of Tasks

Case 1: Catchment Area Analysis of Coastal Development

Some municipalities along the west coast of Jutland wish to examine how legislation from 2017, which enables coastal development, affects the area across a number of parameters, including:

  • Impact on property price development
  • Impact on the municipality’s population composition
  • Impact on businesses/jobs/institutional capacity
  • Risks of storm and water damage as coastal construction increases, including development/revision of risk zone classifications for buildings and infrastructure

Formation of clusters

The user was advised by Denmark’s Data Portal to use clusters defined by distance from the coast. The clusters were formed as bands of constant distance from the coastline: 0–500 m, 500–1000 m, and 1000–1500 m. Only a limited number of clusters were defined, as they were also delimited by municipal boundaries.
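The banding described above can be sketched as follows. This is only an illustration: the function names, the cluster label format, and the choice to treat the lower edge of each band as inclusive are assumptions, not details of the delivered solution.

```python
from typing import Optional

# Band edges in metres, taken from the case description.
BAND_EDGES_M = [0, 500, 1000, 1500]


def coastal_band(distance_m: float) -> Optional[str]:
    """Return the band label for a distance to the coastline, or None beyond 1500 m."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for lower, upper in zip(BAND_EDGES_M, BAND_EDGES_M[1:]):
        # Assumption: lower edge inclusive, upper edge exclusive.
        if lower <= distance_m < upper:
            return f"{lower}-{upper} m"
    return None


def cluster_id(municipality_code: str, distance_m: float) -> Optional[str]:
    """Combine municipality and band, since clusters are also delimited by municipal boundaries."""
    band = coastal_band(distance_m)
    return None if band is None else f"{municipality_code}:{band}"


# "M1" is a placeholder municipality code used only for illustration.
print(cluster_id("M1", 250.0))
```

With three bands per municipality, the number of clusters stays small, which is consistent with the limited number of clusters mentioned in the case.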

Possibilities for expanding the analysis

With information on which addresses belong to each cluster, the user was able to examine how the selected parameters developed within each cluster.

At the same time, this classification ensured that clusters had sufficient size to meet statistical disclosure control requirements, i.e. at least 50 households in each cluster.

Delivery time

The task was delivered as a tailored solution using existing data on municipalities and coastline positions in Denmark. In addition, data from the Danish Address Register (DAR) and Statistics Denmark’s population register (BEF) were used.

A task of this kind can typically be delivered within two weeks after clarification between Denmark’s Data Portal and the user.

Case 2: Geographic Clusters

Background

A user wishes to perform analyses based on populations in school districts across the country.

Process

The user submits a GIS file (in a geographic data format) with geographic clusters in the form of school districts (polygons/areas), which can be downloaded free of charge from a joint municipal data register. The user wants these converted into addresses that can be linked to register data via address IDs.

An address ID or address code is a key that uniquely identifies an address, either as a house number (access or entrance address) or a unit address (floor/apartment address). In practice, the address variable OPGIKOM is used, which together with KOM (municipality code) uniquely identifies access addresses in Denmark. OPGIKOM consists of the road code and the house number, including a letter where applicable.

OPGIKOM is available in other Statistics Denmark registers, making it possible to link, for example, the population register (BEF) to these school districts (via the key register BEFADR).
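As a minimal sketch of why the pair (KOM, OPGIKOM) is needed as the linking key, the example below attaches a school district to population rows using a lookup keyed on both variables. All values are made-up placeholders and do not follow the real KOM or OPGIKOM formats; the function name and the row layout are likewise hypothetical.

```python
# Hypothetical lookup from address key to school district.
# The same OPGIKOM value can occur in two municipalities, so KOM is part of the key.
district_by_address = {
    ("K1", "ROAD001-12A"): "district_A",
    ("K2", "ROAD001-12A"): "district_B",
}

# Hypothetical population rows carrying the same two address variables.
population = [
    {"person": "p1", "KOM": "K1", "OPGIKOM": "ROAD001-12A"},
    {"person": "p2", "KOM": "K2", "OPGIKOM": "ROAD001-12A"},
    {"person": "p3", "KOM": "K1", "OPGIKOM": "ROAD999-1"},
]


def attach_district(rows, lookup):
    """Return copies of the rows with a 'district' field (None when no match is found)."""
    linked = []
    for row in rows:
        key = (row["KOM"], row["OPGIKOM"])
        linked.append({**row, "district": lookup.get(key)})
    return linked


for row in attach_district(population, district_by_address):
    print(row["person"], row["district"])
```

Rows without a match keep a `None` district, which makes it easy to spot addresses that fall in gaps between districts.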

Analytical possibilities

Register data can be linked across registers using the CPR number and address ID, and the geographic distribution of analysis parameters can be examined using cluster information.

Consultancy

Clarification of input data quality takes place before the task is carried out by Denmark’s Data Portal, and a time frame is agreed.

In the delivered school districts, small gaps were identified where no active school district was registered. There were also many overlapping districts, as some districts cover up to 6th grade, while others cover all primary school levels or only 10th grade, meaning they fully or partially overlap geographically.

As part of Denmark’s Data Portal’s processing of geographic clusters, cluster size is also checked to ensure that no district contains fewer than 50 households (= occupied addresses). Districts that are too small do not comply with Statistics Denmark’s data confidentiality policy.
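A size check of this kind could look like the sketch below, assuming each occupied address has already been assigned to a district. The function name and input layout are hypothetical; only the threshold of 50 comes from the text.

```python
from collections import Counter

MIN_HOUSEHOLDS = 50  # threshold stated in the text


def undersized_districts(all_districts, occupied_address_districts, minimum=MIN_HOUSEHOLDS):
    """Return {district: household_count} for districts below the minimum.

    all_districts: every district in the delivered file.
    occupied_address_districts: one district label per occupied address.
    """
    counts = Counter(occupied_address_districts)
    # Districts with no occupied addresses count as zero and are flagged too.
    return {d: counts[d] for d in all_districts if counts[d] < minimum}


# Illustrative data: district "A" has 60 households, "B" has 49, "C" has none.
labels = ["A"] * 60 + ["B"] * 49
print(undersized_districts(["A", "B", "C"], labels))
```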

Delivery time

A task of this type can normally be completed within a framework of 10 hours and delivered within two weeks.

Alternative Solutions

Many users apply geographic clusters based on grid cells, where 100x100 m cells are the smallest geographic units that can be linked to register data. Grid cells are aggregated in sparsely populated areas to meet the requirement of at least 50 households.
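To illustrate what aggregating grid cells up to the 50-household requirement means, the sketch below merges cells greedily in the order given. This is a deliberately simple stand-in and not the method used by Statistics Denmark; it assumes the cells arrive in an order where consecutive cells are neighbours, and all names are hypothetical.

```python
def aggregate_cells(cells, minimum=50):
    """Group (cell_id, households) pairs so that each group reaches the minimum.

    Cells are merged in input order. A leftover group that is still too small
    is folded into the previous group.
    """
    clusters, current, total = [], [], 0
    for cell_id, households in cells:
        current.append(cell_id)
        total += households
        if total >= minimum:
            clusters.append(current)
            current, total = [], 0
    if current:
        if clusters:
            clusters[-1].extend(current)
        else:
            # The whole input is below the minimum; return it as one group.
            clusters.append(current)
    return clusters


cells = [("a", 60), ("b", 20), ("c", 35), ("d", 10)]
print(aggregate_cells(cells))
```

Densely populated cells stay on their own, while sparse neighbours are combined until the group is large enough.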

Statistics Denmark can provide a set of standard clusters based on grid cells.

In principle, there are no restrictions on which districts a user can apply for analysis, as long as they meet the requirement of at least 50 households. If business data are included in combination with geographic clusters, other minimum size requirements apply. These are determined in the specific project and depend on the scope and level of detail in the business data used.

Case 3: Distance Calculation Between Residences and Upper Secondary Education

Background

A user wants to measure the distance between young people’s residence and the upper secondary education institution they attend or have attended within a defined period.

Process

The user uploads a file containing the variables: city, address, and region for both residences and educational institutions. Input data quality is checked, and it is agreed whether distance calculations should be straight-line or road distance.

Depending on address format, address cleaning may be required, including adding coordinates necessary for distance calculations.

When completed, a SAS file is delivered with anonymised addresses for residences and institutions, along with the distance in the desired unit (metres or kilometres).

Delivery time

Such tasks typically require between 15 and 30 hours, depending on scope and type of distance. Straight-line calculations require fewer hours than road distance calculations. Calculating distances between all residences and all institutions takes longer than limiting the calculation to, for example, the same region.
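The sketch below shows one common way to compute a straight-line distance from latitude/longitude coordinates (the haversine formula on a spherical Earth) and why restricting pairs to the same region reduces the workload. It is an illustration under stated assumptions: the text does not say which formula or coordinate system is used, and the function names and row layout are hypothetical. Road distances require a road network and are not covered here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def straight_line_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_pairs(residences, institutions, same_region_only=False):
    """List the (residence, institution) pairs for which a distance would be computed."""
    return [
        (r, i)
        for r in residences
        for i in institutions
        if not same_region_only or r["region"] == i["region"]
    ]


# Placeholder rows; region names are illustrative only.
residences = [{"id": "r1", "region": "North"}, {"id": "r2", "region": "South"}]
institutions = [{"id": "i1", "region": "North"}, {"id": "i2", "region": "South"}]
print(len(candidate_pairs(residences, institutions)))
print(len(candidate_pairs(residences, institutions, same_region_only=True)))
```

The number of all-pairs calculations grows with the product of the two lists, whereas the same-region restriction only keeps pairs within each region.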

Tasks with Grid Cell Data

Grid cell data divides Denmark into fixed cells based on coordinates, meaning the structure remains constant over time, unlike administrative divisions such as municipalities or parishes.

Denmark’s Data Portal provides grid cell data at 100 m, 1 km, and 10 km resolution, linked with KOM and OPGIKOM (address11).
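As a rough sketch of how a fixed grid follows from coordinates alone, the example below maps a point to a cell index at each of the three resolutions. It assumes projected coordinates in metres (easting, northing) and uses a plain index pair as the cell identifier; the actual coordinate system and cell naming used in the delivered data are not specified here.

```python
RESOLUTIONS_M = (100, 1_000, 10_000)  # 100 m, 1 km, and 10 km, as listed in the text


def grid_cell(easting_m, northing_m, cell_size_m):
    """Return the (column, row) index of the cell containing the point."""
    if cell_size_m not in RESOLUTIONS_M:
        raise ValueError(f"unsupported cell size: {cell_size_m}")
    # Floor division places every point in exactly one cell.
    return (int(easting_m // cell_size_m), int(northing_m // cell_size_m))


# Illustrative coordinates only.
point = (702345.6, 6223987.1)
for size in RESOLUTIONS_M:
    print(size, grid_cell(*point, size))
```

Because the index depends only on the coordinates and the cell size, the same address falls in the same cell every year, regardless of changes to municipalities or parishes.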

As a rule of thumb, grid data is most reliable from the municipal reform in 2007 onwards. It can also be used for earlier years, but coverage decreases the further back in time one goes.

Denmark’s Data Portal offers two options:

  1. Delivery of counts of households and individuals per grid cell from 1980 to the latest available year. These can be aggregated into clusters, after which a dataset is produced with cluster code, municipality, and anonymised OPGIKOM.
  2. Delivery of standard clusters. Based on the 100x100 m grid cells, Denmark’s Data Portal has developed standard clusters that can be linked to Statistics Denmark’s register data via KOM and OPGIKOM. These clusters are constructed according to the principle that they should be small, while still meeting statistical disclosure control requirements for all years in the time series for which the clusters can be applied. In addition, the clusters are composed of grid cells based on principles of proximity and population size within each cluster. As far as possible, the clusters are formed from grid cells that are adjacent to one another. However, exceptions may occur where clusters consist of grid cells that are not contiguous.

Special Requirements for Statistical Disclosure Control of Geographic Data

When working with geographic data on the research server, specific requirements apply regarding how data may be used and displayed. For all geographic divisions, including clusters and grid cells, each unit must contain at least 50 households in the nighttime population per register year.

If a project uses clusters across multiple years, this requirement must be met for every year, including newly added years. This may mean that previously delivered data can no longer remain in the project if they fail to meet requirements after updates.
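The multi-year requirement can be sketched as a check over household counts per cluster and year. The data layout and function name are hypothetical; the example only shows how adding a year can make a previously compliant cluster fail.

```python
MIN_HOUSEHOLDS = 50  # threshold stated in the text


def failing_clusters(households_by_cluster, minimum=MIN_HOUSEHOLDS):
    """Return {cluster: [years below the minimum]} for clusters that fail in any year.

    households_by_cluster: {cluster: {year: household_count}}
    """
    failures = {}
    for cluster, by_year in households_by_cluster.items():
        bad_years = sorted(year for year, n in by_year.items() if n < minimum)
        if bad_years:
            failures[cluster] = bad_years
    return failures


# Illustrative counts: both clusters pass for 2020 and 2021.
counts = {"c1": {2020: 55, 2021: 52}, "c2": {2020: 80, 2021: 75}}
print(failing_clusters(counts))

# Adding 2022 makes c1 fail, so c1 no longer meets the requirement for the project.
counts["c1"][2022] = 48
counts["c2"][2022] = 77
print(failing_clusters(counts))
```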

Pricing for Geodata Tasks

Pricing for geodata tasks varies. When we receive an inquiry, we prepare a framework agreement with an estimated maximum time consumption based on factors such as task complexity and data quality. After delivery, billing is based on actual time spent. See more under Prices and price agreements.

Getting Started

If you would like assistance with a geodata task, please contact your project owner at Statistics Denmark and describe the task and data.