5. TOAR Near Realtime Data Processing

Currently we collect near real-time data from two data providers: UBA (German Environment Agency [1]) and OpenAQ (open air quality data [2]). The corresponding data harvesting procedures are described below.

5.1. UBA Data Harvesting

Since 2001, the German Umweltbundesamt (UBA) [1] has provided preliminary data from a growing number (currently 1004) of German surface stations. The data exchange is based on the manual „Luftqualitätsdaten- und Informationsaustausch in Deutschland“ ("Air quality data and information exchange in Germany"), Version V 5, April 2019 (in German).

Data for at least ozone, SO2, PM10, PM2.5, NO2 and CO are updated daily for the current day and provided as continuous hourly values for up to four previous days. Data are fetched from the UBA service four times per day, at 08:00, 12:00, 18:00 and 22:00 local time.

The software for processing the data from UBA is available at https://gitlab.version.fz-juelich.de/esde/toar-data/toar-db-data/-/tree/master/toar_v2/harvesting/UBA_NRT. The files StationparameterMeta.csv, StationMeta.csv and uba_%s.csv (where %s denotes a date) are harvested four times daily from http://www.luftdaten.umweltbundesamt.de/files/ (secured with access credentials).
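The harvesting schedule and file names described above can be sketched as follows. This is a minimal illustration, not the code from the repository: the function names are hypothetical, the date format used in uba_%s.csv is an assumption (the text does not specify it), and handling of the access credentials is left out.

```python
from datetime import date, datetime, time, timedelta

BASE_URL = "http://www.luftdaten.umweltbundesamt.de/files/"
METADATA_FILES = ("StationparameterMeta.csv", "StationMeta.csv")
HARVEST_HOURS = (8, 12, 18, 22)  # local time, as stated in the text


def harvest_urls(day, date_format="%Y%m%d"):
    """Return the URLs of the two metadata files and the daily data file.

    ASSUMPTION: the date in uba_%s.csv is formatted as YYYYMMDD.
    """
    daily_file = "uba_%s.csv" % day.strftime(date_format)
    return [BASE_URL + name for name in METADATA_FILES + (daily_file,)]


def next_harvest(now):
    """Return the next scheduled harvest time strictly after `now`."""
    for hour in HARVEST_HOURS:
        candidate = datetime.combine(now.date(), time(hour))
        if candidate > now:
            return candidate
    # All of today's harvests are done: first harvest of the next day.
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, time(HARVEST_HOURS[0]))


urls = harvest_urls(date(2020, 9, 5))
upcoming = next_harvest(datetime(2020, 9, 5, 17, 0))
```

With the assumed date format, the third entry of `urls` ends in uba_20200905.csv, and `upcoming` is the 18:00 harvest on the same day.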

_images/uba-snapshot.png

Fig. 5.1 Snapshot from 2020-09-05 17:00 CEST

Table 5.1 Mapping of data from daily files imported to the TOAR database variables

name of component in original file    name of component in TOAR database
----------------------------------    ----------------------------------
Schwefeldioxid                        so2
Ozon                                  o3
Stickstoffdioxid                      no2
Stickstoffmonoxid                     no
Kohlenmonoxid                         co
Temperatur                            temp
Windgeschwindigkeit                   wspeed
Windrichtung                          wdir
PM10                                  pm10
PM2_5                                 pm2p5
Relative Feuchte                      relhum
Benzol                                benzene
Ethan                                 ethane
Methan                                ch4
Propan                                propane
Toluol                                toluene
o-Xylol                               oxylene
mp-Xylol                              mpxylene
Luftdruck                             press
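The mapping in Table 5.1 amounts to a simple lookup. The sketch below shows one way to express it; the dictionary and function names are hypothetical and not taken from the TOAR harvesting code.

```python
# UBA component name (as in the daily files) -> TOAR database variable name.
UBA_TO_TOAR_COMPONENT = {
    "Schwefeldioxid": "so2",
    "Ozon": "o3",
    "Stickstoffdioxid": "no2",
    "Stickstoffmonoxid": "no",
    "Kohlenmonoxid": "co",
    "Temperatur": "temp",
    "Windgeschwindigkeit": "wspeed",
    "Windrichtung": "wdir",
    "PM10": "pm10",
    "PM2_5": "pm2p5",
    "Relative Feuchte": "relhum",
    "Benzol": "benzene",
    "Ethan": "ethane",
    "Methan": "ch4",
    "Propan": "propane",
    "Toluol": "toluene",
    "o-Xylol": "oxylene",
    "mp-Xylol": "mpxylene",
    "Luftdruck": "press",
}


def toar_component(uba_name):
    """Translate a UBA component name; fail loudly on unknown names."""
    key = uba_name.strip()
    if key not in UBA_TO_TOAR_COMPONENT:
        raise ValueError(f"no TOAR variable known for UBA component {uba_name!r}")
    return UBA_TO_TOAR_COMPONENT[key]
```

Raising an error for unknown names, rather than passing them through, is a design choice of this sketch: it keeps unmapped components from entering the database unnoticed.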

Table 5.2 Mapping of station_type

term of station_type in original file    term of station_type in TOAR database
-------------------------------------    -------------------------------------
Hintergrund                              background
Industrie                                industrial
Verkehr                                  traffic

Table 5.3 Mapping of station_type_of_area

term of station_type_of_area in original file    term of station_type_of_area in TOAR database
---------------------------------------------    ---------------------------------------------
ländlich abgelegen                               rural
ländliches Gebiet                                rural
ländlich regional                                rural
ländlich stadtnah                                rural
städtisches Gebiet                               urban
vorstädtisches Gebiet                            suburban
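Tables 5.2 and 5.3 can be expressed as two lookups that are applied together when station metadata are ingested. This is a sketch only; the names below are hypothetical and not taken from the TOAR harvesting code.

```python
# German term in the UBA metadata -> controlled term in the TOAR database.
STATION_TYPE = {
    "Hintergrund": "background",
    "Industrie": "industrial",
    "Verkehr": "traffic",
}

# All four "ländlich..." variants collapse to the single TOAR term "rural".
STATION_TYPE_OF_AREA = {
    "ländlich abgelegen": "rural",
    "ländliches Gebiet": "rural",
    "ländlich regional": "rural",
    "ländlich stadtnah": "rural",
    "städtisches Gebiet": "urban",
    "vorstädtisches Gebiet": "suburban",
}


def translate_station(uba_type, uba_area):
    """Return (station_type, station_type_of_area) in TOAR terms.

    A KeyError signals a term that is not covered by the tables.
    """
    return STATION_TYPE[uba_type], STATION_TYPE_OF_AREA[uba_area]


example = translate_station("Verkehr", "städtisches Gebiet")
```

Note that the area mapping is not reversible: the distinction between the four rural subtypes in the original file is lost in the TOAR term.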

Table 5.4 Mapping of units and unit conversions

component    original unit     unit in TOAR DB    conversion factor applied on ingestion
---------    --------------    ---------------    --------------------------------------
co           mg m-3            ppb                858.95
no           ug m-3            ppb                0.80182
no2          ug m-3            ppb                0.52297
o3           ug m-3            ppb                0.50124
so2          ug m-3            ppb                0.37555
benzene      ug m-3            ppb                0.30802
ethane       ug m-3            ppb                0.77698
ch4          ug m-3            ppb                1.49973
propane      ug m-3            ppb                0.52982
toluene      ug m-3            ppb                0.26113
oxylene      ug m-3            ppb                0.22662
mpxylene     ug m-3            ppb                0.22662
pm1          ug m-3            ug m-3
pm10         ug m-3            ug m-3
pm2p5        ug m-3            ug m-3
press        hPa               hPa
temp         degree Celsius    degree Celsius
wdir         degree            degree
wspeed       m s-1             m s-1
relhum       %                 %
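Applying Table 5.4 can be sketched as below. The sketch assumes that the listed factor is multiplicative (value in original unit times factor gives the value in ppb) and that components without a listed factor are stored unchanged; the names are hypothetical and not taken from the TOAR harvesting code.

```python
# Factors from Table 5.4: original unit (mg m-3 for co, ug m-3 otherwise) -> ppb.
PPB_FACTOR = {
    "co": 858.95,
    "no": 0.80182,
    "no2": 0.52297,
    "o3": 0.50124,
    "so2": 0.37555,
    "benzene": 0.30802,
    "ethane": 0.77698,
    "ch4": 1.49973,
    "propane": 0.52982,
    "toluene": 0.26113,
    "oxylene": 0.22662,
    "mpxylene": 0.22662,
}

# Components whose unit is the same in the original file and in the TOAR DB.
UNCHANGED = {"pm1", "pm10", "pm2p5", "press", "temp", "wdir", "wspeed", "relhum"}


def to_toar_unit(component, value):
    """Convert a value from the UBA unit to the TOAR database unit.

    ASSUMPTION: the table factor is applied by multiplication.
    """
    if component in PPB_FACTOR:
        return value * PPB_FACTOR[component]
    if component in UNCHANGED:
        return value
    raise ValueError(f"no unit rule for component {component!r}")


ozone_ppb = to_toar_unit("o3", 100.0)  # 100 ug m-3 of ozone, roughly 50 ppb
```

Keeping the factors in one table-like dictionary makes it easy to compare the code against Table 5.4 line by line.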

Validated data for the previous year become available by 31 May at the latest. These data are requested by email and then processed from the database dumps we receive. The validated data supersede the preliminary near real-time data. The near real-time data remain in the database but are hidden from the standard user access procedures through their data quality flag settings.

5.2. OpenAQ

OpenAQ [2] collects data in 93 countries from real-time government and research-grade sources. Since 26 November 2016, OpenAQ has gathered more than one billion records with a total size of 306 gigabytes, covering the air-quality-relevant variables BC, CO, NO2, O3, PM10, PM2.5 and SO2.

We began working on OpenAQ data about 6 years ago, but this has always been difficult because of unsuitable data structures and metadata and unknown data quality. After an unsuccessful first attempt to set up a processing pipeline, we redid much of this work about 2 years ago. We then concentrated on those world regions where OpenAQ is the only source of air quality information available to us. To our knowledge, we never included all OpenAQ data; data from Europe, for example, were left out. We had neither the time nor the capacity to inspect the OpenAQ time series rigorously, and we have always stated clearly that this data review must be undertaken by the TOAR scientists. Reports of limited data coverage and often questionable data quality agree with our own non-systematic analysis.

Footnotes

[1] https://www.umweltbundesamt.de/en

[2] https://www.openaq.org