Looking at distributions


An essential property of a reanalysis is that its data are gridded: a value is available for every grid cell at every time step. This is very different from an observational record, which is a single time series from a single location and may contain gaps in time.
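To make the contrast concrete, here is a minimal sketch using plain NumPy and purely synthetic numbers. The toy grid size, the random values, and the positions of the gaps are all assumptions made for illustration; nothing here comes from DANRA.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical toy "reanalysis": 8 time steps on a 3 x 4 grid, no gaps.
n_time, n_y, n_x = 8, 3, 4
gridded = rng.normal(loc=10.0, scale=2.0, size=(n_time, n_y, n_x))

# Hypothetical station record: a single location, with two missing reports.
station = gridded[:, 1, 2].copy()
station[[2, 5]] = np.nan

# Any grid cell yields a complete time series ...
cell_series = gridded[:, 0, 3]
n_missing_cell = int(np.isnan(cell_series).sum())

# ... whereas the station series has gaps.
n_missing_station = int(np.isnan(station).sum())

print("grid cell: length", cell_series.size, "missing", n_missing_cell)
print("station:   length", station.size, "missing", n_missing_station)
print("values available across the whole grid:", gridded.size)
```

With gridded data, a distribution can be built for each cell separately or pooled over the whole domain, which is not possible with a single station record.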

import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

We start by defining the location of the DANRA dataset and then load it with xarray.

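xarray works especially well with the zarr format because it supports lazy loading: opening the store reads only the metadata, and the data are fetched chunk by chunk when a computation actually needs them. This lets you analyze massive datasets interactively without loading everything into memory at once, keeping your workflow efficient and scalable.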

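# Open the DANRA single-level store anonymously from S3 (requires the s3fs package).
ds_danra_sl = xr.open_zarr(
    "s3://dmi-danra-05/single_levels.zarr",
    consolidated=True,  # read the consolidated metadata
    storage_options={
        "anon": True,  # anonymous access, no credentials
    },
)

# Tag the dataset with its suite name.
ds_danra_sl.attrs['suite_name'] = "danra"
ds_danra_sl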

As can be seen, 10-meter wind speed is not among the dataset's variables, but it can easily be calculated from the u and v components using the formula:

\[ \text{wind\_speed} = \sqrt{u^2 + v^2}, \]

where \(u\) and \(v\) are the zonal and meridional wind components at 10 meters, respectively.
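As a quick sanity check of the formula, the sketch below applies it to a few hand-picked component pairs with plain NumPy. The sample values are made up for illustration; `np.hypot` is used only as an independent cross-check.

```python
import numpy as np

# Hypothetical wind components in m/s (not taken from DANRA).
u = np.array([3.0, 0.0, -5.0])   # zonal component
v = np.array([4.0, -2.0, 12.0])  # meridional component

# Wind speed is the Euclidean norm of the (u, v) vector.
wind_speed = np.sqrt(u**2 + v**2)
print(wind_speed.tolist())  # [5.0, 2.0, 13.0]

# np.hypot computes the same quantity.
assert np.allclose(wind_speed, np.hypot(u, v))
```

Note that the sign of each component does not matter: speed is always non-negative, while direction would have to be derived separately.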

ds_danra_sl['wind_speed'] = np.sqrt(
    ds_danra_sl['u10m'] ** 2 + ds_danra_sl['v10m'] ** 2)
ds_danra_sl

<xarray.Dataset> Size: 9TB
Dimensions:          (time: 87360, y: 589, x: 789)
Coordinates:
    lat              (y, x) float64 4MB dask.array<chunksize=(256, 256), meta=np.ndarray>
    lon              (y, x) float64 4MB dask.array<chunksize=(256, 256), meta=np.ndarray>
  * time             (time) datetime64[ns] 699kB 1990-09-01 ... 2020-07-24T21...
  * x                (x) float64 6kB -1.999e+06 -1.997e+06 ... -2.925e+04
  * y                (y) float64 5kB -6.095e+05 -6.07e+05 ... 8.58e+05 8.605e+05
Data variables: (12/28)
    cape_column      (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    cb_column        (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    ct_column        (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    grpl_column      (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    hcc0m            (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    icei0m           (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    ...               ...
    t2m              (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    u10m             (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    v10m             (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    vis0m            (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    xhail0m          (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
    wind_speed       (time, y, x) float64 325GB dask.array<chunksize=(256, 256, 256), meta=np.ndarray>
Attributes:
    description:  All prognostic variables for 30-year period on reduced levels
    suite_name:   danra
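The sizes reported in the summary can be reproduced with simple arithmetic, which also shows why lazy loading matters here. The sketch below uses only numbers read off the summary above. The regular 3-hourly time step with no gaps is an assumption.

```python
from datetime import datetime, timedelta

# Values read off the dataset summary above.
n_time, n_y, n_x = 87360, 589, 789
n_variables = 28          # includes the derived wind_speed
bytes_per_value = 8       # float64
chunk_shape = (256, 256, 256)

# Number of time steps, assuming a regular 3-hourly axis without gaps.
start = datetime(1990, 9, 1, 0)
end = datetime(2020, 7, 24, 21)
n_steps = (end - start) // timedelta(hours=3) + 1

# Uncompressed size of one variable, of all variables, and of one chunk.
bytes_per_variable = n_time * n_y * n_x * bytes_per_value
bytes_total = bytes_per_variable * n_variables
chunk_bytes = chunk_shape[0] * chunk_shape[1] * chunk_shape[2] * bytes_per_value

print("time steps:", n_steps)
print("one variable:", round(bytes_per_variable / 1e9), "GB")
print("all variables:", round(bytes_total / 1e12, 1), "TB")
print("one chunk:", chunk_bytes // 2**20, "MiB")
```

A single variable is far larger than typical workstation memory, while a single chunk is small enough to process comfortably, so computations are best expressed in a way that works chunk by chunk.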
