datatree.DataTree.dropna

DataTree.dropna(dim: collections.abc.Hashable, *, how: Literal['any', 'all'] = 'any', thresh: int | None = None, subset: collections.abc.Iterable[collections.abc.Hashable] | None = None) -> Self

Returns a new dataset with labels dropped along the provided dimension wherever values are missing.

Parameters
  • dim (hashable) – Dimension along which to drop missing values. Dropping along multiple dimensions simultaneously is not yet supported.

  • how ({"any", "all"}, default: "any") –

    • any : if any NA values are present, drop that label

    • all : if all values are NA, drop that label

  • thresh (int or None, optional) – If supplied, a label is kept only if it has at least this many non-NA values (summed over all the subset variables).

  • subset (iterable of hashable or None, optional) – Which variables to check for missing values. By default, all variables in the dataset are checked.
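
As a rough sketch of how subset narrows the check, the snippet below builds a small two-variable dataset (the variable names and values are made up for illustration) and compares the default behaviour with subset=["temperature"]. It calls xr.Dataset.dropna, the method the examples on this page use.

```python
import numpy as np
import xarray as xr

# Hypothetical data: each variable has a gap at a different time label.
ds = xr.Dataset(
    {
        "temperature": ("time", [20.0, np.nan, 22.0]),
        "humidity": ("time", [np.nan, 0.5, 0.6]),
    },
    coords={"time": [1, 2, 3]},
)

# Default: every data variable is checked, so time=1 and time=2 are dropped.
checked_all = ds.dropna(dim="time")

# Only "temperature" is checked, so the gap in "humidity" at time=1 is ignored.
checked_temperature = ds.dropna(dim="time", subset=["temperature"])

print(checked_all["time"].values.tolist())  # [3]
print(checked_temperature["time"].values.tolist())  # [1, 3]
```

Note that variables left out of subset are still subset along time in the result; they are simply not consulted when deciding which labels to drop, so they may keep their own NaN values.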

Examples

>>> dataset = xr.Dataset(
...     {
...         "temperature": (
...             ["time", "location"],
...             [[23.4, 24.1], [np.nan, 22.1], [21.8, 24.2], [20.5, 25.3]],
...         )
...     },
...     coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
... )
>>> dataset
<xarray.Dataset>
Dimensions:      (time: 4, location: 2)
Coordinates:
  * time         (time) int64 1 2 3 4
  * location     (location) <U1 'A' 'B'
Data variables:
    temperature  (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3

# Drop time labels that contain any NaN values (the default, how="any")

>>> dataset.dropna(dim="time")
<xarray.Dataset>
Dimensions:      (time: 3, location: 2)
Coordinates:
  * time         (time) int64 1 3 4
  * location     (location) <U1 'A' 'B'
Data variables:
    temperature  (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3

# Drop labels with any NaN values

>>> dataset.dropna(dim="time", how="any")
<xarray.Dataset>
Dimensions:      (time: 3, location: 2)
Coordinates:
  * time         (time) int64 1 3 4
  * location     (location) <U1 'A' 'B'
Data variables:
    temperature  (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3

# Drop labels only where all values are NaN (no label qualifies here, so nothing is dropped)

>>> dataset.dropna(dim="time", how="all")
<xarray.Dataset>
Dimensions:      (time: 4, location: 2)
Coordinates:
  * time         (time) int64 1 2 3 4
  * location     (location) <U1 'A' 'B'
Data variables:
    temperature  (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3

# Drop labels with fewer than 2 non-NA values

>>> dataset.dropna(dim="time", thresh=2)
<xarray.Dataset>
Dimensions:      (time: 3, location: 2)
Coordinates:
  * time         (time) int64 1 3 4
  * location     (location) <U1 'A' 'B'
Data variables:
    temperature  (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3
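
The example above has a single variable, so it does not show what "summed over all the subset variables" means for thresh. The following is a minimal sketch with two hypothetical one-dimensional variables, where the non-NA count for each time label is the total across both variables.

```python
import numpy as np
import xarray as xr

# Hypothetical data. Non-NA counts per time label, summed over both
# variables: time=1 -> 2, time=2 -> 1, time=3 -> 1, time=4 -> 0.
ds = xr.Dataset(
    {
        "temperature": ("time", [20.0, np.nan, 22.0, np.nan]),
        "humidity": ("time", [0.4, 0.5, np.nan, np.nan]),
    },
    coords={"time": [1, 2, 3, 4]},
)

# Keep labels with at least one non-NA value across the two variables.
at_least_one = ds.dropna(dim="time", thresh=1)

# Keep labels with at least two non-NA values, i.e. both variables present.
at_least_two = ds.dropna(dim="time", thresh=2)

print(at_least_one["time"].values.tolist())  # [1, 2, 3]
print(at_least_two["time"].values.tolist())  # [1]
```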

Returns

Dataset