Masking invalid data

It is often useful to ignore specific pieces of data. For example, it is wise to exclude the atmosphere when computing the maximum temperature in GRMHD simulations. For this, kuibit inherits from NumPy the concept of masks: masked data carries along the information of where the data is valid and where it is not. In kuibit, classes derived from BaseNumerical (mainly TimeSeries, FrequencySeries, UniformGridData, and HierarchicalGridData) support masks, meaning that operations like max() will not include the data marked as invalid. This page describes how to work with masks (see also the reference on kuibit.masks).
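
Because kuibit's masks follow NumPy's, the idea can be sketched with plain numpy.ma, without any kuibit object. The arrays temperature and density below are made-up illustration data, not output of a real simulation, and the density floor of 1e-8 is an arbitrary choice.

```python
import numpy as np

# Hypothetical data: temperature and density sampled at five points.
temperature = np.array([10.0, 12.0, 500.0, 11.0, 9.0])
density = np.array([1e-3, 2e-3, 1e-12, 5e-4, 1e-3])

# Mark the temperature as invalid wherever the density is below the floor.
masked_temperature = np.ma.masked_where(density < 1e-8, temperature)

max_all = temperature.max()           # includes the low-density point
max_valid = masked_temperature.max()  # skips the masked point
```

Here the unphysically hot low-density point dominates max_all but does not enter max_valid.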

Creating masked objects

Since the interface is the same for all the classes defined in kuibit, we will use a TimeSeries as an example.

To create a masked object, you start from the unmasked version. If ts is a TimeSeries, there are multiple ways to obtain a new masked object:

# Data is invalid when it is equal to 1
ts1 = ts.masked_equal(1)

# Data is invalid when it is larger than 2
ts2 = ts.masked_greater(2)

# Data is invalid when it is larger than or equal to 3
ts3 = ts.masked_greater_equal(3)

# Data is invalid when it is between 4 and 5
ts4 = ts.masked_inside(4, 5)

# Data is invalid when it is NaN or inf
ts5 = ts.masked_invalid()

# Data is invalid when it is smaller than 6
ts6 = ts.masked_less(6)

# Data is invalid when it is smaller than or equal to 7
ts7 = ts.masked_less_equal(7)

# Data is invalid when it is not 8
ts8 = ts.masked_not_equal(8)

# Data is invalid when it is outside the range (8,9)
ts9 = ts.masked_outside(8, 9)

All these methods return new objects. Alternatively, it is possible to edit the object in place using methods with the imperative form (e.g., mask_equal instead of masked_equal).
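
The method names above mirror the functions of the same name in numpy.ma. Assuming kuibit follows the NumPy semantics (an assumption worth checking against the reference), the following sketch on a plain array with made-up values shows how the interval and threshold boundaries are treated.

```python
import numpy as np

data = np.array([3.0, 4.0, 4.5, 5.0, 6.0])

# Interval boundaries are inclusive in NumPy.
inside = np.ma.masked_inside(data, 4, 5)    # masks 4.0, 4.5, 5.0
outside = np.ma.masked_outside(data, 4, 5)  # masks 3.0, 6.0

# Strict versus non-strict thresholds.
greater = np.ma.masked_greater(data, 5)              # masks 6.0 only
greater_equal = np.ma.masked_greater_equal(data, 5)  # masks 5.0 and 6.0
```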

The second way to create masked objects is to use the functions in kuibit.masks, which provides mathematical functions that are defined only on a restricted domain. For instance, if you want to compute the natural logarithm of some data, you can use the function masks.log(), which automatically applies a mask where the operation is not defined.

import kuibit.masks as ma

log_ts = ma.log(ts)

log_ts.is_masked()  # => True

The method is_masked() checks whether the object is masked or not. When objects are masked, some methods become unavailable. For example, it is not possible to compute splines or perform interpolations. For TimeSeries and FrequencySeries, you can work around this limitation by removing the invalid points with the methods mask_remove() (in place) or mask_removed() (returning a new object). This is not possible with grid data because we assume that the data is defined on regular grids.
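
As a rough illustration of what domain masking and point removal amount to, here is a sketch using numpy.ma on plain arrays rather than kuibit objects. The arrays t and y are hypothetical stand-ins for the times and values of a series.

```python
import numpy as np

# Hypothetical series: times t and values y.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1.0, 0.0, 1.0, np.e])

# The logarithm is undefined for y <= 0, so those points get masked.
log_y = np.ma.log(y)
mask = np.ma.getmaskarray(log_y)

# Removing the invalid points drops them from both the values and the times.
t_valid = t[~mask]
log_y_valid = log_y.compressed()
```

After the removal, the remaining arrays are ordinary unmasked data, which is why operations such as interpolation become possible again, at the cost of unevenly spaced points.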

Suppose you want to mask the atmosphere. You have your density variable rho and you want to remove everything that is below 1e-8, and you want to plot the pressure press. For that you would first construct a masked grid data with rho, and then apply the mask to press:

masked_rho = rho.masked_less(1e-8)
masked_press = press.mask_applied(masked_rho.mask)
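
The same pattern, transferring a mask computed from one variable onto another, can be sketched with numpy.ma on plain arrays. The 2x2 arrays below are made-up values standing in for rho and press on a small grid.

```python
import numpy as np

# Hypothetical density and pressure on a 2x2 grid.
rho = np.array([[1e-3, 1e-12], [5e-10, 2e-4]])
press = np.array([[10.0, 999.0], [888.0, 20.0]])

# Build the mask from the density...
masked_rho = np.ma.masked_less(rho, 1e-8)

# ...and attach it to the pressure.
masked_press = np.ma.array(press, mask=np.ma.getmaskarray(masked_rho))

max_press = masked_press.max()  # only cells with rho >= 1e-8 contribute
```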

If you want to plot this with visualize_matplotlib, keep in mind that resampling erases the mask information. Therefore, to plot masked data, you have to pass the UniformGridData directly (and not a HierarchicalGridData).

Warning

We only mask the data, not the independent variable (e.g., the time in TimeSeries). If your computations require this variable to be masked too, you should extract the mask array (as in masked_rho.mask above) and apply it manually.
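
A minimal sketch of applying a mask to the independent variable by hand, again with numpy.ma and made-up arrays t and y in place of a real TimeSeries:

```python
import numpy as np

# Hypothetical times and values.
t = np.linspace(0.0, 4.0, 5)
y = np.array([0.5, 1.0, 1.5, 2.5, 3.0])

# Mask the values larger than 2, then reuse the same mask for the times.
masked_y = np.ma.masked_greater(y, 2)
masked_t = np.ma.array(t, mask=np.ma.getmaskarray(masked_y))

last_valid_time = masked_t.max()  # latest time at which y is still valid
```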

Warning

Some methods will not work with masked data (e.g., splines and interpolation). As a consequence, resampling operations do not carry over the masks, and you have to apply the masks again afterwards. One instance in which this is important is plotting with visualize_matplotlib. Also, the save() method will discard the mask information.