Masking invalid data
It is often useful to ignore specific pieces of data. For example, it is wise to
exclude the atmosphere when computing the maximum temperature in GRMHD
simulations. For this,
kuibit inherits from NumPy the concept of masks:
masked data carries along the information of where the data is valid and where
it is not. In
kuibit, classes derived from a common numerical base class support
masks, meaning that operations like
max() will not include the data
marked as invalid. This page describes how to work with masks
(see also the reference on kuibit.masks).
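The idea can be illustrated with plain NumPy masked arrays, from which kuibit takes the concept. The sketch below does not use kuibit itself, and the density and temperature values are made up for illustration.

```python
import numpy as np

# Hypothetical data: the last two cells belong to the low-density atmosphere,
# where the temperature is not physically meaningful.
rho = np.array([1e-3, 5e-4, 1e-10, 1e-10])
temperature = np.array([10.0, 12.0, 500.0, 800.0])

# Mark as invalid every cell where the density is below the threshold.
atmosphere = rho < 1e-8
masked_temperature = np.ma.masked_where(atmosphere, temperature)

# The plain maximum is dominated by the atmosphere ...
naive_max = temperature.max()
# ... while the masked maximum only considers valid cells.
physical_max = masked_temperature.max()

print(naive_max, physical_max)
```

Here the unmasked maximum comes from an atmosphere cell, whereas the masked maximum reflects only the valid region.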
Creating masked objects
Since the interface is the same for all the classes defined in kuibit, we
will consider a TimeSeries as an example.
To create a masked object, you first need to start from the clean version.
If ts is a TimeSeries, there are multiple ways to return a new, masked object:
    # Data is invalid when it is equal to 1
    ts1 = ts.masked_equal(1)
    # Data is invalid when it is larger than 2
    ts2 = ts.masked_greater(2)
    # Data is invalid when it is larger than or equal to 3
    ts3 = ts.masked_greater_equal(3)
    # Data is invalid when it is between 4 and 5
    ts4 = ts.masked_inside(4, 5)
    # Data is invalid when it is NaN or inf
    ts5 = ts.masked_invalid()
    # Data is invalid when it is smaller than 6
    ts6 = ts.masked_less(6)
    # Data is invalid when it is smaller than or equal to 7
    ts7 = ts.masked_less_equal(7)
    # Data is invalid when it is not 8
    ts8 = ts.masked_not_equal(8)
    # Data is invalid when it is outside the range (8, 9)
    ts9 = ts.masked_outside(8, 9)
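The method names above match functions in numpy.ma. As a rough guide to the conventions, here is how the NumPy counterparts behave on a small made-up array. That the kuibit methods treat interval boundaries in exactly the same way is an assumption, not something verified here.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Values strictly smaller than 3 become invalid.
below = np.ma.masked_less(data, 3)

# Values in the interval [2, 4] become invalid (NumPy includes the boundaries).
inside = np.ma.masked_inside(data, 2, 4)

# Values outside the interval [2, 4] become invalid.
outside = np.ma.masked_outside(data, 2, 4)

# NaN and inf become invalid.
invalid = np.ma.masked_invalid(np.array([1.0, np.nan, np.inf]))

for name, arr in [("below", below), ("inside", inside),
                  ("outside", outside), ("invalid", invalid)]:
    print(name, arr.mask.tolist())
```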
All these methods return new objects. Alternatively, it is possible to edit the
object in place using methods with the imperative form (e.g.,
mask_equal() instead of masked_equal()).
The second way to create masked objects is to use the functions in
kuibit.masks, which provides mathematical functions that are
defined only on a restricted domain. For instance, if you want to compute the
natural logarithm of some data, you can use the function
masks.log(), which automatically
applies a mask where the operation is not defined.
    import kuibit.masks as ma

    log_ts = ma.log(ts)
    log_ts.is_masked()  # => True
is_masked() checks whether the object is masked or not.
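NumPy offers the same kind of domain-aware function as numpy.ma.log, which gives a feel for what an automatically applied mask looks like. This is a sketch with plain NumPy and made-up values, not a call into kuibit.masks.

```python
import numpy as np

# The logarithm is not defined for the last two entries.
data = np.array([np.e, 1.0, 0.0, -1.0])

# numpy.ma.log masks the points outside the domain instead of
# returning -inf or NaN.
log_data = np.ma.log(data)

print(log_data.mask.tolist())
print(np.ma.is_masked(log_data))
```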
When objects are masked, some methods become unavailable. For example, it is not
possible to compute splines or perform interpolations. For
FrequencySeries, you can work around
this limitation by removing the invalid points with
mask_removed(). This is not possible with
grid data because we assume that the data is defined on regular grids.
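To see why removing invalid points helps for series, consider the following plain-NumPy sketch (made-up values, not the kuibit implementation). Once the invalid samples and their times are dropped together, ordinary interpolation works again. The same trick would break a regular grid, since the remaining points would no longer be evenly spaced.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# The sample at t = 2 is invalid.
y = np.ma.masked_invalid(np.array([0.0, 2.0, np.nan, 6.0, 8.0]))

# Keep only the valid samples, dropping the matching times as well.
valid = ~np.ma.getmaskarray(y)
t_clean = t[valid]
y_clean = y.compressed()

# With clean arrays, linear interpolation bridges the removed point.
value = np.interp(2.0, t_clean, y_clean)
print(value)
```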
Suppose you want to mask the atmosphere. You have your density variable rho,
you want to exclude everything that is below
1e-8, and you want to plot
press. For that, you would first construct a masked grid data from
rho, and then apply its mask to press:
    masked_rho = rho.masked_less(1e-8)
    masked_press = press.mask_applied(masked_rho.mask)
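The same two steps can be sketched with plain NumPy arrays standing in for the grid data. The 2x2 values below are made up, and the example only illustrates how a mask built from one variable is shared with another.

```python
import numpy as np

# Hypothetical 2D data: the right column is atmosphere.
rho = np.array([[1e-3, 1e-10], [5e-4, 1e-12]])
press = np.array([[2.0, 9.0], [3.0, 7.0]])

# Step 1: build the mask from the density.
masked_rho = np.ma.masked_less(rho, 1e-8)

# Step 2: reuse that mask for the pressure.
masked_press = np.ma.array(press, mask=np.ma.getmaskarray(masked_rho))

max_press = masked_press.max()
print(masked_press.mask.tolist(), max_press)
```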
If you want to plot this with
visualize_matplotlib, keep in mind
that resampling erases mask information. Therefore, if you want to
plot the mask, you have to pass directly the grid data you
want to plot (and not an object that still has to be resampled).
We only mask the data, not the independent variable (e.g., the time in
TimeSeries). If your computations require this variable to be
masked too, you should extract the mask array with the mask
attribute and manually apply the mask.
Some methods will not work with masked data (e.g. splines and interpolation).
Therefore, resampling operations will not carry over the masks. You have to
apply the masks again. One instance in which this is important is plotting with
visualize_matplotlib. Also, the
will discard the mask information.
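One possible way to re-apply a mask after resampling is sketched below with plain NumPy: the mask is resampled as a 0/1 indicator, and every new point that touches an invalid sample is flagged. This is just one strategy among several, shown with made-up data. It is not how kuibit handles resampling.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# -999 is a placeholder for an invalid sample.
y = np.ma.masked_equal(np.array([1.0, 2.0, -999.0, 4.0, 5.0]), -999.0)

# Resample onto a finer grid: 0, 0.5, ..., 4.
t_new = np.linspace(0.0, 4.0, 9)

# np.interp works on the underlying data and returns a plain array,
# so the mask is gone and the placeholder leaks into the neighbors.
resampled = np.interp(t_new, t, y.data)
mask_lost = not np.ma.is_masked(resampled)

# Resample the mask as a float indicator and flag every new point
# whose interpolation involved an invalid sample.
indicator = np.interp(t_new, t, np.ma.getmaskarray(y).astype(float))
remasked = np.ma.masked_where(indicator > 0.0, resampled)

print(mask_lost, remasked.mask.tolist())
```

Flagging every point with a nonzero indicator is deliberately conservative: it discards the new samples on both sides of the invalid one, since their interpolated values depend on the placeholder.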