Time and frequency series

Time and frequency series are fundamental objects in most simulations. kuibit supports them through the timeseries and frequencyseries modules, which define the TimeSeries (Reference on kuibit.timeseries) and FrequencySeries (Reference on kuibit.frequencyseries) object types. Both the classes are derived from a base class BaseSeries (Reference on kuibit.series) which defines common methods and features, so TimeSeries and FrequencySeries share a lot of functionalities.

The TimeSeries and FrequencySeries objects

Here we describe some common feature for TimeSeries and FrequencySeries, and we leave for later sections to go into detail about either. For clarity, we specifically talk about TimeSeries, but everything we discuss here applies to FrequencySeries as well.

Defining time series is easy:

import kuibit.timeseries as ts
# or import kuibit.frequencyseries as fs
import numpy as np

t = np.linspace(0, 2 * np.pi, 100)
sin_wave = ts.TimeSeries(t, np.sin(t))

The object sin_wave has a number of useful features.

You can add, multiply, and divide two TimeSeries (FrequencySeries) with the same times (frequencies) and scalars. For instance

cos_wave = ts.TimeSeries(t, np.cos(t))
tan_wave = sin_wave / cos_wave

The operations currently implemented are sum, subtraction, multiplication, division, power, absolute value, negative.

TimeSeries and FrequencySeries objects are extremely powerful: they try to behave like numpy arrays. For example, many of the mathematical functions from numpy can be applied directly:

ones = np.arctan(tan_wave)
complicated_expr = np.tanh(sin_wave ** 2 / np.abs(cos_wave))

Both the Series are callable objects, so they behave like normal functions. If t0 is any arbitrary time (or frequency), you can call sin_wave(t0) to get the value of sin_wave at t0. To do this, BaseSeries uses spline representation. For more information, read the documentation on splines. Moreover, series are natively plottable by Matplotlib: you can simply do:

import matplotlib.pyplot as plt

plt.plot(cos_wave)

This is equivalent to plt.plot(cos_wave.t, cos_wave.y).

TimeSeries and FrequencySeries objects have multiple useful functions. We follow this convention: methods with an name that is an imperative (e.g., zero_pad) modify the object does not return anything, methods with name that is a past-tense verb (e.g., zero_padded) return a new object with the modification applied.

Series support masking data. Read the page on masks to learn more (Masking invalid data).

Note

Every mathematical operation between different series will perform a check that the two series are compatible (ie, they have the same t or f). This can be expensive, so if you are writing performance-critical routines, you should handle the data directly. See inner_product() for an example.

Note

For performance-critical routines where you have to initialize many series and you can guarantee that the x is monotonically increasing you can pass an additional paramter to the init to speed up the initalization. See reference.

splines

One of the most powerful features of TimeSeries (FrequencySeries) is that they are callable objects and they can evaluate the data at any arbitrary time. This is done using splines. When you first call a TimeSeries (FrequencySeries) a cubic spline representation with no smoothing (the spline evaluates exactly to the data) is computed. This is cached into the attributes self.spline_real (and self.spline_imag if the data is complex).

Every time you modify the series (e.g., integrate), the spline is updated.

This representation allows you to call the Series directly, but if you do it outside the range of definition, you will get a ValueError. You can change the behavior of how to treat external data by calling directly evaluate_with_spline. This takes a keyword argument ext. Values outside the interval are extrapolated if ext=0, set to 0 if ext=1, a ValueError is raised if ext=2, or if ext=3, the boundary value is returned.

Warning

Splines are good continuous representation of data, but they are not perfect, and they are especially unfit for discontinuous data. Be sure to understand the limitations, and use splines only when you know that the representation is good.

Warning

Splines are not supported for masked data. If you are working with masked data, you should first removed the invalid values with mask_remove().

integrate

Integrate the TimeSeries (FrequencySeries) as a cumulative sum weighted on the time intervals (trapezoid). The result is a new Series with the integral as a function of time. Optinally, one can provide dx, which is the spacing in the independent variable. If provided, it will be used. This is especially convenient for evely spaced series, as computations will be faster

Warning

Integration is not supported for masked data. If you are working with masked data, you should first removed the invalid values with mask_remove().

differentiate, spline_differentiated

The method differentiate derives Series(order) with a centered difference method in the interior and a one-sided difference on the boundary. The operation is applied order times to obtain a high-order derivative. On the other hand, spline_differentiate uses the spline representation to achieve the same task. spline_differentiate is typically and better behaved for nice enough timeseries. You should not trust the values at the boundaries too much, you may want to crop it out.

Warning

Differentiation is not supported for masked data. If you are working with masked data, you should first removed the invalid values with mask_remove().

save

Save the Series as an ASCII file with 2 columns \((t, y)\) for real valued data and 3 columns \((t, \Re (y), \Im (y))\) for complex-valued ones. The back-end is np.savetxt, so you can provide additional arguments, like an header.

Warning

Information about masks is lost.

crop

You can specify the initial and final values of the series with the crop method, which takes two keyword paramters init and end

savgol_smooth

savgol_smooth(window_size, order) smooths the series with a Savitzky-Golay filter with window of size window_size and order order. This is just like a regular “Moving average” filter, but instead of just calculating the average, a polynomial (usually 2nd or 4th order) fit is made for every point, and only the “middle” point is chosen. Since 2nd (or 4th) order information is concerned at every point, the bias introduced in “moving average” approach at local maxima or minima, is circumvented. At the moment, this is the preferred way to smooth series.

Warning

Smoothing is not supported for masked data. If you are working with masked data, you should first removed the invalid values with mask_remove().

iter

Series are iterable, so you can do

for t, y in timeseries:
    print(t, y)

min, max, abs_min, abs_max

These methods return the minimum or maximum of the series. With a prefix abs, they return the minimum or maximum of the absolute value of the series.

local_maxima, local_minima

These methods locate the local extrema in the signal. If the data is complex, the absolute value is taken first. The return value of these methods are two tuples, the first with the x coordinates, and the second with the y ones for the peaks. Often, numerical data is noisy, so it is difficult to find the peaks cleanly. Internally, the methods use SciPy’s find_peaks find_peaks. All the options that can be passed to find_peaks can also be passed to local_maxima() and local_minima() (most of which can be used to set filter to find the peaks one is looking for). The optional argument include_edges controls whether the edges should be considered extrema or not.

sample_common

sample_common has two possible uses: (1) takes a list of Series and remove all the points that are not shared by all the Series in the list, or (2) resamples all of them to the largest time interval covered by all series, using regularly spaced time. In this second case, the number of sample points is the minimum over all time series. Optinally, it takes a parameter piecewise_constant. If this is turned True, instead of using splines the resampling is done using the nearest neighbors. This is useful when data is discontinuous, so splines do not behave well.

To choose between the two different behaviors, pass the resample keyword. Number (1) is for example useful to study convergence.

The TimeSeries methods

mean_remove, nans_remove

mean_remove, as the name suggests removes the mean value from the TimeSeries. Similarly, nans_remove filters out those data points with infinitive or NaN values. The resulting TimeSeries has different number of points.

time_unit_change, redshift

time_unit_change(T, inverse=False) rescales the time so that what was previously T units of time now are 1. For example, if initially the units where seconds, with T=1e-3 the new units will be milliseconds. The keyword argument inverse changes the direction: when inverse=True, 1 unit of old time becomes T units in the new time. This is useful to move from computational units to physical units using the unitconv module.

The method redshift(z) uses time_unit_change to redshift the data by a factor of \(1+z\).

import kuibit.unitconv as uc

# Gravitational waves in geometrized units
gw_cu = TimeSeries(...)

# Gravitational waves in seconds, assuming a mass of 1 M_sun
CU = uc.geom_umass_msun(1)
gw_s = gw_cu.time_unit_changed(CU.time, inverse=True)

resample, regular_resample, fixed_frequency_resample, fixed_timestep_resample

resample is a generic method to use splines to resample the TimeSeries to new times. Typical use-cases of resample have their of methods: regular_resample resamples to linearly space times, fixed_frequency_resample and fixed_frequency_resample resample the timeseries with a provided timestep or frequency starting at tmin and ending at a tmax that is an integer multiple of the timestep (or reciprocal of the frequency).

Before using these methods, read the warning in make_spline!

Then using resample, you can optionally pass the keyword piecewise_constant. In this case, splines will not be used, and the new points will be evaluated using the nearest neighbor. This is useful for those cases in which splines are inaccurate.

Fourier transform (to_FrequencySeries)

You can compute the discrete Fourier transform of a TimeSeries with the to_FrequencySeries method. This uses NumPy’s fft module, so the conventions are the same, except that we normalize the results. That is, instead of computing

\[A_k = \sum_m^{n-1} a_m \exp(-2\pi i \frac{mk}{n})\]

we compute

\[A_k = dt \sum_m^{n-1} a_m \exp(-2\pi i \frac{mk}{n})\]

Intuitively, this amounts to adding the measure of integration to obtain a “true” Fourier transform.

If the timeseries real, negative frequencies are discarded.

Note

You are responsible of pre-processing the data (removing mean, windowing, etc.)

unfolded_phase, phase_angular_velocity, phase_frequency

unfolded_phase returns a new TimeSeries with the (complex) unfolded phase of the signal. If the signal is real, the unfolded phase is zero. phase_angular_velocity returns the derivative of the unfolded_phase. The derivative can be compute with finite difference by setting use_splines=False, otherwise it is computed with the splines. Optionally, the output can be smoothed over timescales of tsmooth with the savgol_smooth_time method. In this case, the TimeSeries is resampled to regular timesteps. phase_frequency is just phase_angular_velocity divided by \(2\pi\), which is the angular frequency of the phase.

savgol_smooth_time

Often, one knows the smoothing length in units of time as opposed to number of points (e.g., I want to smooth over timescales of one second). savgol_smooth_time takes smoothing timescale as opposed to the window size. To ensure consistency, savgol_smooth_time resamples the timescale to uniform timesteps. When you have a regularly sampled timeseries, this function is more direct than savgol_smooth. However, when the sampling is very irregular in time, the smoothing length changes throughout the timeseries (which is probably something you do not want).

windowed, tukey_windowed, hamming_window, blackman_window

window(window_function) applies window_function to the timeseries. window_function has to be a function that takes as first argument the number of points of the signal. window_function can take additional arguments as passed by windowed. Alternatively, window_function can be a string that idenfity one of the window functions that are already available (tukey, hamming, blackman).

You can apply directly one of those windows with the methods tukey_window, hamming_window, blackman_window.

zero_pad

zero_pad(N) pads the Timeseries with zeros so that it has a total of N points. If N is smaller than the number of points in the Timeseries, or if the Timeseries is not equispaced in time, the operation will fail.

initial_time_remove, final_time_remove

With these methods you can remove a portion of the signal at the beginning or at the end of the timeseries. This is different from cropping, because tmin may not be 0 (when you crop, you specify what is the new tmin). Here, you specify the amount that you want to remove.

The FrequencySeries methods

normalize

Normalize the FrequencySeries so that it maximum amplitude is one.

low_pass, high_pass, band_pass

low_pass, high_pass, and band_pass apply standard filters to remove some frequencies. In case the signal is complex, both positive and negative frequencies are removed (e.g., high_pass(fmin) removes frequencies f so that abs(f) <= f).

peaks, peaks_frequencies

peaks(amp_threshold) detects the peaks (local maxima) in the amplitude of the spectrum that are larger than amp_threshold. This is a specialized version of local_maxima(). It returns a list of tuples. The first element of the tuple is the frequency bin in which the maximum is found, the second is a estimate obtained using a quadratic fit, and the third is the actual value of the amplitude. peaks_frequencies(amp_threshold) is like peaks(amp_threshold) but returns only the fitted frequencies.

Often, it is better to normalize the series, so that amp_threshold becomes a percentual value of the the maximum peak.

Inverse Fourier transform (to_TimeSeries)

Using NumPy’s fft, return a TimeSeries that is the inverse Fourier transform. It is that to_TimeSeries() composed with to_FrequencySeries() is the identity with the exception of the domain of definition. The time domain is from \(-{(2 \Delta f)}^{-1}\) to \({(2 \Delta f)}^{-1}\).

If only positive frequencies are found, we will assume that the original signal was real.

Occasionally signals that are supposed to be real are turned into complex with imaginary part that is zero to machine precision.

This uses NumPy’s fft module, so the conventions are the same, except that we normalize the results. That is, instead of computing

\[a_m = \frac{1}{n} \sum_k^{n-1} A_k \exp(2\pi i \frac{mk}{n})\]

we compute

\[a_m = \frac{df}{n} \sum_k^{n-1} A_k \exp(2\pi i \frac{mk}{n})\]

Intuitively, this amounts to adding the measure of integration to obtain a “true” Fourier transform.

inner_product, and overlap

Given \(h1, h2\) frequency series and \(S_n\) spectral noise density, the inner product is typically defined as

\[(h_1, h_2) = 4 \Re \int_{f_min}^{f_max} \frac{h_1 h_2^*}{S_n}\]

The method inner_product() computes this quantity, possibly for a network of detectors. If the noise is not provided, S_n will be fixed to one. Alternatively, if the noise is a FrequencySeries, the inner product for that weighted with that noise will be computed. Alternatively, if noises is a list of FrequencySeries, then we will assume that the user wants to compute the network inner product:

\[(h_1, h_2)_{\textrm{network}} = \sum_{\mathrm{detectors}} (h_1, h_2)\]

where each detector has its own noise curve. Internally, h_1, h_2, and S_n will be resampled to a common frequency interval with the number of points of the series with fewest points. Hence, the accuracy of the computation is determined by the accuracy of the series with fewest points.

The series are assumed to be zero outside the range of definition. So, if f_min or f_max are too large or too small, the effective parameter will be determined by the series. By default, f_min=0 and f_max=inf.

Warning

Results with the defaults limits are very unstable (for example, Fourier transform typically diverge around zero, so the result of the integration is not accurate). Hence, one should always use physical limits.

With the inner product, one compute the overlap between two series:

\[\textrm{overlap} = (h_1, h_2) / \sqrt{(h_1, h_1)(h_2, h_2)}\]

Again, this can be unweighted, or noise-weighted, or for a network of detectors (if a list of noises is provided).

If you can guarantee that all the series have the same domain (including the noise), then you can set same_domain to True to speed up computations.

load_FrequencySeries

This function can be used to load a file as a FrequencySeries. This is particularly useful for noise curves. Internally, this function uses Numpy’s loadtxt so, additional arguments can be passed directly to that method.

For noise curves, you can use load_noise_curve() with the path of the file. (This internally uses load_FrequencySeries()).

Additional functions in timeseries

timeseries has also some additional useful functions, described here.

combine_ts

combine_ts takes a list of TimeSeries as input and combine them in a single new TimeSeries with monotonically increasing time. combine_ts can be called with prefer_late=True (default) or not. The difference between the two is that when prefer_late=False data from the TimeSeries with smaller tmin (i.e., the previous checkpoint) is preferred, and the opposite is true for prefer_late=True (i.e., the later checkpoint is used).

time_at_maximum, time_at_minimum

Often it is useful to know where is the peak of a signal (for example, for gravitational waves). These methods return the time at which the absolute value of the signal is maximum and minimum respectively.

time_shift, phase_shift

These methods apply common “shift” operations to the data. With time_shift, you can add a constant offset to the times of the series, whereas with phase_shift you can apply an offest in the complex phase of the form: \(\exp(i \phi)\). When you apply a phase shift, if the signal is real it will be turned into complex.

Common operations like time-shifting a series so that the absolute maximum (or minimum) is at t=0 have specialized methods (for convenience): align_at_maximum and align_at_minimum.

remove_duplicate_iters

This function takes two arrays t and y and remove overlapping segments of time (such as, from checkpointing) returning a TimeSeries with monotonically increasing times.

unfold_phase

In gravitational-wave astronomy the phase of a wave is typically unfolded so that instead of going from \(0\) to \(2\pi\), it is free to assume any value so that the number of periodicities can be counted. unfold_phase takes a signal and removes all the jumps of \(2\pi\). Optionally, provide a time t_of_zero_phase, the value of the phase is offset so that it is zero when the time is t_of_zero_phase.

BaseNumerical object

The BaseSeries class is derived from a even more abstract one, BaseNumerical. This class represent anything for which it makes sense to do calculations with. BaseNumerical implements all the infrastrcture needed to overload the mathematical operations. To do this, derived class must define four functions: - _apply_unary, that describes the output of applying a function to self (e.g., sin(self)). - _apply_binary, that describes the output of applying a function to self and other (e.g., self + other). - _apply_reduction, that describes the output of applying a function to self that returns a float (e.g., min(self)). - _apply_to_self, that modifies self upon application of a function that returns a compatible object.

This infrastrcture is also used by grid functions in kuibit.