Data cubes

A data cube contains the flux distribution measured in an observation as a function of position on the sky (RA and Dec) and frequency (or velocity). The main purpose of MARTINI is to take a source as input and determine how its flux is distributed in an output data cube.

Data cubes in MARTINI

MARTINI uses the DataCube class to represent the data cube that will be output as the end product of creating a mock observation. This defines the extent of the region to be observed, both in position (angular apertures around a central RA and Dec) and in frequency or velocity (bandwidth around a central frequency, or velocity offset around a central velocity). The resolution is also defined by the number of pixels and channels that subdivide the region to be observed.

Using MARTINI’s DataCube class

When using MARTINI, most of the time the only interaction with the DataCube class is to initialize it and pass it to the main Martini class. This section therefore focuses on the parameters that can be set at initialization, but at the bottom there is a short mention of writing the state of a data cube to disk (and restoring from this).

Initialization parameters

The footprint of the mock observation on the sky is set by the five parameters: ra, dec, px_size, n_px_x, n_px_y. It is recommended to choose even numbers of pixels along each axis for n_px_x (RA axis) and n_px_y (Dec axis). The ra and dec parameters then control the location of the boundary between the two centremost pixels along each axis, the the boundaries of pixels going out from this centre are each offset in angle by px_size from their neighbour. For users accustomed to the FITS standard, the px_size corresponds to CDELT and ra and dec correspond to CRVAL, with CRPIX set to the boundary between the two central pixels (or the central pixel if an odd number is chosen). There is also an option to initialize the data cube from a FITS header (see below). The ra, dec and px_size parameters should be provided with astropy.units with dimensions of angle.

In the spectral direction, the number of channels is set by n_channels and the location of the boundary between the two central channels (for an even number of channels) or the centre of the central channel (for an odd number of channels) is set by spectral_centre (similar to CRVAL). The spectral_centre can be specified either as a velocity (astropy.units with dimensions of speed) or as a frequency (dimensions of inverse time). Similarly, the spacing between channels, set by channel_width (similar to CDELT), can be either a frequency spacing or a velocity spacing. Frequencies and velocities can be mixed freely, so a spectral_centre in Hz could be given along with a channel_width in km/s. The dimensions of channel_width are important: if these are speed, then the channels will be evenly spaced in velocity, while if they are frequency the channels will be evenly spaced in frequency.

If a Stokes’ axis is desired, this can be enabled with stokes_axis=True.

Note

The size of the array stored by a DataCube typically changes during the process of using MARTINI because some padding is applied to ensure accuracy when convolving with a beam, but will return to its original size after convolution or before writing out a mock observation. See the core routines section for an explanation.

A possible initialization of a DataCube looks like:

import astropy.units as U
from martini import DataCube
from astropy.coordinates import ICRS

datacube = DataCube(
    n_px_x=256,
    n_px_y=256,
    n_channels=64,
    px_size=15 * U.arcsec,
    channel_width=4 * U.km / U.s,
    spectral_centre=1000 * U.km / U.s,
    ra=45 * U.deg,
    dec=-30 * U.deg,
    stokes_axis=False,
)

It often makes sense to place the source centre (defined by its RA, Dec and systemic velocity) in the centre of the data cube. A convenient way to do this looks like (omitting the particle data in the source initialization):

import astropy.units as U
from martini.sources import SPHSource
from martini import DataCube

source = SPHSource(
    distance=10 * U.Mpc,
    vpeculiar=-75 * U.km / U.s,
    ra=45 * U.deg,
    dec=-30 * U.deg,
    h=0.7,
    T_g=...,
    mHI_g=...,
    xyz_g=...,
    vxyz_g=...,
    hsm_g=...,
)
# the source provides an attribute called vsys
# defined as h * 100km/s * distance + vpeculiar
datacube = DataCube(
    n_px_x=256,
    n_px_y=256,
    n_channels=64,
    px_size=15 * U.arcsec,
    channel_width=4 * U.km / U.s,
    spectral_centre=source.vsys,
    ra=source.ra,
    dec=source.dec,
    stokes_axis=False,
)

Coordinate frames and standard of rest

By default the DataCube coordinate frame is ICRS that is centred and at rest with respect to the Solar System barycentre. Some use cases of MARTINI might require a different frame. This can be set with a keyword argument as DataCube(..., coordinate_frame=LSRK()), for example - notice that the frame should be initialized (LSRK() not LSRK). Keep in mind that since this frame defines the coordinate system for an observation within a World Coordinate System (WCS), the coordinate frame must be one with a notion of RA and Dec (so no, for example, Galactocentric coordinates). Galactic coordinates are not currently supported.

Note that the source is also defined in the ICRS frame by default. If the coordinate_frame of the DataCube is changed from the default, consider whether the source frame needs to be changed to match. In most cases leaving both in ICRS is all that’s needed. You could link the two programatically like this (schematically), if desired:

from martini.sources import SPHSource
from martini import DataCube
from astropy.coordinates import LSRK

source = SPHSource(..., coordinate_frame=LSRK())
datacube = DataCube(
    ...,
        coordinate_frame=source.coordinate_frame,
        specsys=source.coordinate_frame.name
)

The “standard of rest” (in FITS headers identified by the 'SPECSYS' card) can also be controlled. When the source is created its velocity is defined with respect to the origin of its coordinate_frame (ICRS, by default), which is taken to be at rest. The velocity is converted to the DataCube coordinate frame and also its standard of rest. By default this is "icrs" (i.e. the origin of the ICRS frame is taken to be at rest). This can be changed with a keyword argument as DataCube(..., specsys="lsrk"), for example. Any standard of rest supported by astropy is allowed, and the coordinate origin can be in motion depending on the choice of standard of rest. Keep in mind that the velocity of the source in its frame will be converted to the specified standard of rest - for ease of control of where your source lands in the channels of your DataCube it’s simplest to define the source in the same standard of rest (and coordinate frame) as that in which you wish to observe it.

Initializing from a FITS header

If you have a precise observational footprint in mind because you want to compare with or even inject sources into an existing data cube, calculating the initialization parameters for DataCube can be quite tedious (especially ra and dec if the target data cube is not defined by its centre). To simplify this workflow DataCube provides a method from_wcs(). This accepts a WCS instance that describes the World Coordinate System of the data cube, and this can be very easily created from a FITS header. For example, given a FITS file my_cube.fits:

from astropy import wcs
from astropy.io import fits
from martini.datacube import DataCube

with fits.open("my_cube.fits") as fitsfile:
    fits_hdr = fitsfile[0].header  # header of the main HDU
fits_wcs = wcs.WCS(fits_hdr)
datacube = DataCube.from_wcs(fits_wcs)

A notebook with a worked example of inserting a simulated galaxy into an observed data cube is provided in the examples directory on github.

While for “normal” initialization of a DataCube the coordinate frame and standard of rest can be set (or default to ICRS and "icrs", respectively), when a DataCube is initialized from a FITS header, MARTINI will try to determine the coordinate frame and standard of rest from the header. This generally works well for the coordinate frame (but for headers that don’t conform to the FITS standard, could fail, raising an exception originating in astropy). The standard of rest is in practice less standardized. MARTINI will look for a specsys in the WCS object and try to interpret this as a standard of rest supported by astropy. If the WCS specifies BARYCENT this will be interpreted as "icrs" (a common barycentric frame of reference), but since this choice is ambiguous a warning will be produced. If the WCS doesn’t specify a specsys a warning will be produced (ignoring this one probably results in a crash later in typical MARTINI use). If the WCS does provide a specsys but MARTINI fails to interpret it an exception is raised. All of this can be overridden by explicitly setting the standard of rest with DataCube.from_wcs(..., specsys="icrs"), for example. For a complete list of options you can do:

from astropy.coordinates import frame_transform_graph
frame_transform_graph.get_names()

Properties of the data cube

You can access the World Coordinate System (WCS) as a WCS object with (assuming datacube = DataCube(...):

  • datacube.wcs

The channel edges and centres in their intrinsic units (those in which they are evenly spaced) can be accessed as:

  • datacube.channel_mids

  • datacube.channel_edges

or obtained with specifically frequency or velocity dimensions with:

  • datacube.frequency_channel_mids

  • datacube.frequency_channel_edges

  • datacube.velocity_channel_mids

  • datacube.velocity_channel_edges

Iterators over the data cube slices in frequency (i.e. “channel maps”) and over the spectra in each pixel can be obtained as:

  • datacube.spatial_slices

  • datacube.channel_maps (same as spatial_slices)

  • datacube.spectra

Saving, loading & copying the data cube state

Because some operations that modify DataCube objects are computationally expensive, especially insert_source_in_cube(), some functionality to load/save/copy the state of a datacube object is provided. For instance, the result of inserting the source in the cube could be cached and the source insertion step skipped if the cache file exists like this:

import os
from martini import Martini, DataCube
from martini.sources import SPHSource
from martini.beams import GaussianBeam
from martini.noise import GaussianNoise
from martini.sph_kernels import CubicSplineKernel
from martini.spectral_models import GaussianSpectrum

# initialization parameters omitted for this schematic example:
source = SPHSource(...)
datacube = DataCube(...)
beam = GaussianBeam(...)
noise = GaussianNoise(...)
sph_kernel = CubicSplineKernel(...)
spectral_model = GaussianSpectrum(...)

m = Martini(
    source=source,
    datacube=datacube,
    beam=beam,
    noise=noise,
    sph_kernel=sph_kernel,
    spectral_model=spectral_model,
)
cache_filename = "cache.hdf5"  # note h5py must be installed
if not os.path.isfile(cache_filename):
    m.insert_source_in_cube()  # expensive step
    m.datacube.save_state(
        cache_filename,
        overwrite=False,  # set to True to allow overwriting existing files
    )
else:
    m.datacube.load_state(cache_filename)  # avoid expensive step
m.add_noise()
m.convolve_beam()
m.write_fits("my_mock.fits")

Warning

The save_state() method is not intended to save the result of a mock observation. Use the Martini class’s write_fits() or write_hdf5() methods for this purpose.

Another possible workflow is to copy a DataCube to create two mock observations that share some common initial steps and then differ later:

import astropy.units as U
from martini import Martini, DataCube
from martini.sources import SPHSource
from martini.beams import GaussianBeam
from martini.noise import GaussianNoise
from martini.sph_kernels import CubicSplineKernel
from martini.spectral_models import GaussianSpectrum

# initialization parameters omitted for this schematic example:
source = SPHSource(...)
datacube1 = DataCube(...)
datacube2 = datacube1.copy()  # placeholder, we'll replace it below
beam1 = GaussianBeam(bmaj=30 * U.arcsec, bmin=30 * U.arcsec)
beam2 = GaussianBeam(bmaj=15 * U.arcsec, bmin=15 * U.arcsec)
noise = GaussianNoise(...)
sph_kernel = CubicSplineKernel(...)
spectral_model = GaussianSpectrum(...)

m1 = Martini(
    source=source,
    datacube=datacube1,
    beam=beam1,
    noise=noise,
    sph_kernel=sph_kernel,
    spectral_model=spectral_model,
)
m2 = Martini(
    source=source,
    datacube=datacube2,
    beam=beam2,
    noise=noise,
    sph_kernel=sph_kernel,
    spectral_model=spectral_model,
)
m1.insert_source_in_cube()  # expensive step
m2.datacube = m1.datacube.copy()  # bypass expensive step
m1.add_noise()
m2.add_noise()
m1.convolve_beam()
m2.convolve_beam()
m1.write_fits("my_mock1.fits")
m2.write_fits("my_mock2.fits")

Warning

The example using copy() has a subtle potential pitfall. Because of the padding applied to the datacube when creating the Martini object in preparation for convolution with the beam (see core routines section), the two datacubes have different dimensions once m1 and m2 are initialized. A given beam requires a minimum pad size, so this example has been carefully constructed to copy the datacube associated with m1 (with the larger pad associated with the larger beam) into m2 (that requires a smaller pad because of the smaller beam). Trying to swap which DataCube is copied results in a pad that is too small when the convolve_beam() step is reached and raises an error similar to:

ValueError: datacube padding insufficient for beam convolution (perhaps you loaded
a datacube state with datacube.load_state that was previously initialized by
martini with a smaller beam?)

It is a known issue that this kind of workflow is quite a fragile construct. Some streamlining and simplification is planned for future code development.