Tohban OVRO-LWA Imaging Tutorial


Initial Setup

The OVRO-LWA has three solar modes that can operate concurrently. These are (1) the beamformer, which creates a high-resolution spectrogram of the solar activity each day, (2) a slow visibility mode that records data in CASA ms format for all 352 antennas and all 3072 frequencies at 10-s cadence, and (3) a fast visibility mode that records data for a 48-antenna subset (generally the outer antennas) and 768 frequencies at 1-s cadence. The recorders for the three modes are activated separately, so it is not guaranteed that data from all three modes are available at any one time. Also, because of the vast data volume, most of the recorded data are not saved, but rather are overwritten after a day or so; hence any data that are wanted must be explicitly saved by copying them to another location. Again because of the large data volume, such copying is too slow to save much data (at least at present), so we can generally save only about an hour of data per day.

Note: This tutorial only describes how to work with the slow visibility data at the moment.

Python Environment

The imaging pipeline is written in Python 3, so in order to use it you must set up a Python 3 environment. These instructions assume you are working in your own home directory on the Pipeline machine at OVRO.

  1. Enter the bash shell if you are not already in it. Type echo $0 to see what shell you are in; if that returns something other than -bash, type bash to enter it.
  2. Check whether the line alias loadpyenv3.8='source /home/user/.setenv_pyenv38' is in your ~/.bash_aliases file. If not, add it using your favorite editor, then activate it with source ~/.bash_aliases.
  3. Type loadpyenv3.8 to enter the Python 3.8 environment.
  4. From your home folder, type git clone https://github.com/binchensun/ovro-lwa-solar to install the OVRO-LWA code.

To test your Python environment, log out and log in again fresh, then type

$> loadpyenv3.8
$> ipython --pylab
import sys
sys.path.append('/home/dgary/ovro-lwa-solar')  # Replace with your own home directory
from ovrolwasolar import solar_pipeline

If that succeeds, you should be ready to proceed.

Where to Find Data

The next step is to find the data you want to work with. You will need some calibration data as well as the solar data for your target date. As of this writing, the existing solar data on Pipeline are in two separate places: /nas5/ovro-lwa-data (data up to 2023-09-03) and /nas6/ovro-lwa-data (data from 2023-09-18 and later). All of the existing beamformed data (spectrograms) are in /nas5/ovro-lwa-data/beam/beam-data.

This tutorial uses the example of the type II burst on 2023-07-28.
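
To see which slow-visibility files exist for a given date and time, you can list the data directory with ls in the shell or from Python. Below is a minimal sketch using the /nas5 path above and the example date:

import glob
# List all slow-visibility measurement sets from the 15 UT hour on 2023-07-28
# (change the date and time prefix to match your own event)
for f in sorted(glob.glob('/nas5/ovro-lwa-data/20230728/slow/20230728_15*')):
    print(f)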

Examining the Spectrogram for Your Date

It is good practice to examine the spectrogram for your date/time, to guide your selection of frequencies and times to use for imaging. You can check the folders and subfolders in /nas5/ovro-lwa-data/beam/beam-data to see what files exist. Note that the filenames have the Modified Julian Date (MJD) followed by hours, minutes, seconds in the format <mjdday>_<hh><mm><ss>?????????? where the ? characters indicate more digits of the fraction of a second. The type II burst we are interested in started around 15:43 UT on 2023 July 28, which is MJD 60153, so the file we want is /nas5/ovro-lwa-data/beam/beam-data/202307/beam20230728/060153_152717110834334d2be, which starts at 15:27:17 UT. Generally these files contain 30 min of data. The type II continues into the next file, which is 060153_1558172229518804396.
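
If you need to work out the MJD for a different date, astropy (already used elsewhere in this tutorial) can compute it. A minimal sketch:

from astropy.time import Time
# The integer part of the MJD is the first field of the beam filenames
t = Time('2023-07-28 15:43:00')
print(int(t.mjd))    # prints 60153, so look for beam files starting with 060153_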

[Figure: 2023 July 28 Type II event spectrogram]

To read and display this file, in IPython type

import sys     # If not already loaded
sys.path.append('/nas5/ovro-lwa-data/beam/software/')    # Location of the lwa spectrogram module
from lwa import lwa_read, lwa_plot
datadir = '/nas5/ovro-lwa-data/beam/beam-data/202307/beam20230728/'
# Read Stokes I and V, binning by 1 in time and by 4 in frequency
data = lwa_read(datadir+'060153_152717110834334d2be', stokes='IV', timebin=1, freqbin=4)
# Display the dynamic spectrum with the given color-scale limits
lwa_plot(data, vmax=15000, vmin=10)

which defaults to log-scaled amplitudes and a viridis color table for Stokes I, and linear-scaled amplitudes and grayscale for Stokes V, as shown in the figure above. Type lwa_plot? for more options.


Calibration and Imaging Script

The script below assumes some previous setup. First, a "home" directory needs to be created and the script must be run from that directory. Because of the large amount of disk space required, create your "home" directory on /data1. Mine is /data1/dgary/OVRO-LWA/20230728_workdir. Before running the script, you'll need to change the 7 lines indicated with the ***Change comments.

  1. The first such line is the list of frequency bands you want to image. In this case I have all 13 useful bands. Frequencies below 27 MHz rarely image well and in many cases we did not save the data for those frequencies anyway.
  2. The second is a string representing the date of the event, including an underscore (this is part of a filename).
  3. The third line is a list of solar times. These times have to exactly match existing filenames, so you'll have to do a listing of the data directory to check them. Warning: Doing a listing of the entire data directory is time consuming and not useful, since there are many thousands of files there. Instead, use something like: ls /nas5/ovro-lwa-data/20230728/slow/20230728_1553* to limit the number of files returned.
  4. The fourth line is the date string of the calibration data. This will almost always be the same as the date string of the data, but it is possible to use a calibration from a different date if the two dates are not too far apart.
  5. The fifth line is the time of the calibration data. Again, this must exist. Usually the calibration is done at night so the time will be quite different, e.g. 0500 UT, and a command like ls /nas5/ovro-lwa-data/20230728/slow | head -20 will list the first 20 files in the folder, which are likely the calibration files. Unfortunately, no nighttime calibration exists for this date, so I had to use a daytime time, 15:40 UT.
  6. The sixth line is the path to the data.
  7. The seventh line is the path to the calibration data, again usually the same as that for the data.
import os, glob
from ovrolwasolar import solar_pipeline, utils
from time import time

freqs=[27,32,36,41,46,50,55,59,64,69,73,78,82]    # ***Change to the bands you want to image
datstr = '20230728_'                              # ***Change to the date of your event
solar_times = ['155306','155316','155326']        # ***Change to the times to use for solar imaging -- these times must exist!
caldatstr = '20230728_'                           # ***Change to the date of your cal data
cal_time = '154003'                               # ***Change to the time for your calibration
datapath = '/nas5/ovro-lwa-data/20230728/slow/'   # ***Change to path to your data
calpath = '/nas5/ovro-lwa-data/20230728/slow/'    # ***Change to path to your calibration data

home=os.getcwd()
for solar_time in solar_times:
    for freq in freqs:
        calib_ms=caldatstr+cal_time+'_'+str(freq)+"MHz.ms"     # Will be copied from calpath
        solar_ms=datstr+solar_time+'_'+str(freq)+"MHz.ms"      # Will be copied from datapath
        bcal='caltables/'+calib_ms.replace('ms','bcal')        # Will be created if it doesn't already exist
        imagename=datstr+solar_time+'_'+str(freq)+"MHz"
        image_fold = 'images/'

        # Create frequency folder, if it doesn't exist
        freq_fold=str(freq)+"MHz"
        if not os.path.isdir(freq_fold):
            os.mkdir(freq_fold)

        # Copy the solar data for this time (will be deleted later)
        print('Copying solar data to frequency folder')
        os.system("cp -r "+os.path.join(datapath,solar_ms)+"* "+freq_fold+"/")
        # Copy the calibration data (will be deleted later)
        print('Copying calibration data to frequency folder')
        os.system("cp -r "+os.path.join(calpath,calib_ms)+"* "+freq_fold+"/")

        os.chdir(freq_fold)
        if not os.path.isdir(image_fold):
            os.mkdir(image_fold)
        if not os.path.isfile(bcal):
            bcal = None
        if not os.path.isdir('caltables'):
            os.mkdir('caltables')
        if not os.path.isdir('final_ms'):
            os.mkdir('final_ms')

        try:
            solar_pipeline.image_ms(solar_ms=solar_ms,calib_ms=calib_ms,bcal=bcal,\
                        imagename=imagename,do_final_imaging=False,logfile='analysis_'+str(freq)+'.log')
            msname = datstr+solar_time+'_'+str(freq)+'MHz_final.ms'
            os.system("mv *calibrated_selfcalibrated_sun_only_sun_selfcalibrated_sun_only.ms final_ms/"+msname)
            os.system("rm -rf *.ms* *.fits *.gcal *.cl *.badants")
            # Make 10 images for this band (integrates over 19 or 20 subchannels, bandwidth ~0.4545 MHz)
            #os.system('wsclean -no-dirty -size 1024 1024 -scale 1arcmin -weight uniform -minuv-l 10 -name '+imagename+' -niter 10000 -mgain 0.8 -beam-fitting-size 1 -pol I -join-channels -channels-out 10 final_ms/'+msname)
            # Convert images to heliocentric, move them to the final image folder, and delete all fits files
            #files = glob.glob('*-image.fits')
            #for imgfile in files:
            #    #utils.correct_primary_beam('final_ms/'+msname, imgfile.split('-image.fits')[0])
            #    helio_image = utils.convert_to_heliocentric_coords('final_ms/'+msname, imgfile)
            #    os.system('mv '+helio_image+' '+image_fold)
            #os.system('rm *.fits')
        except Exception as e:
            print('Processing failed for '+solar_ms+': '+str(e))   # Report the error and continue with the next band
        os.chdir(home)
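
Before moving on, it is worth verifying that the final measurement sets were actually produced. A minimal check, run from the same work directory (it simply relies on the directory layout created by the script above):

import glob
# Each band/time that completed successfully leaves a *_final.ms in <freq>MHz/final_ms/
done = sorted(glob.glob('*MHz/final_ms/*_final.ms'))
print(len(done), 'final measurement sets found')
for d in done:
    print(d)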

After the final ms's are created, run the code below to create fits-wrapped image cubes of the data.

from solar_realtime_pipeline import run_imager
from suncasa.utils import helioimage2fits as hf
from suncasa.io import ndfits
from astropy.time import Time
import glob
import os

freqs = [27,32,36,41,46,50,55,59,64,69,73,78,82] # ***Change to the bands you want to image
solar_times = ['155306','155316','155326']       # ***Change to the times to use for solar imaging -- these times must exist!
datstr = '20230728_'                             # ***Change to the date of your event
for solar_time in solar_times:
    tref = Time(datstr[:4]+'-'+datstr[4:6]+'-'+datstr[6:8]+' '+solar_time[:2]+':'+solar_time[2:4]+':'+solar_time[4:])
    ephem = hf.read_horizons(tref, dur=1./60./24., observatory='OVRO_MMA')
    if not os.path.exists('imagedir_allch'):
        os.makedirs('imagedir_allch')
    if not os.path.exists('fits'):
        os.makedirs('fits')
    outfits_helio = []
    # Make all the images (by calling run_imager)
    for freq in freqs:
        folder = str(freq)+'MHz'
        msname = folder+'/final_ms/'+datstr+solar_time+'_'+str(freq)+'MHz_final.ms'
        outfits_helio += run_imager(msname, imagedir_allch='imagedir_allch/', ephem=ephem)

    fitsfiles_mfs = []
    fitsfiles_fch = []
    for f in outfits_helio:
        if 'MFS' in f:
            fitsfiles_mfs.append(f)
        else:
            fitsfiles_fch.append(f)
    ## Wrap images
    timestr_iso = tref.isot[:-4].replace(':','')+'Z'
    fits_mfs = 'fits/ovro-lwa.lev1_mfs_10s.' + timestr_iso + '.image.fits'
    fits_fch = 'fits/ovro-lwa.lev1_fch_10s.' + timestr_iso + '.image.fits'
    # multi-frequency synthesis images
    fitsfiles_mfs.sort()
    ndfits.wrap(fitsfiles_mfs, outfitsfile=fits_mfs)

    # fine channel spectral images
    fitsfiles_fch.sort()
    ndfits.wrap(fitsfiles_fch, outfitsfile=fits_fch)
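
Once the wrapped cubes exist, you can take a quick look at their contents with astropy. This is a minimal sketch; the glob pattern simply picks up whichever MFS cubes the script above wrote into the fits folder:

import glob
from astropy.io import fits
# Inspect each wrapped multi-frequency-synthesis cube produced above
for f in sorted(glob.glob('fits/ovro-lwa.lev1_mfs_10s.*.image.fits')):
    print(f)
    with fits.open(f) as hdul:
        hdul.info()    # list the HDUs and their dimensions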

What Happens When You Run the Script

One way to run these scripts is to paste the first script into a file, say process.py, and the second into another file, say mk_imgs.py, then in an IPython session type

import sys
sys.path.append('/home/dgary/ovro-lwa-solar')        # Change to your path where you cloned the git repository
%run process.py      # Wait until all final ms's are created--could take many hours!
%run mk_imgs.py

If all goes well, after many hours you will have all of your images. If you examine the script, you will see that there are two loops, an inner one over frequency and an outer one over time. The inner loop creates a subdirectory for the frequency it is working on (the first will be a subdirectory named 27MHz), then does the calibration for that frequency and creates a subfolder caltables with a .bcal file in it. Luckily, this only has to be done once; the .bcal file is reused for subsequent times, so its creation will be skipped. Other files with a .gcal extension will be created for the first data time, and will also be reused for subsequent times up to one hour later. When a new .gcal file is needed, the pipeline will create it automatically for you. The gain calibration takes about 10 min for each frequency, but again this is only done once for an hour of data. After the calibration is complete, wsclean is used to create images (in 10 subbands of each ~4.5 MHz band, plus an MFS image integrated over the whole band). They are converted to heliocentric coordinates and you will find them in 27MHz/images when done. This takes another 10 minutes or so.

When all of that is done for the first frequency, the whole process starts again for the next, and so one until all images for the first time are done. In this example, then, it will take about 20/min per frequency * 13 frequencies = 260 minutes (> 4 hours!) to make all 143 images for the first time (10 images per band + 1 MFS image). For subsequent times, though, the calibration step is skipped so each subsequent time will take 10 min * 13 frequencies (around 2 hours). That means the entire script will run in about 8 hours and produce 429 images.