Data Reduction Pipeline


Special notes:

dithering

older notes


Pipeline status:


Pipeline 1.1 released. It is an (almost) turnkey program which, starting with a set of raw data, produces a co-added left/right pair for the target chosen. These images have been "undistorted" and registered (x,y shift) before being coadded. The input files for CALCTRANS (see below) are automatically prepared so that once object pairs are found their calibrated velocities can be obtained.

It is "almost" turnkey because before running:
you need to centroid a few spots in the reference masks you want to use;
you need to correct sky flats which have been classified as science integrations owing to the way the data was taken;
you need to type in the names of the masks and associated data sets.
In addition, if no flats are included in the data set, you need to give the name of pixel-to-pixel correction masks, and bad pixel masks, prepared elsewhere.

Typically, you should first run the program one command at a time, cutting from the IRAF script and pasting the commands to the CL> command line. In the process, missing file names can be pasted into the script and one can get a feel for what is happening. The script can subsequently be called as a task, performing the reduction in one step.

Running the pipeline


29/10/2002 Pipeline 1.2 released.
Changes:
The undistort routine now uses a 31x31 sinc function as the interpolation function - this greatly reduces the patterning in the undistorted images. Unresolved objects such as cosmic ray events result in severe ringing and these must be removed first. The change was a suggestion by Koen.


24sep03update.tar (uploaded by AJR)


11/12/2003 Pipeline 1.3 archived 11/12/2003.
Changes:
Some important phot parameters now set in pnspipe.cl itself:
centerpars.calgori="centroid"
centerpars.cbox=5

Jansen's utilities added:
cparse.cl - this will parse string given delimiter
rdlist.cl - returns line n from file f
tcolarit.cl - math operations on columns of data

numerous bug fixes

DATA directory now includes Filter-B-0 calibration

"prev" versions of calctrans etc, "micentroid", and all "maskshifts" files have been removed from pipeline and placed in this archive


11/12/2002 Pipeline 2.1 released

Main changes:

6/2/2004 Pipeline 2.2 released

Main changes:




Pipeline, Disks and Directories

Here is a printable checklist which you can use as an aid when processing data: you can write fault descriptions and other comments on them. Older checklists will help us reonstruct the data reduction procedures used.

For the first pass you might find it handy to use a pipeline 1.2 form

The pipeline script reduce.cl represents a complete pipeline session and can be used as a template.

The data should be kept sorted by DATE during processing, so that the correct calibration frames are used. Owing to the use of "groups" files (see below) we do not need to divide the data into L and R until the very last steps.

Many of the reduction steps are very time consuming, so intermediate backups are a very good idea to avoid having to to start from scratch - these could also be a back-up archival sets.

The images, after conversion to REAL, are about 22 MB in size and it is common to have more than 100 per arm per night, i.e. more than 2 GB.

This gives rise to the following strategy for directory naming and corresponding processing (the letter in the second column gives the designation in the tape archive if data are archived at the end of the corresponding step):
{RAWDATA} R /rawdata/20020101/ raw files from tape, delete after clean
{CLEAN} C /clean/20020101/ cleaned, ushort set (about 1 GB), archive
{FLAT} F /flat/20020101/ debiassed science data, bad pixels fixed, pixel-flatted
{FINAL} FL /final/20020101/ cosmics removed, using La-cos. Science images are now ready to be shifted-and-added. Directory should include images and masks. By default sky flats are not included, it is assumed that only a "standard" vignetting correction will be applied. Standard stars are present.



cd {CLEAN}


Data copying

Assuming the raw (16-bit integer) files are in dirA/, run
cleancopy(dirA)
- this will eliminate the extention [1], and replace "fit" by the standard "fits" extention. In addition, the image size is checked for the correct dimensions, in order to eliminate stray images from other programs.
Finally, the bright line is trimmed off, which changes the image size to [xxsize,yysize] which are environment variables set in pnspipe.cl (set to [2130,2500]). The under- and overscan regions are retained.


Now run
groups()
- this will create lists of files of various types, automatically distinguishing Left and Right. It will warn you if the number of files categorised is not consistent with the total number of files.
At this point, inspect the lists targetL and targetR in which mis-classified arcs and sky-flats will be obvious. Use regroup() (follow the prompts) to reclassify these and after this is done run groups() again to get the lists correct. It is recommended that you not run groups() in any subsequent steps in the reduction unless you really know what you are doing: since the lists at this stage are your best description of the available data types.

Troublesome images, such as those from a focus run which are classified as Science images, can be reclassified to "OTHER" using regroup.


Bias creation (1)

As an optional step,
marktraps(yes,yes)
will examine the first bias frame in the lists biasL and biasR for "traps", which are areas (usually a column or part of a column) with a systematic offset as a result of a charge-transfer problem (this is best seen in the biases). If successful it will create the masks trapsL.pl and trapsR.pl. The masks will be used later by fpixels and that routine allows you to nominate already-existing masks (e.g. in a different directory). Marktraps will delete any existing "traps" masks which it finds in the working directory.

If you plan to use pre-existing bad pixel masks (they exist for EEV12 and EEV13) you do NOT need to run marktraps()
Archive
This is a good point to archive the trimmed, ushort data, with master biases and possibly "traps" masks, as a CLEAN data set.


Some of the following scripts can be executed in one go (see below) but are listed here for completeness and for "service-access".


cd {FLAT} and copy over the files from the corresponding {CLEAN} directory
(more efficient, if you have saved the {CLEAN} data, is to simply rename the directory to {FLAT} and continue)


Bias subtraction
run debias()
This fits a 2D function (order 2 (linear) in X and order 5 in Y) to the over/underscan regions cols 1-28,2101-2128 and subtracts it from each of the science, flats, skys, "other" images (but not the biases themselves which are only of interest for evaluating noise patterns). As this step is irreversible the headers are checked to see if debias has already run and will not allow a repeat. Note that in other steps addPNSheader will report the pre-existence of a header if asked to write it, and will not update it, but will allow the script to run.


Fix bad pixels
run fpixels(flag,maskL,maskR,trapsL,trapsR)
to make a bad pixelmask and/or use it to correct the images by interpolation.

If fpixels is provided the name of an existing mask, via the parameters maskL and/or maskR, it will use that mask. If not, it will attempt to create one using the available flats (sky flats or other). If it cannot find an existing mask nor create one then it will terminate.

If fpixels is provided the name of an ascii file listing bad sections, via the parameters listL and listR, it will incorporate these into the mask.

If fpixels is provided the name of a pixel trap file, perhaps made earlier with marktraps, via the parameters trapsL and trapsR, it will incorporate these into the mask.

The result of all this, if successful, will be a file "maskL.pl" or "maskR.pl". fpixels reports that masks have been created, even though they may only have been renamed.

If the correct flag is set to "yes", fpixels will go ahead and correct images using the IRAF routine FIXPIX. All images in the categories FLAT, SCIENCE and SKY are corrected.

It is suggested that you run first with flag=no if you want to see what the mask looks like.

Some existing mask, traps and bad sections files for EEV12 and EEV13 can be found in the data directory, e.g. maskL="pnspipe$data/maskL.pl" (etc) - but to be sure these can be used make sure that no offsets have arisen between runs. See an important note below.

Example: the following call will create masks from the traps and bad sections files, and correct the images, leaving new mask files in the working directory:
fpixels(yes,"","","trapsL.pl","trapsR.pl","badL","badR")

The final output of the task consists of two new all-inclusive bad pixel files files "maskL.pl" and "maskR.pl" which are the masks actually used in the FIXPIX call within fpixels.

Notes: The creation of bad pixel masks based on flats uses all unsaturated flats, dividing them by their median and then looking pixel-by-pixel for deviations greater than 10%. For non-trivial signal levels such deviations are very aberrant and indicate a bad pixel. A mask is then produced, which is the intersection of all the individual masks. This discriminates against cosmic ray events and against the effect of low-signal masks. No algorithm is used for "connecting" parts of bad columns and rows, this does not seem to be necessary.

If flag=no, the masks are created but bad pixel correction is not performed.

ARC frames are deliberately not corrected, as this might do more harm than good...

Important note:
Between the July 2001 (mask1R) and the October 2002 runs the bad pixel mask for the Right arm shifted by 3 pixels so always check this before assuming you can reuse an existing mask or list of bad pixels.

In such a case you can shift the mask (in this example shift(mask1R,mask2R,-3,0)
but note that the routine which actually fixes pixels, IRAF task FIXPIX, works with physical coordinates so you would need to run wcsreset mask2R physical in order for the shifted mask to work. fpixels(), which calls FIXPIX, does this internally, but it is not a bad idea to do it by hand as well (and necessary if you use FIXPIX manually).

NOTE ADDED BY ND: in the left arm data (EEV12) there are some isolated bad pixels which are detected by fpixels. There are also some rows which appear to have higher gain and which therefore appear to the eye as "bad rows": they are however light-sensitive and they are dealt with by ppcorrect, as they should be.


Flat fielding (1)

the script ppcorrect(flag)
will look for normalised pixel response maps (PflatL,PflatR) and if either does not exist will try to create it. If flag is yes it will go ahead and perform the pixel-to-pixel correction of science and arc data by dividing the data by the map.

ppcorrect creates the maps by using ALL the flats and skys which are not saturated. In principle you should check manually to make sure that the files listed as flats and skys are correctly categorised, as focus runs and masked flats often turn up in these lists. In practice, ppcorrect often returns a message such as "image r449410.fits has negative median" when it encounters such images.

Each of the images is flattened by dividing by a median-filtered image, and then combined with both weights (so that the noise remains poissonian) and rejection (to eliminate the influence of cosmic ray events), these last additions by KK in 4/2002. The correction range is clipped at 10% (see also the notes on fpixels).

Usually, this would also be the place to correct the data for the instrumental function over the field (i.e. for the so-called vignetting). However, the vignetting function for the PN.S is necessarily arrived at indirectly: because (1) the flat fields are affected by the slowness of the shutter, and (2) the vignetting may be wavelength dependent (in the cameras, especially). We may be able to derive an analytical instrumental transission function V=V(x,y,wavel) from the data we have from mask calibration images, including arc lamp and tungsten, together with judicious use of sky and dome flats, but this has not been done as of January 2003. Therefore, the pipeline leaves the data uncorrected with the appropriate corrections to be made to the final data, later.

NOTES:




Archive

At this point the {FLAT} directory might be archived.

To get from {CLEAN} to {FLAT} in one step call the megascript flatten() which executes all the intermediate steps except the copying of data to the new directory.


cd {FINAL}


FINAL DATA The FINAL stages involve removal of cosmic ray events, un-distorting the images and shifting and adding prior to PN searching. You may want to organise the images differently at this stage, e.g. sorted by target.
Note that it is good to have cosmic ray removal at the start of this phase rather than at the end of the previous phase since it is prone to error and this allows better for a recovery.


Cosmic ray removal

We have decided to correct each image using lacos. Therefore
run rmcosmics(iimage,mmin,nniter,oobj,rreplace)
which simply calls lacos_im with the appropriate parameters (no need to specify sections anymore as the bright line has been windowed away).
Typical call: rmcosmics("scienceL",120,3,2.0,yes) will process only those images with exposure time greater than 120, and with 3 iterations, objl=2.0, deleting the original images after cleaning.
Lacos is a very slow-running process: for three iterations 1.5h on a Linux and 3.0h on the fastest Solaris machine (Ptolemy).
For more information, see here. One of the documented bugs in lacos concerns fits files. I believe that this bug has now been circumvented by the call in rmcosmics Still, it is noticed that our rmcosmics procedure does fall over sometimes, whether due to a bug in the script of in lacos itself. Recovery has been made fairly painless: if image r123456 was being processed then files with affixes (eg r123456m) and other temporary files (eg la12534q etc) should simply be deleted - if all is well the uncleaned image will still be there. Then start (e.g.) rmcosmics("scienceL",120,3,2.0,yes) again - no need to change the list. Files already cleaned will be recognised as such (affix "c") and skipped.


Image transformations

"Un-distorting" and Calibrating

The philosophy is as follows: masks are taken at regular intervals during observing so that drift can be corrected; in addition any image rotation can be removed and the appropriate orientations obtained. Finally the curvatures in the images can be removed, which facilitates image stacking. This can all be done with a single wavelength in the mask images, transforming the images so that these holes line up with a (fictitious) perfect mask. This is "un-distorting". All the images and their corresponding masks are undistorted by the same code using the same database - so that all effects generated by this step will be common to images and masks.

To allow for changes in the dispersion and perhaps in the dispersion direction due to mechanical shifts in the instrument - i.e. to properly track wavelength-dependent distortions - the spots corresponding to several wavelengths are centroided in the undistorted masks. Since some of the lines will be weak, a high-quality reference mask is used as the starting point for the centroiding.


Image combining




PN finding

There are two general strategies: ID'ing by eye with blinking, or automated source-finding searches.

By eye can use Koen's "searchpne", which has now been modified to account for new image transformations: you can specify any rotation angle for each of the two images displayed in searchpne. Can be zero, can be 180, or whatever, so no problem either way. Might be more convenient to transform the images to the same orientation, but for searchpne it is not required

Note KK 4/4/2002

searchpne sw...

has been updated. Latest update: Aug 2003 -KK New features:
* hit ? and you get a list of commands, see also searchpne.txt
* m has been fixed
* independent contrast scaling of the two windows
* you can now rotate each window by an arbitrary amount (a key)

The program has been linked to the cfitsio library, see searchpne.txt.

Automated searching may be possible using "starfind" or something. A semi-automatic procedure, which requires setting a number of cuts and thresholds interactively, has now been installed and can be found in the $SEARCHPNE/ directory (see searchpne.txt). It relies on sextractor for detection of sources and their photometry, and on a supermongo script that selects those sources that have no continuum, and those pairs that line up in y and whose velocity is within alloed range. Added Aug 2003 -KK

In any case what is crucial is some kind of objective test(s) of the legitimacy of image pairs -- are they legitimate or are they noise? Potential tests:
  • magnitude consistency
  • pair aligned in dispersion direction (testable by dy error in pnsvel)
  • magnitude cutoff limit for each image, as function of magnitude and radial position (due to galaxy light) -- perhaps generated simply by making a deliberately wrong shift+add image to see what characteristics of pure noise are

    Note that searchpne uses its own centroiding algorithm, weighting with a gaussian of dispersion 2.5 pixels. So its output numbers should be treated for now as preliminary, and input to an IRAF PHOT script (e.g.) to get robust centroids.

    Centroiding and photometry methods still needed...


    Wavelength solution

    calctrans

    3/9/03: Aaron's notes on optimising wavelength solution
    here.
    In summary:
    - need 5 wavelength spots (5009, 5017, 5022, 5035, 5038)
    - need higher-order terms (see here for list)
    - in high-S/N mask, calctrans fit indicates rms accuracy of 0.12 pix = 0.09 Å ~ 6 km/s
    - in normal mask, calctrans fit indicates rms accuracy of 0.18 pix = 0.14 Å ~ 9 km/s
    - in normal mask, simulated data toward centre of wavelength range indicate accuracy is ~2 km/s (rms), ~7 km/s (max excursion)
    - in normal mask, simulated data toward edge of wavelength range indicate accuracy is ~3 km/s (rms), ~8 km/s (max excursion)
    - solution rapidly loses accuracy outside of fitted region (5009-5038 Å).
    - external checks involving Galactic PNe are not helpful, probably because they lie outside wavelength range of B-6° bandpass, but there are also puzzling centroiding and flux issues

    Much of the work has already been done by alignspots (see above) - spot files from the "undistorted" masks corresponding to your data set now have to be fed to calctrans to produce the wavelength solution over the field. This is done by fitting a Chebyshev series mapping the original position of a spectral line at a mask hole to its dispersed spot: (x0,y0,lambda) -> (x,y) . The fit is done iteratively, rejecting outlying points until it converges on some tolerable residual (Koen: what criterion is used?). The wavelengths to be used are listed (in code) in the first line of the spots file and are read in by calctrans and converted to full precision:
    4943    	4942.915        
    4945    	4944.990        
    4955    	4955.382        
    4956    	4956.750        
    4957    	4957.122        
    4965    	4965.073        
    4972    	4972.157        
    4974    	4973.538        
    4990    	4989.948        
    4995    	4994.930        
    5005    	5005.159        
    5009    	5009.334        
    5011    	5011.003        
    501725    	5017.25  (was 5017.316)
    5023    	5022.870        
    5031    	5031.350        
    5036    	5035.989        
    5038    	5037.751        
    5049    	5048.813        
    5053    	5052.930        
    5054    	5054.178        
    5060    	5060.079        
    5062    	5062.036        
    5073    	5073.076        
    5074    	5074.190  
    
    NOTES: calctrans requires an exact match to the above codes, so the fact that 4956 is not the nearest integer does not matter. The above lines are all Ar or Ne, no allowance has been made for He lamps (which are not available at the WHT).

    Sample calctrans parameters are:

    xspots = r433550un.spotsR : output of alignspots(FF/r433550)
    xxmod = wavemodsx.dat : listing of which polynomial terms to use in fit to x-position: powers of (x0,y0,lambda) given in each row
    yymod = wavemodsy.dat : same as above, for y-position
    xxout = r433550un.xcoefR : output coefficients of wavelength solution for x-position
    yyout = r433550un.ycoefR : same as above, for y-position
    pgplotde= /xs : output device for vector plot of fit residuals

  • This is run separately for the Left and Right arm masks (the naming convention for the extentions as illustrated above is VERY MUCH recommended and a check for consistency should perhaps be built into calctrans). Each line of (e.g.) wavemodsx.dat has the form
    m1 m2 m3
    representing a term of the form xm1ym2wavelm3 (see notes below on which terms to use).

    Notes:
    The arc line wavelengths come from data taken with the RGO spectrograph in high resolution using similar lamps and/or from this source. The central wavelength of the blend at 5017Å depends on the relative linestrengths, and we have taken the ratio measured by the RGO spectrograph (5017.160(ArII 20):5017.629(ArII 10))to calculate the weighted mean wavelength - however it remains uncertian whether thisline should be included especially if it is close to the edge of the filter passband.

    pnsvel

    Now the velocities of objects from the science images are found using the wavelength solution, and image pairs from the L/R images. pnsvel.cl does this using such parameters as:

    lxwave = r433549un.xcoefL : output coefficients of calctrans for L image
    lywave = r433549un.ycoefL : as above
    rxwave = r433550un.xcoefR : as above, for R image
    rywave = r433550un.ycoefR : as above
    infile = data.png018.6-02.2 : listing of image pairs: (xL,yL,xR,yR) for each object
    outfile = spec.png018.6-02.2.1 : output of object's wavelength (we should add velocity); dy gives an indication of goodness of fit

    NOTES:
    PN fluxes

    In the "reduction.cl" files which are available as a template, the images are scaled and added in such a way that the counts correspond to T=1800s. It is suggested that, to avoid confusion, object lists use this convention during the reduction process.

    To convert counts to fluxes, one needs to compare with the count rate from a standard star (or galactic PNe). The fluxes from both the standard, and the observed PNe, should be divided by the appropriate (smoothed) transmission function. This is done using the undistorted co-ordinates. Transmission functions, which may depend on the filter used, are available in image form at pnspipe/data/vignetting. If a given PN has been observed at (xL,yL) in FINAL-LEFT and at (xR,yR) in FINAL-RIGHT with the BARR-A-0 configuration, then then the fluxes aL,aR must be divided by VIG-LEFT-BARR-A-0[xL,yL] (etc etc).

    Notes on the preparation of the vignetting functions (in postscript).


    PN astrometry

    To a good approximation the corrected cartesian coordinates of any object (PN or star) is simply the average of the L/R (x,y) values, since this is the way the transformations have been set up. The pipeline script xstartrails uses this fact to generate an astrometric database (of type "*.xdb"). The x,y coordinates of the PNe are generated by pnsvel. To convert these to positions, use cctran (e.g.):
    pn> unlearn cctran
    pn> lpar cctran
            input =                 The input coordinate files
           output =                 The output coordinate files
         database = "N7457.xdb"     The input database file
        solutions = "forward"       The input plate solutions
        (geometry = "geometric")    Transformation type (linear,geometric)
         (forward = yes)            Transform x / y to ra / dec (yes) or vice versa (no) ?
            (xref = INDEF)          The X reference pixel
            (yref = INDEF)          The Y reference pixel
            (xmag = INDEF)          The X axis scale in arcsec per pixel
            (ymag = INDEF)          The Y axis scale in arcsec per pixel
       (xrotation = INDEF)          The X axis rotation angle in degrees
       (yrotation = INDEF)          The Y axis rotation angle in degrees
          (lngref = INDEF)          The ra / longitude reference coordinate in lngunits\t
          (latref = INDEF)          The dec / latitude reference coordinate in latunits
        (lngunits = "")             The input / output ra / longitude reference coordinate units
        (latunits = "")             The input / output dec / latitude reference coordinate units
      (projection = "tan")          The sky projection geometry
         (xcolumn = 1)              Input column containing the x / ra / longitude coordinate
         (ycolumn = 2)              Input column containing the y / dec / latitude coordinate
       (lngformat = "")             Output format of the ra / longitude / x coordinate
       (latformat = "")             Output format of the dec / latitude / y coordinate
    (min_sigdigit = 7)              Minimum precision of the output coordinates
            (mode = "ql")           
    
    To double-check these steps compute some star positions by hand using centerstars and add these to the list of PNe positions L and R. pnsvel should return a "velocity" for the stars which has a constant value equal to the difference between the central wavelength of the filter and the "key" wavelength used by calctrans, and from the "position" determined cctran should return the input RA and Dec as listed (in this example) in "N7457.xcoo".



    last modified by AJR, 15 December 2003