Re: Radio/Interferometry: Archive and VO information wanted

From: Doug Tody <dtody-at-nrao.edu>
Date: Thu, 19 Jun 2003 16:19:43 -0600 (MDT)


The following was prepared by John Benson and Doug Tody.  

> 1. Name and nature of observatory and/or facility
>

National Radio Astronomy Observatory - AOC Socorro, NM

NRAO is in the process of transitioning from an old tape-based archive system to storing data from all NRAO telescopes online on spinning disk in a central archive. Construction of the new disk-based archive system began about two years ago. Ultimately data archiving will be fully automated, storing both raw and processed data, using automated pipeline processing to calibrate data and generate reference images and spectra.

Currently we are copying the VLA tape archive onto RAID disk arrays. We have copied some VLBA data, and GBT data files are being ingested into the archive as they are produced in Green Bank.

> 2. Current archive status and description:
> a) How are data stored?

The VLA raw data (visibilities) are stored on Exabyte tapes and on a RAID disk array. The disk array archive now goes back to 1976, and all new data produced by the VLA is routinely ingested into the archive. There is a time lag of about one week between observing and the appearance of the data in the archive.

The VLBA raw data are stored on DAT-3 (and older) tapes, and we are beginning to copy that data to the archive disk array. The full VLBA archive is about 12 Tbytes in size. We will probably only keep the most recent several years of raw VLBA observations on spinning disk.

Calibrated VLBA data is now being made available to observers. The calibrated data files will be stored on the disk array permanently.

> b) How are data catalogued?
>

A single catalog system has been designed to support the data archive of the VLA, VLBA, GBT and EVLA. The catalog tables are Oracle tables, and queries are served through Java servlets. Queries are entered through HTML forms pages.

> c) Do you provide information about sources (as distinct from about
> observations) e.g. calibrator lists, target properties, and if so
> is this:
> i) catalogued information?
>

We have a calibrator source tool that is related to the data archive, but the two systems have not yet been properly integrated. Someday soon I hope.

> ii) plots?

The calibrator source tool (VLA and VLBA) provides image files and visibility files, which can be used to visualize most calibrator sources. This is an ongoing project.

> 3. What can be accessed on-line?

The archive is now undergoing acceptance testing and should be available for general use in the fall of 2003. Anyone will be able to browse through the archive catalog system. Access to the calibrator source tool is planned for Q3 2003.

> 4. Who can access it?
>

Public domain data can be selected and downloaded by anyone.

> 5. What are the methods of access?

Web-based forms tools or email requests. Data is asynchronously staged to a public FTP area, with an email message when the data is fully staged and available for download.

> 6. What search parameters are available?

Currently we're supporting two separate web-form pages.

  1. A simple query forms page for people who only want to find and download their data. The search parameters are: project name, project segment, observer name, archive file id and observing time range. Entering one or more of these parameters returns a list of archive files in the disk array, and the user may check off which ones to copy to the local public FTP area.
  2. A larger query page that basically holds any search parameters that I think would be useful. This is somewhat of a test bed for the query servlet, and something for the local astronomers to fiddle with and give us feedback for development. In addition to the parameters above, this forms page includes: source id by string compare or SIMBAD resolution, cone searches by coordinate entry, source type, calibrator source code, telescope, telescope configuration, observing frequency bands, observing frequency range, and molecular transition.

This forms page also allows selection of the format of the query reply: HTML, text file or stream, or VOTable.

Query page 2 is very much under construction....
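
As a rough illustration of what a cone search by coordinate entry amounts to, the sketch below filters a small list of catalog rows by angular distance from a given position. It is not the archive's query servlet: the row layout, the `file_id` values, the function names and the approximate source positions are all assumptions made for the example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(rows, ra_deg, dec_deg, radius_deg):
    """Return the rows whose position lies within radius_deg of the cone centre."""
    return [
        row for row in rows
        if angular_separation_deg(ra_deg, dec_deg, row["ra"], row["dec"]) <= radius_deg
    ]

# Hypothetical catalog rows; positions are approximate and for illustration only.
CATALOG = [
    {"file_id": "A001", "source": "3C286", "ra": 202.7845, "dec": 30.5092},
    {"file_id": "A002", "source": "3C48", "ra": 24.4221, "dec": 33.1598},
    {"file_id": "A003", "source": "3C147", "ra": 85.6506, "dec": 49.8520},
]

# Cone of radius 0.1 degrees around a position near the first source.
matches = cone_search(CATALOG, 202.78, 30.51, 0.1)
for row in matches:
    print(row["file_id"], row["source"])
```

A real catalog would normally apply this cut inside the database rather than in application code; the point here is only the geometry of the selection.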

> 7. What is your Archive Policy?

Data is held under proprietary protection for 18 months after the last observation (of that observing proposal). This policy is under review by VLA/VLBA and GBT directors.
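
To make the 18-month rule concrete, here is a minimal sketch of how a release date could be computed from the date of the last observation of a proposal. Only the 18-month figure comes from the policy above. The function names, the clamping of the day to the end of shorter months, and the choice to treat data as public on the release date itself are assumptions for the example.

```python
import calendar
from datetime import date

PROPRIETARY_MONTHS = 18  # from the policy stated above

def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def release_date(last_observation):
    """Date on which the proprietary period ends (assumed convention)."""
    return add_months(last_observation, PROPRIETARY_MONTHS)

def is_public(last_observation, today):
    """True once the proprietary period has elapsed (assumed to include the release date)."""
    return today >= release_date(last_observation)

print(release_date(date(2002, 1, 15)))
print(is_public(date(2002, 1, 15), date(2003, 6, 19)))
```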

> 8. What software do you use?
>

Java servlets, Tomcat, JBOSS, Oracle 8 (going to Oracle 9 soon).

> 9. What software do users need, and can you provide it?

A web browser and, I think, a recent version of the Java RTE for the calibrator source tool.

> 10.What format(s) are your data in (or can be translated into)?

We are currently able to deliver telescope data only in its native format: VLA = VLA Export, VLBA = FITS-IDI, GBT = FITS-GBT. Our plan is to support translations as part of the archive server. This probably won't be ready for a year or so.

> 11. Do you use pipelines?

Pipeline processing of VLBA data is now available. A project to develop an automated pipeline for ALMA data is underway. A pipeline for EVLA data is planned.

> 12. How far are data normally reduced before being supplied to the user?

Only the VLBA currently offers calibrated data files.

> 13.Are these stages:
> a) documented?

The VLBA calibrated data files are FITS files written by AIPS, so they contain a fairly complete HISTORY header.

> b) reversible?

Probably not.

> 14. Can data be processed remotely?

Not currently.

> 15.What Virtual Observatory projects (if any) are you involved in?

We are heavily involved in both the US NVO project, and in IVOA standards development.

> 16.Do you use explicitly any interoperability tools, e.g. data models,
> UCDs, VOTable?

We use all of the above as part of our participation in VO framework development.

> 17.Do you publish any data via existing VO-like facilities e.g. CDS,
> MAST?

Not really, unless you count surveys like NVSS and FIRST which were done with NRAO telescopes. These are widely available.

> 18.Making data access easier for a wider range of astronomers - what are
> your views on whether/how these suggestions should be implemented:

> a) Using a VO interface to radio observatories/data centres to run
> hidden software to provide required image, light curve,
> visibilities etc.?

Ultimately all of the above should be made available via the VO data access interfaces now under development.  

> b) Supplying information about hidden processing (software, versions,
> parameterisation, etc)?

It is important to fully characterize any data delivered via the VO, especially if the data was automatically generated, or generated on the fly by VO services.  

> c) Standardising the software in use at radio observatories/ data
> centres?

There will probably always be a variety of packages in use throughout the community, but it would be good if we could make them more interoperable.

There should be a set of well developed standard tools for constructing VOTables, as well as parsing and displaying VOTable contents.
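
To give a sense of what the simplest such tool might look like, the sketch below builds a bare-bones VOTable document and reads it back using only a general-purpose XML library. It follows the basic VOTABLE / RESOURCE / TABLE / FIELD / DATA / TABLEDATA nesting, but it omits namespaces, units, UCDs and schema validation, and the function names and sample row are assumptions for the example. A proper standard tool would cover all of those.

```python
import xml.etree.ElementTree as ET

def build_votable(fields, rows):
    """Serialize (name, datatype) fields and row tuples as a minimal VOTable string."""
    votable = ET.Element("VOTABLE", version="1.0")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="results")
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")

def parse_votable(xml_text):
    """Return the table rows as dicts keyed by FIELD name (cell values stay strings)."""
    root = ET.fromstring(xml_text)
    names = [field.get("name") for field in root.iter("FIELD")]
    return [
        dict(zip(names, (td.text for td in tr.iter("TD"))))
        for tr in root.iter("TR")
    ]

# Hypothetical query result with one row.
xml_text = build_votable(
    [("source", "char"), ("ra", "double"), ("dec", "double")],
    [("3C286", 202.7845, 30.5092)],
)
print(xml_text)
print(parse_votable(xml_text))
```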

> d) Standardising the format of data products?

This is highly desirable, especially for processed data stored in archives, e.g., calibrated UV data, spectral data cubes, and so forth. There is a lot of interest within our community in being able to use different reduction packages to process data from new telescopes.

> 19.What do you think astronomers want from your data?

Easy access? Full characterization of origin, data quality, and any processing.  

> 20.What are your plans for archive development or any other relevant
> suggestions?

Ultimately we hope to replicate the entire archive and VO data access interfaces onto the Teragrid within the US (or at least at a supercomputer center such as NCSA, where it can be accessed and possibly processed on the Teragrid).

>
> THANK-YOU
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Dr. Anita M. S. Richards, AVO Astronomer
> MERLIN/VLBI National Facility, University of Manchester,
> Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
> tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).
Received on 2003-06-20Z00:20:57