LIGO Data Overview
This describes what the LIGO data flow looks like from QuarkNet's end. This documentation is current as of December 2016, and it supersedes any documentation you find referring to data2.i2u2.org or data4.i2u2.org, or to www13/www18 (the Argonne servers, defunct as of Q1 2016).
If you're troubleshooting a problem, the information in Edit's notes (item 8) and Mihael's "How LIGO Works" summary is helpful.
Data is collected from the two LIGO observatory sites at Hanford and Livingston. The sensors that provide the data operate around-the-clock over spans of years (2003 - present, with a break during the upgrade shutdown from 2011-2013), and one of the main challenges of constructing the LIGO e-Lab is transferring and storing this continuously-flowing stream of data.
Between the two sites, nine seismic sensors continuously generate 189 separate streams of data that are delivered to the ELabs e-Lab. Most of these are then made available to plot within the e-Lab. A bird's-eye overview of the process:
- LIGO delivers .gwf frame files every night
  - i2u2-data:/disks/i2u2/ligo/data/frames/
- ImportData uses the new frames to update the streams
  - i2u2-data:/disks/i2u2/ligo/data/streams/
- DataServer.py serves streams to users via the e-Lab plotter
  - i2u2-data:/disks/i2u2/ligo/data/streams/DataServer.py
The first two are governed by cronjobs that typically require little attention. The third, DataServer.py, requires manual start and restart, but is otherwise low-maintenance. At present (Q1 2017), the e-Lab receives and stores far more data than it actually uses.
The e-Lab receives no gravitational wave data from LIGO.
Frame Files
LIGO stores its sensor data in the form of frame files ending in the .gwf extension ("gravitational wave frame", I assume). Each .gwf frame file represents a "snapshot" of all data generated by a set of sensors over a relatively short period of time (one hour for minute-trend data, or either one minute or ten minutes for second-trend data). LIGO does a little pre-processing of this frame data before delivering it to i2u2-data nightly.
Frame file directory structure
ELabs writes the frame files that LIGO sends us into a set of directories within /disks/i2u2/ligo/data/frames/ on i2u2-data.
trends
The first division of the frame files within frames/ is into the directories trend/ and trend_after23April2013/. From 2011 to 2013, the LIGO seismic sensors were turned off during general maintenance and upgrades to the experiment, so no data exists from that period. All of the files from before the shutdown (2003 to 2011) are in trend/; this directory is now static, and its data shouldn't ever change. Files from after the 2013 restart, including newly-delivered data, are in trend_after23April2013/.
Within each of trend/ and trend_after23April2013/, the frame files are further subdivided into minute-trend/ and second-trend/ directories according to which type of time-sampling the data uses. The e-Lab uses "minute trend" data; the "second trend" data is higher-resolution and has been used for testing, but no finished products that use it have been rolled out. Nevertheless, we still receive, process and store it in case it's used in the future.
As of Q1 2017, only minute-trend data is plotted in the e-Lab, so if you're troubleshooting a problem with incoming frame data, you'll typically want to go straight to
i2u2-data:/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/
to check on the incoming data. Note that trend_after23April2013/ is owned by quarkcat and has owner-only permissions, so you'll need to
$ sudo su
before you can cd into it.
sites
Within each of minute-trend/ and second-trend/, the frame files are divided by their origin site: LHO/ for "LIGO Hanford Observatory" or LLO/ for "LIGO Livingston Observatory." In terms of data flow, there's no real difference between the two.
Within each of LHO/ and LLO/ is a set of subdirectories with names like L-M-1147/, indexed by the site (L for "Livingston" or H for "Hanford") and trend-type (M for "minute" and T for "second", for some reason), as well as a 4-digit number. The number is the first 4 digits of the timestamp of all the frame files within it, representing millions of seconds (1 million seconds is a little over 11 1/2 days). The frame files are bundled into these subdirectories to keep them organized. For example, we have the directory
/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/
containing all minute-trend frame files from the Livingston observatory for the 11.5-day period where all timestamps begin with 1147. New directories are created automatically whenever the fourth digit rolls over.
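As a rough illustration of the bundle numbering, the sketch below works out the GPS range that a bundle directory covers. The function name is hypothetical, and it assumes the 4-digit prefix always stands for millions of seconds, as it does for the 10-digit timestamps in use here.

```python
SECONDS_PER_DAY = 86_400
BUNDLE_SPAN = 1_000_000  # each bundle covers one million GPS seconds


def bundle_gps_range(bundle_name):
    """Return the half-open GPS range [start, end) covered by a bundle directory.

    bundle_name looks like "L-M-1147" or "L-M-1147/" (hypothetical helper,
    not part of the e-Lab code).
    """
    prefix = int(bundle_name.rstrip("/").split("-")[-1])
    start = prefix * BUNDLE_SPAN
    return start, start + BUNDLE_SPAN


start, end = bundle_gps_range("L-M-1147/")
days = (end - start) / SECONDS_PER_DAY  # a little over 11.5 days
```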
Frame file naming conventions
The general standard for naming frame files is described in this
project note from LIGO (from 2001 - old, but still accurate as of 2017).
The .gwf frame files themselves are contained within the bundle directories (e.g., L-M-1147/). A typical example is
/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/L-M-1147986000-3600.gwf
The L-M- prefix again refers to the site (Livingston) and trend-type (Minute). Files in the other directories will have L-T-, H-M- or H-T-, as appropriate.
The long string of digits that follows is a timestamp in GPS time format, which is the number of seconds since midnight UTC on 6 January 1980. If you want to know the regular date and time associated with a frame file but for some reason you can't work that out in your head (:/), the LIGO experiment provides a nifty converter for you.
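If the online converter isn't handy, the conversion can be sketched in a few lines of Python. GPS time does not include leap seconds, so the GPS-UTC offset has to be supplied; this sketch takes it as a hand-entered parameter (17 s from July 2015 through December 2016, 18 s from January 2017 on) rather than looking it up. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from 1980-01-06 00:00:00 UTC and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps_seconds, leap_seconds=17):
    """Convert a GPS timestamp to a UTC datetime.

    leap_seconds is the GPS-UTC offset in effect at that moment. It is an
    input to this sketch, not something the sketch works out for itself.
    """
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


print(gps_to_utc(1147986000))  # 2016-05-22 20:59:43+00:00
```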
The last bit represents the timespan covered by the file's data in seconds. Minute-trend data is sampled over a span of an hour (3600 seconds) before being packaged into the frame, while second-trend data is sampled over a span of ten minutes (600 seconds). You'll notice that the timestamps of sequential frame files are incremented by these values. So, the file L-M-1147986000-3600.gwf contains minute-trend data from all Livingston sensors taken between 1147986000 and 1147986000 + 3600 = 1147989600 seconds. The next frame would be named L-M-1147989600-3600.gwf and contain data in the range [1147989600, 1147993200), etc.
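The naming scheme is regular enough to parse mechanically. The following is a minimal sketch, with hypothetical helper names, that splits a frame filename into its parts and derives the name of the frame that should follow it.

```python
import re
from typing import NamedTuple

# site (H or L), trend-type (M or T), GPS start time, duration in seconds
FRAME_NAME = re.compile(r"^([HL])-([MT])-(\d+)-(\d+)\.gwf$")


class FrameName(NamedTuple):
    site: str
    trend: str
    gps_start: int
    duration: int


def parse_frame_name(filename):
    """Split a frame filename such as L-M-1147986000-3600.gwf into its fields."""
    match = FRAME_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a frame filename: {filename!r}")
    site, trend, start, duration = match.groups()
    return FrameName(site, trend, int(start), int(duration))


def next_frame_name(filename):
    """Name of the frame that starts where this one ends."""
    frame = parse_frame_name(filename)
    next_start = frame.gps_start + frame.duration
    return f"{frame.site}-{frame.trend}-{next_start}-{frame.duration}.gwf"
```

For example, next_frame_name("L-M-1147986000-3600.gwf") gives "L-M-1147989600-3600.gwf", matching the sequence described above.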
Even though the file and directory names look arcane, pretty much everything is determined by the combination of observatory site and trend-type. Going back to the example above,
/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/L-M-1147986000-3600.gwf
- The trend-type items will always match and will be either (minute-trend - M - M - 3600) or (second-trend - T - T - 600)
- The site items will always match and will be either L or H
- The first GPS timestamp item will always match the first four digits of the second
It looks complex only because there's a lot of redundancy.
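To make the redundancy concrete, here is a sketch that rebuilds the full path from just three inputs: site letter, trend-type letter, and GPS start time. The function and constant names are hypothetical. It assumes the post-2013 directory, 10-digit timestamps, and the 3600/600 durations listed above.

```python
import posixpath

FRAMES_ROOT = "/disks/i2u2/ligo/data/frames/trend_after23April2013"
# trend-type letter -> (directory name, frame duration in seconds)
TRENDS = {"M": ("minute-trend", 3600), "T": ("second-trend", 600)}
# site letter -> site directory
SITES = {"L": "LLO", "H": "LHO"}


def frame_path(site, trend, gps_start):
    """Build the full path of a frame file from its three independent parts."""
    trend_dir, duration = TRENDS[trend]
    bundle = f"{site}-{trend}-{str(gps_start)[:4]}"  # first four timestamp digits
    name = f"{site}-{trend}-{gps_start}-{duration}.gwf"
    return posixpath.join(FRAMES_ROOT, trend_dir, SITES[site], bundle, name)


print(frame_path("L", "M", 1147986000))
```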
Each frame file contains information which must be appended to the streams of many different sensors. This is what the ImportData script does.
Stream Files
Stream file naming conventions
The data from each individual seismic channel is stored on i2u2-data in the directory /disks/i2u2/ligo/data/streams/ as sets of files called "stream files." Stream filenames are constructed of a succession of labels indicating
site - subsystem - station - sensor - sampling
The LIGO Channels page details each of these identifiers.
Each sensor's data stream appears as a set of three files within the streams/ directory; for example,
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.bin
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.index.bin
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.info
The regular .bin file is the primary data file and will typically be on the order of GB in size. The much smaller .index.bin and .info files are auxiliary files that help with the processing and plotting of the main file.
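Since every stream should have all three files, one quick sanity check is to group a directory listing by stream name and flag any stream with a file missing. This is a minimal sketch with hypothetical function names; it works on a plain list of filenames (for example, from os.listdir) and does not inspect file contents.

```python
from collections import defaultdict

# Longest suffix first, so "X.index.bin" is not mistaken for a plain ".bin" file.
SUFFIXES = (".index.bin", ".bin", ".info")


def group_stream_files(filenames):
    """Map each stream name to the set of suffixes found for it."""
    streams = defaultdict(set)
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                streams[name[: -len(suffix)]].add(suffix)
                break  # files such as DataServer.py match nothing and are skipped
    return streams


def incomplete_streams(filenames):
    """Return the names of streams that lack one or more of the three files."""
    grouped = group_stream_files(filenames)
    return sorted(s for s, found in grouped.items() if found != set(SUFFIXES))
```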
The filenames encode the exact seismic sensor and data channel of the stream, and they correspond closely to the stream names as identified in the e-Lab Analysis Tool. For the example given above,
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ
- L1 indicates the Livingston site
- DQ indicates that this stream is directly from the PEM subsystem and does not have DMT frequency-processing applied to it
- LVEA_VERTEX indicates the vertex station of the observatory, at the Laser and Vacuum Equipment Area
- SEIS_..._X indicates the x-direction accelerometer of the seismic sensor (seismic as opposed to tilt or magnetometer)
- This example has no sampling identifier, because only DMT subsystem streams have frequency sampling.
- I still haven't figured out what CS indicates
The stream file directory
The full contents of the /disks/i2u2/ligo/data/streams/ directory are, in the order listed by $ ls,
- DataServer.py, the RESTful Python server that delivers requested streams to the e-Lab Analysis Tool. It should always be running, or else the e-Lab can't get data to plot.
- 807 H0 files representing 269 data streams from the Hanford Observatory.
- 690 H1 files representing 230 data streams from the Hanford Observatory.
- ImportData.errors, the error log for the ImportData script that creates the stream files out of the frame files.
- 225 L0 files representing 75 data streams from the Livingston Observatory.
- 570 L1 files representing 190 data streams from the Livingston Observatory.
- ligoimport.files, the log that records which frame files have been imported into their respective sets of stream files.
- nohup.out, the log file to which output from DataServer.py is redirected when it is started using the nohup ("no hangup") command.
- old_ligoimport.files, an old version of ligoimport.files
Unlike the frame files, which increase in number nightly, the stream files are fixed in number, determined by the number of seismic sensors at LIGO.
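The file and stream counts listed above differ by a factor of three (807 = 3 x 269, 690 = 3 x 230, 225 = 3 x 75, 570 = 3 x 190), as expected from the three files per stream. A sketch for re-deriving the per-prefix stream counts from a directory listing follows. The function name is hypothetical, and it assumes every stream filename begins with its prefix followed by a colon, as in the L1: example earlier.

```python
from collections import Counter

# Longest suffix first, so ".index.bin" is matched before ".bin".
SUFFIXES = (".index.bin", ".bin", ".info")


def stream_counts(filenames):
    """Count distinct streams per prefix (H0, H1, L0, L1) in a listing."""
    streams = set()
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                streams.add(name[: -len(suffix)])
                break  # non-stream files match no suffix and are ignored
    return Counter(stream.split(":", 1)[0] for stream in streams)
```

If the counts from a live listing ever change, that suggests streams were added or removed, which the fixed sensor set should not produce on its own.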
Cronjobs
The e-Lab cronjobs on i2u2-data belong to user quarkcat, and you can see them with the command
$ crontab -l -u quarkcat
(-u specifies the user, just as with sudo, and -l lists that user's current crontab.) If you're curious, user-owned cronjobs like this are stored in /var/spool/cron/crontabs, but you shouldn't edit them there; use the crontab command instead. The LIGO-relevant part should look like
#Ligo data import and conversion
0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/second-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/second-trend > /tmp/second.log 2>&1
0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/minute-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend > /tmp/minute.log 2>&1
50 0 * * * /usr/local/ligotools/i2u2tools/bin/ImportData /disks/i2u2/ligo/data/frames/trend_after23April2013 /usr/local/ligotools/ligotools /disks/i2u2/ligo/data/streams > /tmp/convert.log 2>&1
The first two are rsync commands to pull second-trend and minute-trend frame files, respectively, from the Caltech LIGO server terra.ligo.caltech.edu, acting as user i2u2data on that machine. This is done every day at midnight (Eastern time, I assume, since that's where i2u2-data is). The files are written to i2u2-data in the appropriate subdirectory of /disks/i2u2/ligo/data/frames/trend_after23April2013/.
The third command runs the ImportData script every morning at 12:50am, which converts the frame files into stream files that the e-Lab can plot. Note that there are three arguments to ImportData. The first gives the source directory of the files to be converted, the second gives the location of the LIGOtools programs that do the conversion, and the third is the destination directory where the converted stream files are written.
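For anyone unfamiliar with crontab syntax: each job line starts with five schedule fields (minute, hour, day of month, month, day of week), and the rest of the line is the command. The sketch below, with a hypothetical function name, splits a job line that way. It does not handle comment lines or cron's special @-shortcuts.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Split a crontab job line into its schedule fields and its command."""
    parts = line.split(None, 5)  # five schedule fields, then the whole command
    if len(parts) < 6:
        raise ValueError(f"not a crontab job line: {line!r}")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


schedule, command = split_cron_line(
    "50 0 * * * /usr/local/ligotools/i2u2tools/bin/ImportData "
    "/disks/i2u2/ligo/data/frames/trend_after23April2013 "
    "/usr/local/ligotools/ligotools /disks/i2u2/ligo/data/streams "
    "> /tmp/convert.log 2>&1"
)
# schedule["minute"] is "50" and schedule["hour"] is "0", i.e. 12:50am daily
```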
Note the location of the error logs for these processes:
- /tmp/second.log
- /tmp/minute.log
- /tmp/convert.log
The first two are useful if you think frame files aren't being delivered from Caltech and written to i2u2-data properly. The third is useful if you think the frames aren't being converted to streams properly.
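Alongside the logs, another way to check delivery is to look for holes in the frame sequence itself, since consecutive frame timestamps should differ by exactly the frame duration. The following is a rough sketch with a hypothetical function name. It takes a list of filenames from a single bundle directory and assumes they all share one duration.

```python
import re

# Captures the GPS start time and the duration from a frame filename.
FRAME_NAME = re.compile(r"^[HL]-[MT]-(\d+)-(\d+)\.gwf$")


def missing_frames(filenames):
    """Return GPS start times absent between the earliest and latest frame."""
    starts = set()
    duration = None
    for name in filenames:
        match = FRAME_NAME.match(name)
        if match:
            starts.add(int(match.group(1)))
            duration = int(match.group(2))  # assumed identical for every file
    if not starts:
        return []
    expected = range(min(starts), max(starts) + 1, duration)
    return [t for t in expected if t not in starts]
```

An empty result only means there are no gaps between the first and last file present; it says nothing about whether the most recent night's delivery arrived, so the newest timestamp is still worth checking by eye.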
-- Main.JoelG - 2016-05-25