This section is dedicated to the simple question, "Is the GC software running correctly?"

There are several ways to check whether the GC is running correctly.  If you are sitting in front of the computer, you can simply look at the chromatograms and engineering data screens.  This section focuses on checks that can be made from a remote terminal, so it assumes the user is connected to the data acquisition computer by telnet or over the QNX network.  Here are three easy steps to check that the data acquisition software is running properly.

First check --  Are the programs running?

    Use the command ps to see which processes are running.  Make sure they are all there.  Here is a sample listing:
     
      $ ps
        PID  PGRP SID PRI STATE   BLK  SIZE COMMAND
       8774  8774   4 10o  WAIT    -1   44K /bin/sh //1/home/stealth/stealthdoit -W
      28231  1386   3 10o REPLY     0   20K (//1/usr/local/bin/namewait)
         96    96   1 10o REPLY    15   44K -sh
      28780 28780   5 10o  RECV     0   56K (//1/usr/local/bin/parent)
        114   110   2 18r  RECV     0   96K (//1/usr/lib/windows/bin/screen_event)
        116   110   2 14o  RECV     0  272K (//1/usr/lib/windows/bin/dialog)
        122   122   2 12o  RECV     0  176K (//1/usr/lib/windows/apps/Olwm/olwm)
      22145 28780   5 10o  RECV     0   52K (//1/usr/local/bin/memo)
      16008 28780   5 10o  RECV     0   92K (//1/home/stealth/stealthcol)
       2703  2703   6 10o REPLY     0  108K (//1/bin/less)
       9368 28780   5 10o  RECV     0   64K (//1/usr/local/bin/lgr)
       9369 28780   5 10o  RECV     0   20K rsserv -a E1 E2
      21659 28780   5 10o  RECV     0   60K (//1/usr/local/bin/bfr)
       3231 28780   5 10o  RECV     0   72K (//1/home/stealth/stealthsrvr)
       8875  8774   4  9o REPLY 21659   80K (//1/home/stealth/stealthdisp)
      28845  8774   4 10o  RECV     0   76K (//1/home/stealth/stealthclt)
      22704 28780   5 10o  RECV     0   76K (//1/home/stealth/stealthalgo)
      22705 28780   5 10o REPLY     0   12K stealthcol_wd
       7886  6453   8 10o REPLY     1   24K ps
        214   214   7 10o  RECV     0 1344K (//1/usr/lib/windows/apps/rtg/rtg)
       6453  6453   8 10o  WAIT    -1   44K -sh
       1386  1386   3 10o  WAIT    -1   44K /bin/sh
       4501  1386   3 10o  WAIT    -1   36K ksh rungc
       
    Yes, there are a ton of processes!  Generally, if there is a software problem (i.e., a crash), several of these processes will be missing.  For a description of these processes click here.
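The comparison against an expected process list can be scripted.  The following is a minimal sketch, not part of the GC software: missing_procs is a hypothetical helper, it assumes a POSIX awk and grep are available on the node, and it assumes the ps columns match the sample listing above (command in the eighth column).

```shell
# missing_procs (hypothetical helper): read a ps listing on stdin and
# print each expected process name (given as arguments) that is absent.
missing_procs() {
    # Reduce every line to the basename of its command, e.g.
    # "(//1/home/stealth/stealthcol)" becomes "stealthcol".
    running=$(awk '{
        cmd = $8                 # assumed: command is the 8th column
        gsub(/[()]/, "", cmd)    # drop the parentheses around some entries
        sub(/.*\//, "", cmd)     # drop the leading path
        print cmd
    }')
    for name in "$@"; do
        # -x matches whole lines only, so stealthcol_wd does not hide
        # a missing stealthcol.
        printf '%s\n' "$running" | grep -x -F -q -- "$name" || printf '%s\n' "$name"
    done
}
```

It could be run as "ps | missing_procs memo stealthcol lgr bfr stealthsrvr stealthclt stealthalgo".  Any name it prints is absent from the listing.  The names here are simply taken from the sample listing; which of them are essential on a given system is an assumption.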
Secondly -- Check if the data files are there and one of the files is open (busy).
    The log0000 directory is where the current data files are stored.  There can be several data files, but there will be only one open and "growing" file.  To look in the log directory try this:
     
      $ ls -l log0000
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:14 log0000
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:19 log0001
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:24 log0002
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:28 log0003
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:33 log0004
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:37 log0005
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:42 log0006
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:47 log0007
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:51 log0008
      -rw-rw-r--  1 stealth   user          29978 Dec 11 18:56 log0009
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:01 log0010
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:05 log0011
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:10 log0012
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:14 log0013
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:19 log0014
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:24 log0015
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:28 log0016
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:33 log0017
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:38 log0018
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:42 log0019
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:47 log0020
      -rw-rw-r--  1 stealth   user          29978 Dec 11 19:51 log0021
      Brw-rw-r--  1 stealth   user          17408 Dec 11 19:54 log0022
       
    Notice that all of the files are the same size except the last one, and that the last file is currently busy (hence the "B" at the beginning of the file permissions).
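The same two observations can be checked by a script.  This is a sketch under stated assumptions: check_logdir is a hypothetical helper, it assumes a POSIX awk, and it assumes the ls -l layout shown above (size in the fifth column, file name last, and a leading "B" marking the busy file).

```shell
# check_logdir (hypothetical helper): read `ls -l` output on stdin,
# report the busy file, and flag any closed file whose size differs
# from the first closed file.  Returns 1 unless exactly one file is busy.
check_logdir() {
    awk '
        NF < 9 { next }                        # skip blank or summary lines
        /^B/   { busy++; print "busy: " $NF; next }
        {
            if (size == "") size = $5          # first closed file sets the reference
            else if ($5 != size) print "size differs: " $NF " (" $5 ")"
        }
        END {
            if (busy != 1) {
                print "expected exactly one busy file, found " (busy + 0)
                exit 1
            }
        }
    '
}
```

It could be run as "ls -l log0000 | check_logdir".  On the listing above it would report only the busy file, log0022.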
Thirdly -- Check the stealth.log file.
    The stealth.log file is created and updated by the process "memo".  Any commands issued by the algorithm programs (like stealthalgo) and errors from other processes are recorded here.  Below is the log file from a working system.
     
      $ more stealth.log
      20:10:56: memo: task 9688: started
      20:10:57: memo: ctrl: task 3546: started
      20:10:57: memo: TTDrv: Initialized: No such process
      20:10:57: memo: ctrl: registering stealthcol: task 29150
      20:10:57: memo: Col: task 29150: started
      20:10:58: memo: lgr: task 29678: started
      20:10:58: memo: Col: task 29678 is my ring (node 1) client
      20:10:58: memo: lgr: Can't have files/dir 0, defaulted to 100
      20:10:58: memo: sol: task 9712: started
      20:10:58: memo: bfr: task 3057: started
      20:10:58: memo: Col: task 3057 is my ring (node 1) client
      20:10:58: memo: bfr: buffer size: 188 rows (5076 bytes)
      20:10:58: memo: Col: Pointer set from pid 9712
      20:10:58: memo: sol: achieved cooperation with DG
      20:10:58: memo: ctrl: registering soldrv: task 9712
      20:10:58: memo: sol: No Mode
      20:10:59: memo: Srvr: task 3075: started
      20:10:59: memo: sol: No Mode
      20:10:59: memo: sol: No Mode
      20:10:59: memo: sol: No Mode
      20:10:59: memo: ctrl: registering stealthsrvr: task 3075
      20:11:00: memo: paint: task 23069: started, and will terminate shortly
      20:11:00: memo: paint: task 3109: started, and will terminate shortly
      20:11:00: memo: bfr: task 10279 is my star client, (center node 1)
      20:11:00: memo: Ext: task 10279: started
      20:11:00: memo: ctrl: registering stealthclt: task 29748
      20:11:00: memo: Clt: task 29748: started: No such process
      20:11:00: memo: Col: task 26681 is my ring (node 1) client
      20:11:00: memo: Srvr: TMA: Telemetry Start
      20:11:00: memo: TMA: task 26681: started
      20:11:00: memo: Col: Using System Timer
      20:11:02: memo: Srvr: TMA: DAC_Board Enable
      20:11:02: memo: Srvr: TMA: Flow_Ctr Main1 Setpoint 55
      20:11:02: memo: Srvr: TMA: Flow_Ctr BF1 Setpoint 50
      20:11:02: memo: Srvr: TMA: Flow_Ctr Main2 Setpoint 50
      20:11:02: memo: Srvr: TMA: Flow_Ctr BF2 Setpoint 60
      20:11:02: memo: Srvr: TMA: Flow_Ctr Main3 Setpoint 45
      20:11:02: memo: Srvr: TMA: Flow_Ctr BF3 Setpoint 45
      20:11:02: memo: Srvr: TMA: Press_Ctr Setpoint 2100
      20:11:03: memo: sol: Mode 3
      20:11:03: memo: TMA: Mode 3 -- Cycle mode

       
    There is a lot of information contained in this file.  If there were an error, the file could look something like this:
     
      20:31:31: memo: ctrl: stealthcol died: task 29150
      20:31:31: memo: TMA: task 26681: DC operations completed
      20:31:31: memo: bfr: highest buffered row usage: 2
      20:31:31: memo: ctrl: checking out stealthcol: task 29150
      20:31:31: memo: bfr: final stamp/cmd sequence number: 3
      20:31:31: memo: bfr: highest number of stamps/cmds held at once: 2
      20:31:31: memo: bfr: task 3057: DC operations completed
      20:31:31: memo: lgr: task 29678: DC operations completed: No such process
      20:31:31: memo: Ext: task 10279: bowing out
      20:31:31: memo: bfr: my ring neighbor task 29678 bowed out: No such process
      20:31:31: memo: bfr: task 10279 bowed out from star, (center node 1): No such process
       
    Here the collection program stealthcol died, which subsequently brought down several other processes.  A crash of this magnitude would also show up in the ps listing mentioned in the first step.  The key to catching a failure in the log file is to look for "bowing out" and "bowed out" messages.
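Searching for those messages is easy to script.  A minimal sketch, assuming a POSIX grep is available: log_failures is a hypothetical helper, and the extra pattern "died" is taken from the sample excerpt above rather than from any documented list of messages.

```shell
# log_failures (hypothetical helper): read a stealth.log on stdin and
# print every line that reports a process dying or bowing out.
# Exit status is 0 if such lines were found, 1 if the log looks clean.
log_failures() {
    grep -F -e 'bowing out' -e 'bowed out' -e 'died'
}
```

It could be run as "log_failures < stealth.log".  No output means none of these messages appear in the log.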
     
And now, most importantly, what to do if the system has died.