GMBrowser Update – Uses all gates now

  • Previously GMBrowser that shifters look at only uses a fraction of the gates, because the early processing stages (particularly DecodeRawEvent) were slow.
  • Now that we have a faster version of DecodeRawEvent, and we modified GMBrowser to use all gates
  • Modified following parameters in NearlineCurrent.opts in Tools/DaqRecv/options on mnvonlinelogger, to be 100 percent:
    • PdstlPrescaler.PercentPass          = 25;
    • LinjcPrescaler.PercentPass          = 25;
    • NumibPrescaler.PercentPass       = 20;
  • Ran the “nearline_software_sync.sh” script in all Nearline Machines to get the update
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Informed Current Shifter about the update and started GMBrowser at Tufts UROC
    • Will investigate the behavior for some time, until we make this change permanent.

Nearline File Management Problems

We still have problems for nearline file management and I listed the ones I found. Here is the list of folders need to be managed.

  1. Synchronize /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/
  2. Synchronize /scratch/nearonline/var/gmplotter/plotter/ with /minerva/data/users/nearonline/gmbrowser/plotter/
  3. Synchronize /scratch/nearonline/var/gmplotter/www/ with /minerva/data/users/nearonline/gmbrowser/www/
  4. Copy Files from /scratch/nearonline/var/gmplotter/www to minerva@minerva-wbm.fnal.gov:/opt/if-wbm/htdoc/minerva/echecklist/gmb_hists

Here is the status of each section

1) I modified the script to use rsync command to sync between /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/ For now, we have a stable synchronization between two folders however,this method copies the .log files also, which is unnecessary.

2,3) No USER “nearonline” under /minerva/data/users Setup script assigns the following export NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/data/users/nearonline/gmbrowser There is a “nearonline” user under /minerva/app, however we should not copy any data file to the /minerva/app.

4) e-Checklist works, I conclude this section works. I did not checked the details.

We should organize a plan to solve all the problems in nearline file management. I propose the following,

  • Lets use rsync command for 1,2,3
  • We need to create a folder “/minerva/data/users/nearonline” and let other systems know where we are copying the files.
  • If there is a folder I forgot to sync between nearline and bluearc, that folder also needs to be added to the script.

Control Room Computer Update to v10r9p1

  • Updated minerva-cr-01 and minerva-cr-02 to v10r9p1
  • Modified setupFiles for initiating setup with v10r9p1
    • tempSetup.sh
    • .bashrc
  • Modified setup.min.soft.sh for v10r9p1
    • Python was a problem for v10r9p1 setup script
    • Created an alias to use the local version python instead of Framework Version
      • alias python=”/usr/bin/python”
  • Tested and Tagged ControlRoomTools under v10r9p1 as “stable_v10r9p1″
  • Updated the .bashrc for using the setup file under v10r9p1
  • Linked .k5login with “cmtuser/Minerva_v10r9p1/Tools/ControlRoomTools/authenticate/k5login-master”
  • Updated the documentation on wiki: “Control_Room_Setup_Manual”

Software Update

  • mnvonlinelogger updated
  • Slave Nodes will receive update automatically
    • Updated Packages under cmtuser area
      • Tools/DaqRecv [croce_v3]
        • cvs co -r croce_v3 Tools/DaqRecv
    • Installed Packages under cmtuser area
      • Event/MinervaKernel [croce_v3]
        • This package required for Event/MinervaEvent
        • getpack -u Event/MinervaKernel
      • Event/MinervaEvent [croce_v3]
        • cvs co -r croce_v3 Event/MinervaEvent
    • Built All Packages in the following order
      1. Tools/DaqRecv
      2. Event/MinervaKernel
      3. Event/MinervaEvent
    • Building Commands:
      • cmt config
      • cmt make
      • source setup.sh

Problem after restarting Nearline Machines

Problem:

The automated “nearline_bluearc_copy.sh” script on mnvnearline1 fails to copy necessary files from local_dump_area to online_processing/swap_area
(from /scratch/nearonline/var/job_dump to /minerva/data/online_processing/swap_area)

Investigations:

  • Investigating mnvnearline1:scripts/nearline_bluearc_copy.sh
  • Script runs automatically every 5 minutes.
  • Log file for the script: /scratch/var/nearonline/logs
  • Local copy from HEAD to following folders WORKS
    • NEARLINE_DUMP_AREA /scratch/nearonline/var/job_dump
    • NEARLINE_LOCAL_GMPLOTTER_LOCATION /scratch/nearonline/var/gmplotter
  • The problem is with the python script “filechecklist.py”
  • It does not generate the file list for files from the following folders:
    • $NEARLINE_DUMP_AREA
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/plotter
  • It works for the following folder
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/www
  • Since there is no file list generated by the python script “filechecklist.py”, NO files copied to the swap_area

Temporary Solution:

  • I modified the script to use rsync command.
    • Now it synchronizes the local_dump_area and online_processing/swap_area
  • Inside the script Jeremy notes that, “using rsync for this stage incurs a lot of overhead on the BlueArc disk”, therefor,  he writes a more efficient script “filechecklist.py” for this task

 Permanent Solution:

  • The Problem is confirmed.
    •  .fileindex under /scratch/nearonline/var/job_dump got corrupted and causing “file checklist.py” to crash for that folder
  • Using rsync manually fixed the .fileindex
  • Software sync between mnvonlinelogger and mnvnearline1 updates the nearline_bluearc_copy.sh script to the original version
  • Now everything works as before. The near ine_bluearc_copy.sh script copies the changed files to bluearc area using “file checklist.py”

 

GMBrowser Problem

  • GMBrowser Live works but shifter can not access to the previous runs and subruns
    • GMBrowser -r xx -s xxx does not work
  • The files are not copied automatically to the /minerva/data/online_processing/swap_area
  • I copied the files manually:
    • Connect to the mnvnearline1 – it has the BlueArc /minerva/ mount
    • Necessary files located: /scratch/nearonline/var/job_dumb
  • This solved the issue for non copied files.
  • I checked the log file for nearline_bluearc_copy.sh script under /scratch/nearonline/var/logs
    • After sometime the auto-script seems to be working.
  • Currently, we are not running any runs, I will check the status again tomorrow

Minerva Software Installed on new CR Machines (ROC-West)

MINERvA Software Installation on minerva-cr-01 and minerva-cr-02

  • Firefox configured for Shifter Bookmarks
  • Special Kerberos Principal Installed
  • ROOT 5.34/21 installed and tested
  • ControlRoomTools Installed and Configured
    • setup.sh file and .k5login file configured with ControlRoomTools
    • GMBrowser installed and tested
  • mnvdaqrunscripts installed and configured
    • registered new hostnames(minerva-cr-01 and minerva-cr-02) into “mnvdaqrunscripts/install.sh”
    • committed and tagged changes as “oaltinok_2014_09_17″
  • mnvruncontrol installed and configured
    •  I tried connecting using run control but could not connected. Will test again tomorrow.
  • MINOS Software is NOT installed

Major Update to RunControl Software (v6r1)

  1. Killed all processes
  2. Jeremy updated the
    • mnvonline0.fnal.gov
    • mnvonline1.fnal.gov
    • mnvonlinelogger.fnal.gov
    • minerva-rc.fnal.gov
  3. I updated the remaining Control Room Computers
    • minerva-evd.fnal.gov
    • minerva-bm.fnal.gov
    • minerva-om-02.fnal.gov
  4. Testing the Updates
    1. Successful Test on Control Room Computers
    2. Successful Test on Rochester UROC
    3. Successful Test on Tufts UROCs
  5. I updated the UROC_sw_manager.py script and notified UROC Users

Fermilab Power Failure

  • On Sunday 03:30 am there was a power failure affecting MINOS and MINERvA underground machines
  • Control Room Computers lost network mount to /minerva/data/
    • GMBrowser needs /minerva/data mounted and it was not working
    • minerva-evd is used by UROCs to mount /minerva/data and they are also affected.
    • Carrie opened a service ticket to ask Computer Division Help for Control Room Computers
    • Computer Division solved the incident and all machines and UROCs working properly.
  • mnvonlinebck1.fnal.gov machine is still down and we have no access to Veto Wall HV Monitoring.
  • e-Checklist can be used either one of the following servers: (minerva-wbm was down due to power glitch)
    • http://minerva-wbm.fnal.gov/minerva/echecklist/mininfo.php
    • http://nusoft.fnal.gov/minerva/echecklist/mininfo.php

minerva-om Update

  • minerva-om no longer support MINERvA Software
  • minerva-om has latest MINOS RMS Installed for om near check
  • .bashrc script modified to prompt users to use the “start_MinosOm.sh” command to start the minos om GUI
  • Nothing removed and every file and software are recoverable.

Validation

  • University of Minnesota Duluth group updating a pretty old UROC
  • They installed all Minerva and MINOS Software and have some problems with the ValidationTools, GMBrowser and RunVetoHVMonitor
  • Other than these 3 everyhing working and they started their shadow shift.

ValidationTools

  • UROC_sw_manager.py fail to “make clean” and “make” package
  • I removed the ValidationTools folder and ran UROC_sw_manager again – It worked!

GMBrowser

  • GMBrowser crashes (not responding)

Attempts

  • Update ControlRoomTools/gmbrowser folder to HEAD
    • Not worked!
  • Remove gmbrowser folder and run UROC_sw_manager.py
    • Not worked!

After these failed attempts, I concluded that the problem seems to be related with ROOT version they are using.

  • Rick installed ROOT 5.34 by building and tried running GMBrowser
    • It worked!

RunVetoHVMonitor

  • Script fail to open GUI
  • Problem is the missing “-x” option from SSH Command, they locally modified the script.
  • I need to modify and upload the correct version to CVS

 

UROC configure_runcontrol.sh Error

Problem

  • When you try to run “source configure_runcontrol.sh”
    • ImportError: No module named wx

Attempts

  1. Tried to import wx module to python for testing
    • python
    • import wx
    • That works! You can import wx into python
  2. Removed and re-installed package: python-wxgtk2.8
    • sudo apt-get remove python-wxgtk2.8
    • sudo apt-get install python-wxgtk2.8
    • Still getting the same error when I try to run the configure_runcontrol.sh
  3. Script: configure_runcontrol.sh runs the python script
    • mnvruncontrol/frontend/RunControlConfiguration.py
    • Running that script directly using “python RunControlConfiguration.py” works!

Tufts UROC Run Control Connection Error

Possible Reason

  • Dead SSH Connections between runcontrol server.

Error

Warning: remote port forwarding failed for listen port 3012
Exception in thread Thread-1:
Traceback (most recent call last):
File “/usr/lib/python2.7/threading.py”, line 551, in __bootstrap_inner
self.run()
File “/home/minerva/mnvruncontrol/backend/Threads.py”, line 236, in run
method_info["method"](*args, **kwargs)
File “RunControl.py”, line 1413, in ConnectDAQ
success = self.PrepareSSHTunnels(ssh_user=ssh_user, remote_host=remote_host, remote_port=remote_port)
File “RunControl.py”, line 1621, in PrepareSSHTunnels
deliveries = self.postoffice.SendTo(message=test_msg, recipient_list=recipients, timeout=3.0, with_exception=True)
File “/home/minerva/mnvruncontrol/backend/PostOffice.py”, line 1298, in SendTo
responses = self.SendWithConfirmation(message, timeout, with_exception)
File “/home/minerva/mnvruncontrol/backend/PostOffice.py”, line 1325, in SendWithConfirmation
raise AlreadyWaitingError(“SendWithConfirmation can’t wait for multiple messages simultaneously.”)
AlreadyWaitingError: SendWithConfirmation can’t wait for multiple messages simultaneously.

Attempts

  1. Restarted the computer
    • It did not worked!
  2. Tried to remove all SSH connections
    • ps -ef | grep ssh
    • kill <pID>
    • There was no idle SSH Connections
  3. Checked Tufts UROC-02 listener port
    • It was 3012 (same as UROC)
    • Changed UROC listener port to 3017
    • It worked!
  • Updated the UROC_User_Details wiki page for the new listener ports