Major Update to RunControl Software (v6r1)

  1. Killed all processes
  2. Jeremy updated the following machines:
    • mnvonline0.fnal.gov
    • mnvonline1.fnal.gov
    • mnvonlinelogger.fnal.gov
    • minerva-rc.fnal.gov
  3. I updated the remaining Control Room Computers
    • minerva-evd.fnal.gov
    • minerva-bm.fnal.gov
    • minerva-om-02.fnal.gov
  4. Testing the Updates
    1. Successful Test on Control Room Computers
    2. Successful Test on Rochester UROC
    3. Successful Test on Tufts UROCs
  5. I updated the UROC_sw_manager.py script and notified UROC Users

Fermilab Power Failure

  • On Sunday at 03:30 am, a power failure affected the MINOS and MINERvA underground machines.
  • Control Room Computers lost their network mount of /minerva/data/
    • GMBrowser needs /minerva/data mounted, so it was not working.
    • UROCs mount /minerva/data through minerva-evd, so they were also affected.
    • Carrie opened a service ticket asking the Computer Division for help with the Control Room Computers.
    • Computer Division resolved the incident, and all machines and UROCs are working properly.
  • The mnvonlinebck1.fnal.gov machine is still down, so we have no access to Veto Wall HV Monitoring.
  • e-Checklist can be reached on either of the following servers (minerva-wbm was down due to the power glitch):
    • http://minerva-wbm.fnal.gov/minerva/echecklist/mininfo.php
    • http://nusoft.fnal.gov/minerva/echecklist/mininfo.php
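
A quick way to confirm whether /minerva/data is mounted on a given machine is sketched below. This is only an illustration, not the procedure used during the incident; the function name mount_status is made up here, and a stale network mount can make such a check hang rather than return.

```python
import os


def mount_status(path):
    """Classify path as 'mounted', 'not mounted' (plain directory), or 'missing'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.ismount(path):
        return "mounted"
    return "not mounted"


if __name__ == "__main__":
    # /minerva/data is the mount point named in the entry above.
    print("/minerva/data: " + mount_status("/minerva/data"))
```

Running it on a Control Room Computer and on a UROC would show whether both see the mount.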

minerva-om Update

  • minerva-om no longer supports MINERvA Software
  • minerva-om has the latest MINOS RMS installed for the om near check
  • .bashrc was modified to prompt users to run “start_MinosOm.sh” to start the MINOS OM GUI
  • Nothing was removed; all files and software are recoverable.

Validation

  • The University of Minnesota Duluth group is updating a fairly old UROC.
  • They installed all MINERvA and MINOS software and had problems with ValidationTools, GMBrowser, and RunVetoHVMonitor.
  • Other than these three, everything is working, and they have started their shadow shift.

ValidationTools

  • UROC_sw_manager.py fails to “make clean” and “make” the package
  • I removed the ValidationTools folder and ran UROC_sw_manager again – It worked!

GMBrowser

  • GMBrowser crashes (not responding)

Attempts

  • Updated the ControlRoomTools/gmbrowser folder to HEAD
    • Did not work!
  • Removed the gmbrowser folder and ran UROC_sw_manager.py
    • Did not work!

After these failed attempts, I concluded that the problem is probably related to the ROOT version they are using.

  • Rick built and installed ROOT 5.34, then tried running GMBrowser
    • It worked!

RunVetoHVMonitor

  • The script fails to open the GUI.
  • The problem is the missing “-x” option in the SSH command; they modified the script locally.
  • I need to make the same fix and upload the corrected version to CVS.

 

UROC configure_runcontrol.sh Error

Problem

  • Running “source configure_runcontrol.sh” fails with:
    • ImportError: No module named wx

Attempts

  1. Tried importing the wx module in Python as a test
    • python
    • import wx
    • That works! wx can be imported in an interactive Python session.
  2. Removed and re-installed package: python-wxgtk2.8
    • sudo apt-get remove python-wxgtk2.8
    • sudo apt-get install python-wxgtk2.8
    • Still got the same error when running configure_runcontrol.sh
  3. configure_runcontrol.sh runs the Python script
    • mnvruncontrol/frontend/RunControlConfiguration.py
    • Running that script directly using “python RunControlConfiguration.py” works!
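
Since wx imports fine interactively but not when the script is sourced, one possibility (an assumption, not something confirmed above) is that the two cases run under different interpreters or environments. The sketch below reports which interpreter is running and where a module is loaded from, so the output can be compared between the two contexts. The helper name describe_module is made up for this note; it is written to run under both Python 2.7 and Python 3.

```python
import sys


def describe_module(name):
    """Report the running interpreter and whether the named module can be imported."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module": name,
    }
    try:
        module = __import__(name)
    except ImportError as err:
        info["importable"] = False
        info["error"] = str(err)
    else:
        info["importable"] = True
        # Built-in modules have no __file__, hence the default.
        info["location"] = getattr(module, "__file__", None)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_module("wx").items()):
        print("%s = %s" % (key, value))
```

Saving this to a file and running it once from the plain shell and once after sourcing the configuration script would show whether the interpreter or the module location differs.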

Tufts UROC Run Control Connection Error

Possible Reason

  • Dead SSH connections to the runcontrol server.

Error

Warning: remote port forwarding failed for listen port 3012
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/home/minerva/mnvruncontrol/backend/Threads.py", line 236, in run
    method_info["method"](*args, **kwargs)
  File "RunControl.py", line 1413, in ConnectDAQ
    success = self.PrepareSSHTunnels(ssh_user=ssh_user, remote_host=remote_host, remote_port=remote_port)
  File "RunControl.py", line 1621, in PrepareSSHTunnels
    deliveries = self.postoffice.SendTo(message=test_msg, recipient_list=recipients, timeout=3.0, with_exception=True)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1298, in SendTo
    responses = self.SendWithConfirmation(message, timeout, with_exception)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1325, in SendWithConfirmation
    raise AlreadyWaitingError("SendWithConfirmation can't wait for multiple messages simultaneously.")
AlreadyWaitingError: SendWithConfirmation can't wait for multiple messages simultaneously.

Attempts

  1. Restarted the computer
    • It did not work!
  2. Tried to remove all SSH connections
    • ps -ef | grep ssh
    • kill <pID>
    • There were no idle SSH connections
  3. Checked Tufts UROC-02 listener port
    • It was 3012 (same as UROC)
    • Changed UROC listener port to 3017
    • It worked!
  4. Updated the UROC_User_Details wiki page with the new listener ports
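
The root cause above was two stations sharing listener port 3012. A minimal sketch of how such clashes could be spotted in a table of assignments follows. The station labels and the dictionary layout are hypothetical; only the 3012 clash and the move to 3017 come from this entry, and the real format of the port assignment records is not assumed here.

```python
from collections import defaultdict


def find_port_conflicts(assignments):
    """Return {port: [station names]} for every port claimed by more than one station."""
    by_port = defaultdict(list)
    for name, port in assignments.items():
        by_port[port].append(name)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical labels; the port numbers are the ones from the entry above.
    before = {"tufts-uroc-02": 3012, "this-uroc": 3012}
    after = {"tufts-uroc-02": 3012, "this-uroc": 3017}
    print(find_port_conflicts(before))
    print(find_port_conflicts(after))
```

Running a check like this whenever a new listener port is assigned would catch a duplicate before it shows up as a failed remote port forward.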

Nearonline Dispatcher Crash

  • Yesterday around 15:00, the nearonline dispatcher crashed, so no raw data arrived under /scratch/nearonline/var/job_dump/
  • Manual job submission via manual_dst_submit.py did not work (no raw data).
  • Jeremy copied the raw data by hand with the following commands and then manually submitted the missing jobs:
    • cd /scratch/nearonline/var/job_dump/
    • for ((i=22;i<=32;i++)); do cp /mnvonline0/work/data/rawdata/MV_00010504_00${i}_*_RawData.dat .; done
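
For reference, the same copy can be sketched in Python. This is an illustrative equivalent of the shell loop above, not something that was run. The function name is made up, and the zero-padding widths (8 digits for the run, 4 for the second field) are inferred from the single file name pattern shown.

```python
import glob
import os
import shutil


def copy_raw_files(src_dir, dst_dir, run, first_index, last_index):
    """Copy MV_<run>_<index>_*_RawData.dat files for an inclusive index range.

    Returns the base names of the copied files, in copy order.
    """
    copied = []
    for index in range(first_index, last_index + 1):
        # Padding widths are an assumption based on MV_00010504_00${i}_*.
        pattern = "MV_%08d_%04d_*_RawData.dat" % (run, index)
        for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
            shutil.copy(path, dst_dir)
            copied.append(os.path.basename(path))
    return copied
```

The shell loop corresponds to copy_raw_files("/mnvonline0/work/data/rawdata", "/scratch/nearonline/var/job_dump", 10504, 22, 32).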

UROC Update

  • Carrie had to do an emergency computer swap underground this afternoon.
  • We are no longer using mnvonline1 to read out the DAQ; we are using mnvonline0.
  • UROCs must connect to mnvonline1.fnal.gov instead of mnvonline0.fnal.gov
  • Carrie updated “mnvdaqrunscripts/def_mnvonline” and tagged it as carrie_2014_07_03
  • Updated UROC_sw_manager.py and informed all UROC users to update their UROCs.
    • Important Notice: After the update users need to remove the following folder
      • mnvruncontrol/backend/PostOffice
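
The folder removal in the notice above could be scripted roughly as follows. This is a sketch under assumptions: the checkout root is passed in by the user, and the function name is made up. It deletes only a directory at that path, so a PostOffice.py file sitting next to it is left alone.

```python
import os
import shutil


def remove_stale_dir(checkout_root, relative_path="mnvruncontrol/backend/PostOffice"):
    """Delete the directory if it exists; return True if something was removed."""
    target = os.path.join(checkout_root, relative_path)
    # Only remove a real directory, never a file or a symlink.
    if os.path.isdir(target) and not os.path.islink(target):
        shutil.rmtree(target)
        return True
    return False
```

Calling it a second time returns False, so it is safe to include in an update script that may run more than once.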

IP Change Problem

  • Message from Nathaniel indicating a problem with their networking infrastructure
    • “Reverse-IP lookup on our computer now returns a completely different IP address.  I have no idea what’s causing it.”
    • “SSH-ing directly into our nominal 205.133.226.69 address works fine… but our outbound connections look like they’re coming from 199.18.112.252!”
  • The Otterbein University IT Department is investigating the incident

UROC: Otterbein

  • RunControl Version Problem fixed by using older tag: “oaltinok_2014_04_11”
  • Updated “UROC_User_Details” for Otterbein University

Connection Problem

  • Otterbein UROC had problems connecting to all of the mnvonline machines.
  • Jeremy and I investigated the problem and found the following:
    • The Otterbein UROC hostname (photon.otterbein.edu) could not be resolved from their static IP address (205.133.226.69)
    • We added their static IP address, in addition to their hostname, to /etc/hosts.allow
    • I suspect it is because they are using Ubuntu 14.04
  • Problem resolved
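
A mismatch like the one above (the static IP not resolving back to the hostname) can be checked with forward and reverse lookups. The sketch below is only an illustration using the standard socket module; the function name and the returned dictionary layout are made up here. The lookup functions are parameters so the logic can be exercised without network access.

```python
import socket


def lookup_consistency(hostname, expected_ip,
                       forward=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """Compare forward (name to IP) and reverse (IP to name) lookups for one host."""
    result = {"forward_ip": None, "reverse_name": None}
    try:
        result["forward_ip"] = forward(hostname)
    except socket.error:
        pass  # forward lookup failed; leave None
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list).
        result["reverse_name"] = reverse(expected_ip)[0]
    except socket.error:
        pass  # reverse lookup failed; leave None
    result["forward_ok"] = result["forward_ip"] == expected_ip
    reverse_name = result["reverse_name"]
    result["reverse_ok"] = (reverse_name is not None
                            and reverse_name.lower() == hostname.lower())
    return result
```

For the case in this entry, lookup_consistency("photon.otterbein.edu", "205.133.226.69") would be expected to report a failed reverse check while the reverse lookup is broken, which is why hostname-only entries in the access list stop matching.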

UROC Documentation

  • UROC information and documentation moved to an online version on the MINERvA wiki
    • Old: MINERvA Document 5682
    • New: https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Setup_New_UROC
  • The plan is to move all UROC-related documentation online

UROC: Otterbein

  • Modified UROC_user_details.xls
  • Included Special Kerberos Principal on required machines
  • E-mailed Carrie to request a modification of /etc/hosts.allow
  • Opened a new service ticket for ACL modification
  • mnvdaqrunscripts/install.sh modified with Otterbein hostname
  • mnvruncontrol/port_assignments.txt assigned a new port for Otterbein
  • UROC_sw_manager.py updated, but it caused a problem due to a conflict with the latest version of mnvruncontrol
  • Contacted Jeremy; he will commit the latest working version of runcontrol and let me know.