USM UROC mnvonline03 Connection Problem

Problem

  • USM UROC cannot connect to mnvonline03 via the RunControl GUI

Troubleshooting and Solution

  • Tried connecting manually with the ssh command
    • Same result
  • Error: “No Route to Host”, even though they can ping and traceroute the server
  • Contacted the Fermilab Service Desk and opened a new incident ticket
  • Problem solved once Network Services updated the router ACL to accept the hostname as
    • 200.1.20.169…nat-sj-labinf2.campus.utfsm.cl
  • This is a temporary solution; we need to find a way to present a constant hostname every time
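
A quick way to separate "ping works" from "SSH works" is to test the TCP port directly. The sketch below is only an illustration: the function name is made up for this note, and port 22 is assumed to be the SSH port on the server. Ping and traceroute use other packet types, so they can succeed while a TCP connection to the SSH port is still rejected along the way.

```python
import socket


def tcp_port_reachable(host, port=22, timeout=5.0):
    """Try a plain TCP connection.

    Returns (True, None) on success, or (False, error_text) on failure.
    An error text such as "No route to host" with a working ping points
    at filtering (ACL/firewall) rather than at a dead machine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, str(exc)
```

Usage from the UROC would be along the lines of `tcp_port_reachable("mnvonline03.fnal.gov", 22)`, where the fully qualified name is an assumption; use whatever name the RunControl GUI is configured with.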

Oregon State UROC Authorization

  • Updated User Details
  • Updated
    • mnvdaqrunscripts/install.sh
    • mnvruncontrol/port_assignments.txt
    • UROC_sw_manager.py
  • Modified
    • .k5login
    • /etc/hosts.allow
      • mnvonline03 was allowing all hosts – asked Carrie
  • Opened a Service Desk Ticket for MINERvA Router Modification
  • Authorization requested from Stefano for MINOS Gateway Machines

Pontificia MINOS: “Permission Denied”

Problem

  • Pontificia UROC gets a “Permission Denied” error when they run the commands
    • ~/opt/rms/rms service rc near
    • ~/opt/rms/rms service om near

Attempt 1

  • I suggested that they use the “kinit” command
    • ~/opt/rms/rms kinit
  • They had already tried kinit and it did not work

Solution

  • I connected to “minos@minos-gateway-nd.fnal.gov” and checked the “.k5login” file.
  • Their principal was in the file but in the wrong format
    • Listed as: “Principal”
    • Correct Format: “Principal@FNAL.GOV”
  • E-mailed Stefano and other MINOS people. Stefano modified the file.
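
The format check above can be scripted. This is a minimal sketch, not an existing tool: the function name is hypothetical, and it assumes one principal per line in “.k5login”, flagging any entry that lacks an “@REALM” part.

```python
def find_unqualified_principals(k5login_text):
    """Return (line_number, entry) pairs for entries without '@REALM'."""
    problems = []
    for lineno, raw in enumerate(k5login_text.splitlines(), start=1):
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        name, sep, realm = entry.rpartition("@")
        # A usable entry needs a name, the '@' separator, and a realm.
        if not sep or not name or not realm:
            problems.append((lineno, entry))
    return problems
```

Run against the text of the file, any entry that comes back needs its realm (here FNAL.GOV) appended.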

Major Update to RunControl Software (v6r1)

  1. Killed all processes
  2. Jeremy updated
    • mnvonline0.fnal.gov
    • mnvonline1.fnal.gov
    • mnvonlinelogger.fnal.gov
    • minerva-rc.fnal.gov
  3. I updated the remaining Control Room Computers
    • minerva-evd.fnal.gov
    • minerva-bm.fnal.gov
    • minerva-om-02.fnal.gov
  4. Testing the Updates
    1. Successful Test on Control Room Computers
    2. Successful Test on Rochester UROC
    3. Successful Test on Tufts UROCs
  5. I updated the UROC_sw_manager.py script and notified UROC Users

Validation

  • The University of Minnesota Duluth group is updating a fairly old UROC
  • They installed all MINERvA and MINOS software and have some problems with ValidationTools, GMBrowser and RunVetoHVMonitor
  • Other than these 3, everything is working and they have started their shadow shift.

ValidationTools

  • UROC_sw_manager.py fails to “make clean” and “make” the package
  • I removed the ValidationTools folder and ran UROC_sw_manager again – It worked!

GMBrowser

  • GMBrowser crashes (not responding)

Attempts

  • Update ControlRoomTools/gmbrowser folder to HEAD
    • Did not work!
  • Remove gmbrowser folder and run UROC_sw_manager.py
    • Did not work!

After these failed attempts, I concluded that the problem is probably related to the ROOT version they are using.

  • Rick built and installed ROOT 5.34 and tried running GMBrowser
    • It worked!

RunVetoHVMonitor

  • Script fails to open the GUI
  • The problem is the missing “-X” (X11 forwarding) option in the SSH command; they modified the script locally.
  • I need to modify and upload the correct version to CVS

UROC configure_runcontrol.sh Error

Problem

  • When you try to run “source configure_runcontrol.sh”
    • ImportError: No module named wx

Attempts

  1. Tried importing the wx module in python as a test
    • python
    • import wx
    • That works! You can import wx into python
  2. Removed and re-installed package: python-wxgtk2.8
    • sudo apt-get remove python-wxgtk2.8
    • sudo apt-get install python-wxgtk2.8
    • Still getting the same error when I try to run configure_runcontrol.sh
  3. Script: configure_runcontrol.sh runs the python script
    • mnvruncontrol/frontend/RunControlConfiguration.py
    • Running that script directly using “python RunControlConfiguration.py” works!
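
The notes above do not identify the cause. One possibility worth ruling out is that the python started from inside configure_runcontrol.sh is not the same interpreter, or does not see the same module path, as the python used interactively. The sketch below (function name is hypothetical) prints what a given interpreter sees; running it both directly and from inside the shell script would show whether the two environments differ.

```python
import os
import sys


def describe_python_environment():
    """Collect the details that decide whether 'import wx' can succeed."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "wx_location": None,
        "wx_error": None,
    }
    try:
        import wx  # the module the configuration script fails to find
        info["wx_location"] = getattr(wx, "__file__", None)
    except ImportError as exc:
        info["wx_error"] = str(exc)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_python_environment().items()):
        print("%s: %s" % (key, value))
```

If the two runs report different executables or PYTHONPATH values, the shell script's environment setup is the place to look.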

Tufts UROC Run Control Connection Error

Possible Reason

  • Dead SSH connections to the RunControl server.

Error

Warning: remote port forwarding failed for listen port 3012
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/home/minerva/mnvruncontrol/backend/Threads.py", line 236, in run
    method_info["method"](*args, **kwargs)
  File "RunControl.py", line 1413, in ConnectDAQ
    success = self.PrepareSSHTunnels(ssh_user=ssh_user, remote_host=remote_host, remote_port=remote_port)
  File "RunControl.py", line 1621, in PrepareSSHTunnels
    deliveries = self.postoffice.SendTo(message=test_msg, recipient_list=recipients, timeout=3.0, with_exception=True)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1298, in SendTo
    responses = self.SendWithConfirmation(message, timeout, with_exception)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1325, in SendWithConfirmation
    raise AlreadyWaitingError("SendWithConfirmation can't wait for multiple messages simultaneously.")
AlreadyWaitingError: SendWithConfirmation can't wait for multiple messages simultaneously.

Attempts

  1. Restarted the computer
    • It did not work!
  2. Tried to remove all SSH connections
    • ps -ef | grep ssh
    • kill <pID>
    • There were no idle SSH connections
  3. Checked Tufts UROC-02 listener port
    • It was 3012 (same as UROC)
    • Changed UROC listener port to 3017
    • It worked!
  • Updated the UROC_User_Details wiki page for the new listener ports
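
Since the fix was to give the two Tufts stations different listener ports, a duplicate check over the assigned ports could catch this before a shift. The sketch below is an illustration only: the function name is made up, the dictionary is typed in by hand rather than read from any real file, and the Rochester entry is a placeholder value, not a real assignment.

```python
from collections import defaultdict


def find_port_conflicts(assignments):
    """Return {port: [stations]} for every port claimed by more than one station."""
    by_port = defaultdict(list)
    for station, port in assignments.items():
        by_port[port].append(station)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


# Ports 3012 and 3017 are from the notes above; 3015 is a placeholder.
before = {"Tufts UROC": 3012, "Tufts UROC-02": 3012, "Rochester UROC": 3015}
after = dict(before)
after["Tufts UROC"] = 3017

print(find_port_conflicts(before))
print(find_port_conflicts(after))
```

The first call reports port 3012 as shared by both Tufts stations; the second reports nothing once the UROC is moved to 3017.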

UROC Update

  • Carrie had to do an emergency computer swap underground this afternoon.
  • We are no longer using mnvonline0 to read out the DAQ; we are using mnvonline1
  • UROCs must connect to mnvonline1.fnal.gov instead of mnvonline0.fnal.gov
  • Carrie updated “mnvdaqrunscripts/def_mnvonline” and tagged it as carrie_2014_07_03
  • Updated UROC_sw_manager.py and informed all UROC users to update their UROCs.
    • Important Notice: After the update users need to remove the following folder
      • mnvruncontrol/backend/PostOffice
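
The folder removal can be done by hand, or with a small helper like the one below. This is a sketch with a hypothetical function name; the notes do not say why the folder has to go, so the helper only deletes the directory if it is present and reports whether it did anything.

```python
import os
import shutil


def remove_stale_folder(path):
    """Delete `path` if it is a real directory; return True if it was removed."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
        return True
    return False
```

A user would call it with the full path to their own checkout, for example `remove_stale_folder(os.path.expanduser("~/mnvruncontrol/backend/PostOffice"))`, where the home-directory location is an assumption.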

IP Change Problem

  • Message from Nathaniel indicating a problem with their networking infrastructure
    • “Reverse-IP lookup on our computer now returns a completely different IP address.  I have no idea what’s causing it.”
    • “SSH-ing directly into our nominal 205.133.226.69 address works fine… but our outbound connections look like they’re coming from 199.18.112.252!”
  • The Otterbein University IT Department is investigating the incident

UROC: Otterbein

  • RunControl Version Problem fixed by using older tag: “oaltinok_2014_04_11”
  • Updated “UROC_User_Details” for Otterbein University

Connection Problem

  • Otterbein UROC had problems connecting to all of the mnvonline machines.
  • Jeremy and I investigated the problem and found the following
    • The Otterbein UROC hostname (photon.otterbein.edu) could not be resolved from their static IP address (205.133.226.69)
    • We added their static IP address, in addition to their hostname, to /etc/hosts.allow
    • I suspect it is because they are using Ubuntu 14.04
  • Problem resolved
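
The reverse-lookup failure described above can be checked from any machine with Python. This is a minimal sketch with a hypothetical function name; it uses the standard `socket.gethostbyaddr` call, and the `resolver` parameter exists only so the logic can be tried without network access. A hostname entry in hosts.allow is only useful if the connecting IP address resolves back to that name, which is presumably why adding the IP address itself worked.

```python
import socket


def reverse_lookup_matches(ip, expected_hostname, resolver=socket.gethostbyaddr):
    """Return True if the reverse lookup of `ip` yields `expected_hostname`."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        return False  # no reverse record, or the lookup failed
    names = {primary.lower()} | {alias.lower() for alias in aliases}
    return expected_hostname.lower() in names
```

For the case in these notes the call would be `reverse_lookup_matches("205.133.226.69", "photon.otterbein.edu")`; a False result means hostname-based rules will not match that station.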

UROC Documentation

  • UROC information and documentation moved to an online version on the MINERvA wiki
    • OLD MINERvA Document 5682
    • NEW https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Setup_New_UROC
  • The plan is to move all UROC-related documentation online