USM UROC mnvonline03 Connection Problem

Problem

  • USM UROC cannot connect to mnvonline03 via the RunControl GUI

Troubleshooting and Solution

  • Tried connecting manually with ssh command
    • Same result
  • Error: “No route to host”, even though they can ping and traceroute the server
  • Contacted Fermilab Service Desk and opened a new ticket as incident
  • Problem was solved once Network Services updated the router ACL to accept the hostname
    • 200.1.20.169…nat-sj-labinf2.campus.utfsm.cl
  • This is a temporary solution; we need to find a way to get a constant hostname every time
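
The symptom above (ping and traceroute succeed, ssh reports “No route to host”) can be narrowed down by testing the TCP port directly instead of relying on ICMP. A minimal Python sketch, assuming the ssh daemon listens on the default port 22; the function name tcp_port_reachable is hypothetical and not part of any MINERvA tool:

```python
import socket


def tcp_port_reachable(host, port=22, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error_text) if the
    connection is refused, filtered, times out, or the name does not resolve.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers "No route to host", "Connection refused", timeouts, DNS errors.
        return False, str(exc)
```

Calling tcp_port_reachable("mnvonline03") from the UROC machine would return False together with the OS error text when the connection is blocked. Ping does not exercise this path, which is one way ping can succeed while ssh fails.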

Zero POT for all Gates in the Event after beam downtime

  • After the beam came back on 04/22/2015, the POT plots on the eChecklist showed zero POT for all gates in the event
  • Checked the Nearline system and the processed raw data file located under:
    • /minerva/data/online_processing/swap_area/
  • The corresponding histogram is EventPOT
    • The EventPOT histogram is filled with ZERO POT for each gate
  • We conclude that the problem is NOT on our side.
  • NOvA also sees zero POT plots
  • Waiting for an update about the situation

GMBrowser Problem after Power Outage

  • GMBrowser was not working on Control Room Computers
  • The problem was the following:
    • /grid/fermiapp/ disk was not accessible from minerva-cr-01 and minerva-cr-02
  • Opened a Service Desk Ticket and the Fermilab CD mounted the disks
  • After the mount, GMBrowser did not work with the local ROOT installation
    • I modified the setup.min.soft.sh script to use the ROOT under /grid/fermiapp
    • Built GMBrowser using new ROOT and it started to work

 

Nearline Software Update: torretta_2015_04_06

  • Stopped Run: 16664/36
  • Tagged Tools/DaqRecv HEAD version as “torretta_2015_04_06”
  • Updated the following package on mnvonlinelogger
    • cmtuser/current/Tools/DaqRecv with tag “torretta_2015_04_06”
  • Built updated package on mnvonlinelogger without any warnings or errors.
  • Software Sync to get the update on
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Started new Run: 16665

eChecklist Problem – copy_nearline_logs_ecklist

Problem

  • Carrie has written a new script, copy_nearline_logs_ecklist, which runs on mnvonline03 every 3 min.
    • The script copies all files modified in the last 1.25 h to mnvonlinelogger
  • Another script on mnvonlinelogger, “mnvchklastev_mnvonline03.py”, updates the eChecklist.
    • That script is responsible for marking runs as “Finalizing” or “Finished”
  • mnvchklastev_mnvonline03.py marks a subrun as “Finished” and applies “Auto Data Quality” if the file has not been modified in the last 300 seconds.
  • The problem was: copy_nearline_logs_ecklist was re-copying the files modified in the last 1.25 h again and again, every 3 minutes, so the copies on mnvonlinelogger kept looking recently modified. (This is why the files were marked as “Finished” only after 1.25 h + 300 seconds.)
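
The delay described above can be reproduced with simple arithmetic. The sketch below is a simplified model, not the actual scripts. It assumes that each copy refreshes the modification time seen on mnvonlinelogger, that copies happen at exact multiples of 3 minutes after the last real write, and that a file exactly at the edge of the window is still copied. The function name earliest_finished_time is hypothetical.

```python
def earliest_finished_time(window_s, copy_period_s=180, quiet_s=300):
    """Seconds after the last real write at which a subrun can be marked Finished.

    window_s      : the copy script re-copies files modified within this window
    copy_period_s : how often the copy script runs (3 min in the log above)
    quiet_s       : the checker needs this long without modification (300 s)
    """
    # Last scheduled copy that still finds the source file inside the window.
    last_copy = (window_s // copy_period_s) * copy_period_s
    # The checker only sees a quiet file once quiet_s has passed after that copy.
    return last_copy + quiet_s


if __name__ == "__main__":
    print(earliest_finished_time(int(1.25 * 3600)))  # 1.25 h window
    print(earliest_finished_time(5 * 60))            # 5 min window
```

Under these assumptions a 1.25 h window gives 4800 s (1.25 h + 300 s), while a 5 min window gives 480 s.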

Solution

  • I modified copy_nearline_logs_ecklist to copy only the files modified in the last 5 min.
    • Since we copy files every 3 min, a 5 min window is enough; we do not need 1.25 h
  • Tested the new script for 2 h and it works fine
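
A minimal Python sketch of the selection logic, assuming the script picks files by modification time. The real copy_nearline_logs_ecklist may be implemented differently (for example as a shell script); recently_modified and its parameters are hypothetical names used only for illustration.

```python
import os
import time


def recently_modified(directory, max_age_s=300, now=None):
    """Return sorted paths of regular files in `directory` modified within max_age_s.

    `now` can be passed explicitly (seconds since the epoch) to make the
    selection reproducible; by default the current time is used.
    """
    now = time.time() if now is None else now
    selected = []
    for entry in os.scandir(directory):
        # Skip subdirectories; keep files whose mtime falls inside the window.
        if entry.is_file() and now - entry.stat().st_mtime <= max_age_s:
            selected.append(entry.path)
    return sorted(selected)
```

With a 3 min copy period, max_age_s=300 leaves a 2 min overlap between consecutive runs, so a file written just before one run is still picked up by the next.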

Oregon State UROC Authorization

  • Updated User Details
  • Updated
    • mnvdaqrunscripts/install.sh
    • mnvruncontrol/port_assignments.txt
    • UROC_sw_manager.py
  • Modified
    • .k5login
    • /etc/hosts.allow
      • mnvonline03 was allowing all hosts – asked Carrie
  • Opened a Service Desk Ticket for MINERvA Router Modification
  • Authorization requested from Stefano for MINOS Gateway Machines

Nearline Software Update: toretta_2015_02_27

  • Stopped Run: 16578/5
  • Tagged Tools/DaqRecv HEAD version as “toretta_2015_02_27”
  • Updated the following package on mnvonlinelogger
    • cmtuser/current/Tools/DaqRecv with tag “toretta_2015_02_27”
  • Built updated package on mnvonlinelogger without any warnings or errors.
  • Software Sync to get the update on
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Started new Run: 16579

mnvnearline{1,2,3,4} Kernel Updates

  • Ed Simmonds completed Kernel Updates on mnvnearline{1,2,3,4}.
  • Used beam downtime to our advantage and stopped runcontrol during the updates.
  • All subruns right before downtime have all gates processed:
    • 16566/19
    • 16566/20
    • 16566/21
    • 16566/22 -> 24 gates only (I stopped the run on that subrun)
  • There were no failed jobs and no need for manual job submission.
  • I checked the next subrun (16567/1) via e-Checklist and it was OK

Documentation Update

  • Added two new documentation pages to the Minerva OPS wiki
  • Update Nearline Software
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Update_Nearline_Software
  • Install a NEW Framework Version on Nearline System
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Install_a_NEW_Framework_Version_on_Nearline_System

 

e-Checklist nusoft works again

  • Backup e-Checklist: http://nusoft.fnal.gov/minerva/echecklist/mininfo.php 
  • Modified scripts 
    • setup_nearline_software.sh 
    • nearline_bluearc_copy.sh 
  • nusoft uses NEARLINE_BLUEARC_GMPLOTTER_AREA, which needs to be under “/minerva/app” NOT under “/minerva/data”
    • NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/app/users/nearonline/gmbrowser
  • Changes committed to CVS.
  • Manually synchronized nearline1 with the mnvonlinelogger
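
The path requirement for NEARLINE_BLUEARC_GMPLOTTER_AREA can be checked with a few lines of Python before committing a change. This is only a sketch; is_under is a hypothetical helper, not part of the nearline scripts, and it assumes both arguments are absolute POSIX paths.

```python
import posixpath


def is_under(path, root):
    """Return True if `path` is `root` itself or lies somewhere below it."""
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    # commonpath compares whole path components, so /minerva/application
    # is not mistaken for something under /minerva/app.
    return posixpath.commonpath([path, root]) == root


if __name__ == "__main__":
    area = "/minerva/app/users/nearonline/gmbrowser"
    print(is_under(area, "/minerva/app"))
    print(is_under(area, "/minerva/data"))
```

Comparing path components rather than string prefixes avoids false matches on directories whose names merely start with the same characters.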