USM UROC mnvonline03 Connection Problem

Problem

  • USM UROC cannot connect to mnvonline03 via the RunControl GUI

Troubleshooting and Solution

  • Tried connecting manually with the ssh command
    • Same result
  • Error: “No route to host”, even though they can ping and traceroute the server
  • Contacted the Fermilab Service Desk and opened a new ticket as an incident
  • Problem solved once Network Services updated the router ACL to accept the following hostname:
    • 200.1.20.169…nat-sj-labinf2.campus.utfsm.cl
  • This is a temporary solution; we need to find a way to get a constant hostname every time
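A “No route to host” error while ping and traceroute still work is consistent with something on the path rejecting the TCP port rather than a real routing failure. The sketch below is a minimal way to tell these cases apart from a remote site. It assumes the service of interest is ssh on port 22; the function name and return strings are illustrative only and are not part of any MINERvA tool.

```python
import errno
import socket


def check_tcp(host, port=22, timeout=5.0):
    """Try a TCP connection and return a short diagnosis string."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout (packets silently dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is what ssh reports as "No route to host".
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return f"error: {exc}"
```

Calling `check_tcp("mnvonline03")` from the affected machine and from an authorized one would show whether the failure depends on the source address.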

Zero POT for all Gates in the Event after beam downtime

  • After the beam came back on 04/22/2015, the POT plots on eChecklist showed zero POT for all gates in the event
  • Checked the Nearline System and the processed raw data file located in:
    • /minerva/data/online_processing/swap_area/
  • The corresponding histogram is EventPOT
    • The EventPOT histogram is filled with ZERO POT for each gate
  • We conclude that the problem is NOT on our side.
  • NOvA also sees zero POT plots
  • Waiting for an update about the situation

GMBrowser Problem after Power Outage

  • GMBrowser was not working on Control Room Computers
  • The problem was the following:
    • The /grid/fermiapp/ disk was not accessible from minerva-cr-01 and minerva-cr-02
  • Opened a Service Desk ticket and the Fermilab CD mounted the disks
  • After the mount, GMBrowser did not work with the local ROOT installation
    • I modified the setup.min.soft.sh script to use the ROOT under /grid/fermiapp
    • Built GMBrowser using the new ROOT and it started to work
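A quick check like the one below could be run on the control room machines after a power outage, before starting GMBrowser. It is only a sketch: the function name is hypothetical, and it tests only that each path is a directory the current user can read and enter, not that the right ROOT version lives there.

```python
import os


def check_readable_dirs(paths):
    """Map each path to True if it is a directory we can list and enter."""
    return {
        path: os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)
        for path in paths
    }


if __name__ == "__main__":
    # /grid/fermiapp is the area named in the notes above.
    for path, ok in check_readable_dirs(["/grid/fermiapp"]).items():
        print(f"{path}: {'accessible' if ok else 'NOT accessible'}")
```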

 

Nearline Software Update: torretta_2015_04_06

  • Stopped Run: 16664/36
  • Tagged Tools/DaqRecv HEAD version as “torretta_2015_04_06”
  • Updated the following package on mnvonlinelogger
    • cmtuser/current/Tools/DaqRecv with tag “torretta_2015_04_06”
  • Built updated package on mnvonlinelogger without any warnings or errors.
  • Software Sync to get the update on
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Started new Run: 16665

eChecklist Problem – copy_nearline_logs_ecklist

Problem

  • Carrie has written a new script, copy_nearline_logs_ecklist, which runs on mnvonline03 every 3 minutes.
    • The script copies all files modified in the last 1.25 h to mnvonlinelogger
  • Another script on mnvonlinelogger, “mnvchklastev_mnvonline03.py”, updates the eChecklist.
    • That script is responsible for marking runs as “Finalizing” or “Finished”
  • The mnvchklastev_mnvonline03.py script marks a subrun “Finished” and applies “Auto Data Quality” if the file has not been modified in the last 300 seconds.
  • The problem: copy_nearline_logs_ecklist was re-copying every file modified in the last 1.25 h on each 3-minute run, so the copies kept looking recently modified. (This is why files were marked “Finished” only about 1.25 h + 300 seconds later.)
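The delay can be worked out with simple arithmetic. The sketch below assumes that each re-copy makes the file on mnvonlinelogger look freshly modified, so the 300-second quiet period can only start once the file drops out of the copy window. The function name is illustrative.

```python
def finished_delay_seconds(copy_window_s, quiet_period_s=300):
    """Approximate wait before a subrun can be marked "Finished".

    The file keeps being re-copied while it is inside the copy window,
    then must stay untouched for the quiet period.
    """
    return copy_window_s + quiet_period_s


old_delay = finished_delay_seconds(1.25 * 3600)  # 4800.0 s, i.e. 80 minutes
new_delay = finished_delay_seconds(5 * 60)       # 600 s, i.e. 10 minutes
print(old_delay, new_delay)
```

With a 5-minute copy window the estimated delay drops from 80 minutes to 10 minutes.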

Solution

  • I modified copy_nearline_logs_ecklist to copy only the files modified in the last 5 minutes.
    • Since we copy files every 3 minutes, a 5-minute window is enough; we do not need 1.25 h
  • Tested the new script for 2 h and it works fine
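The selection rule described above can be sketched as follows. This is not the actual copy_nearline_logs_ecklist code, whose implementation is not shown here; the function name and the flat directory scan are assumptions made for illustration.

```python
import os
import time


def recently_modified(directory, window_s, now=None):
    """Return sorted paths of regular files modified within window_s seconds."""
    now = time.time() if now is None else now
    selected = []
    for entry in os.scandir(directory):
        # Compare the file's age (seconds since last modification) to the window.
        if entry.is_file() and now - entry.stat().st_mtime <= window_s:
            selected.append(entry.path)
    return sorted(selected)
```

With a 3-minute schedule, `window_s=300` still overlaps consecutive runs by 2 minutes, so a file modified between two runs is always picked up by the next one.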

Oregon State UROC Authorization

  • Updated User Details
  • Updated
    • mnvdaqrunscripts/install.sh
    • mnvruncontrol/port_assignments.txt
    • UROC_sw_manager.py
  • Modified
    • .k5login
    • /etc/hosts.allow
      • mnvonline03 was allowing all hosts – asked Carrie
  • Opened a Service Desk Ticket for MINERvA Router Modification
  • Authorization requested from Stefano for MINOS Gateway Machines
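Since mnvonline03 turned out to be allowing all hosts, a small check for wide-open rules in /etc/hosts.allow may be useful when adding a new UROC. The sketch below is a simplification of the `daemon_list : client_list` syntax: it ignores optional shell commands, does not handle IPv6 addresses containing colons, and also flags `ALL EXCEPT ...` rules. The function name and the sample addresses are hypothetical.

```python
def wide_open_rules(text):
    """Return hosts.allow rules whose client list contains the ALL wildcard."""
    flagged = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # First field is the daemon list, second is the client list.
        _daemons, clients = [part.strip() for part in line.split(":")[:2]]
        if "ALL" in clients.replace(",", " ").split():
            flagged.append(line)
    return flagged


sample = "sshd: ALL\nsshd: 192.0.2.0/255.255.255.0\n# comment\n"
print(wide_open_rules(sample))
```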

Nearline Software Update: toretta_2015_02_27

  • Stopped Run: 16578/5
  • Tagged Tools/DaqRecv HEAD version as “toretta_2015_02_27”
  • Updated the following package on mnvonlinelogger
    • cmtuser/current/Tools/DaqRecv with tag “toretta_2015_02_27”
  • Built updated package on mnvonlinelogger without any warnings or errors.
  • Software Sync to get the update on
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Started new Run: 16579

mnvnearline{1,2,3,4} Kernel Updates

  • Ed Simmonds completed Kernel Updates on mnvnearline{1,2,3,4}.
  • Used beam downtime to our advantage and stopped runcontrol during the updates.
  • All subruns right before downtime have all gates processed:
    • 16566/19
    • 16566/20
    • 16566/21
    • 16566/22 -> 24 gates only (I stopped the run on that subrun)
  • There were no failed jobs and no need for manual job submission.
  • I checked the next subrun (16567/1) via e-Checklist and it was OK

Documentation Update

  • Added two new documentation pages to the Minerva OPS wiki
  • Update Nearline Software
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Update_Nearline_Software
  • Install a NEW Framework Version on Nearline System
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Install_a_NEW_Framework_Version_on_Nearline_System

 

e-Checklist nusoft works again

  • Backup e-Checklist: http://nusoft.fnal.gov/minerva/echecklist/mininfo.php 
  • Modified scripts 
    • setup_nearline_software.sh 
    • nearline_bluearc_copy.sh 
  • nusoft uses NEARLINE_BLUEARC_GMPLOTTER_AREA, which needs to be under “/minerva/app” NOT under “/minerva/data”
    • NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/app/users/nearonline/gmbrowser
  • Changes committed to CVS.
  • Manually synchronized nearline1 with the mnvonlinelogger
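The “/minerva/app, not /minerva/data” requirement can be checked on path components rather than by string prefix, so that a sibling directory such as /minerva/application would not pass by accident. The sketch below is illustrative only; the helper name is hypothetical, and it does not read the variable from the real setup scripts.

```python
from pathlib import PurePosixPath


def is_under(path, root):
    """True if path equals root or lies somewhere below it."""
    p = PurePosixPath(path)
    r = PurePosixPath(root)
    return p == r or r in p.parents


area = "/minerva/app/users/nearonline/gmbrowser"
print(is_under(area, "/minerva/app"))
```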

Pontificia MINOS: “Permission Denied”

Problem

  • Pontificia UROC was getting a “Permission Denied” error when running the commands
    • ~/opt/rms/rms service rc near
    • ~/opt/rms/rms service om near

Attempt 1

  • I suggested they use the “kinit” command
    • ~/opt/rms/rms kinit
  • They had already tried kinit and it did not work

Solution

  • I connected to “minos@minos-gateway-nd.fnal.gov” and checked the “.k5login” file.
  • Their principal was in the list, but in the wrong format
    • Listed as: “Principal”
    • Correct format: “Principal@FNAL.GOV”
  • E-mailed Stefano and other MINOS people. Stefano modified the file.
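A format check like the one below would have caught the bad entry before the UROC tried to connect. It is a minimal sketch that only tests for a non-empty name and realm separated by “@”, one entry per line; it does not verify that the realm is FNAL.GOV or that the principal exists. The function name and the sample principals are made up for illustration.

```python
def malformed_principals(text):
    """Return .k5login entries that are not of the form name@REALM."""
    bad = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue
        # Split on the last "@" so the realm is whatever follows it.
        name, sep, realm = entry.rpartition("@")
        if not sep or not name or not realm:
            bad.append(entry)
    return bad


sample = "alice@FNAL.GOV\nbob\n"
print(malformed_principals(sample))
```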

GMBrowser Update – Uses all gates now

  • Previously, the GMBrowser that shifters look at used only a fraction of the gates, because the early processing stages (particularly DecodeRawEvent) were slow.
  • Now that we have a faster version of DecodeRawEvent, we modified GMBrowser to use all gates
  • Modified the following parameters in NearlineCurrent.opts in Tools/DaqRecv/options on mnvonlinelogger to be 100 percent:
    • PdstlPrescaler.PercentPass          = 25;
    • LinjcPrescaler.PercentPass          = 25;
    • NumibPrescaler.PercentPass       = 20;
  • Ran the “nearline_software_sync.sh” script on all Nearline machines to get the update
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Informed the current shifter about the update and started GMBrowser at Tufts UROC
    • Will monitor the behavior for some time before we make this change permanent.
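The prescaler change above could be applied to the options text with a small helper like the one below. This is a sketch under assumptions: it treats the file as plain text with one `Name.PercentPass = N;` setting per line, the helper name is hypothetical, and it was not the method actually used for the edit.

```python
import re

# Matches e.g. "PdstlPrescaler.PercentPass          = 25;" and keeps the spacing.
PERCENT_PASS = re.compile(r"^(\s*\w+Prescaler\.PercentPass\s*=\s*)\d+(\s*;.*)$")


def set_percent_pass(text, value=100):
    """Return text with every *Prescaler.PercentPass value set to `value`."""
    out = []
    for line in text.splitlines():
        # A callable replacement avoids ambiguity between group 1 and the digits.
        out.append(
            PERCENT_PASS.sub(lambda m: f"{m.group(1)}{value}{m.group(2)}", line)
        )
    return "\n".join(out)


sample = "PdstlPrescaler.PercentPass          = 25;\nOther.Option = 3;"
print(set_percent_pass(sample))
```

Lines that do not match the pattern are passed through unchanged, so the rest of the options file is left as it was.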