eChecklist Problem – copy_nearline_logs_ecklist

Problem

  • Carrie has written a new script  copy_nearline_logs_ecklist which runs on mnvonline03 every 3 mins.
    • Script basically copies all the files modified in last 1.25h to the mnvonlinelogger
  • Another script on mnvonlinelogger “mnvchklastev_mnvonline03.py” updates the eChecklist.
    • That script responsible for marking runs as “Finalizing”  or “Finished”
  • mnvchklastev_mnvonline03.py script marks a subrun “Finished” and applies “Auto Data Quality” if the file is not modified in the last 300 seconds.
  • The problem was: copy_nearline_logs_ecklist was copying the files that are modified in last 1.25h again and again in every 3 minutes. (This is the reason the files marked as “Finished” after 1.25h + 300 seconds later. )

Solution

  • I modified copy_nearline_logs_ecklist to copy the files modified in last 5 min.
    •  Since we are copying files every 3 min, 5 min is enough, we do not need 1.25h
  • Tested new script for 2 h and it works fine

Oregon State UROC Authorization

  • Updated User Details
  • Updated
    • mnvdaqrunscripts/install.sh
    • mnvruncontrol/port_assignments.txt
    • UROC_sw_manager.py
  • Modified
    • .k5login
    • /etc/hosts.allow
      • mnvonline03 was allowing all hosts – asked Carrie
  • Opened a Service Desk Ticket for MINERvA Router Modification
  • Authorization requested from Stefano for MINOS Gateway Machines

Nearline Software Update: toretta_2015_02_27

  • Stopped Run: 16578/5
  • Tagged Tools/DaqRecv HEAD version as  “toretta_2015_02_27
  • Updated the following package on mnvonlinelogger
    • cmtuser/current/Tools/DaqRecv with tag “toretta_2015_02_27
  • Built updated package on mnvonlinelogger without any warnings or errors.
  • Software Sync to get the update on
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Started new Run: 16579

mnvnearline{1,2,3,4} Kernel Updates

  • Ed Simmonds completed Kernel Updates on mnvnearline{1,2,3,4}.
  • Used beam downtime to our advantage and stopped runcontrol during the updates.
  • All subruns right before downtime have all gates processed:
    • 16566/19
    • 16566/20
    • 16566/21
    • 16566/22 –> 24 Gates only (I stopped run on that subrun)
  • There was no failed jobs and no need for manual job submission.
  • I checked the next subrun(16567/1) via e-Checklist and it was OK

Documentation Update

  • Added two new documentation under Minerva OPS wiki
  • Update Nearline Software
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Update_Nearline_Software
  • Install a NEW Frameowk Version on Nearline System
    • https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Install_a_NEW_Framework_Version_on_Nearline_System

 

e-Checklist nusoft works again

  • Backup e-Checklist: http://nusoft.fnal.gov/minerva/echecklist/mininfo.php 
  • Modified scripts 
    • setup_nearline_software.sh 
    • nearline_bluearc_copy.sh 
  • nusoft uses NEARLINE_BLUEARC_GMPLOTTER_AREA, which needs to be under “/minerva/app” NOT under “/minerva/data”
    • NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/app/users/nearonline/gmbrowser
  • Changes committed to CVS.
  • Manually synchronized nearline1 with the mnvonlinelogger

Pontificia MINOS: “Permission Denied”

Problem

  • Pontificia UROC having “Permission Denied” Error when they run the commands
    • ~/opt/rms/rms service rc near
    • ~/opt/rms/rms service om near

Attempt 1

  • I suggested them to use the  “kinit” command
    • ~/opt/rms/rms kinit
  • They were already tried kinit and did not worked

Solution

  • I connected to the “minos@minos-gateway-nd.fnal.gov” and checked the “.k5login” file.
  • Their principal listed in the list but in the wrong format
    • Listed as: “Principal”
    • Correct Format: “Principal@FNAL.GOV”
  • e-mailed Stefano and other MINOS people. Stefano modified the file.

GMBrowser Update – Uses all gates now

  • Previously GMBrowser that shifters look at only uses a fraction of the gates, because the early processing stages (particularly DecodeRawEvent) were slow.
  • Now that we have a faster version of DecodeRawEvent, and we modified GMBrowser to use all gates
  • Modified following parameters in NearlineCurrent.opts in Tools/DaqRecv/options on mnvonlinelogger, to be 100 percent:
    • PdstlPrescaler.PercentPass          = 25;
    • LinjcPrescaler.PercentPass          = 25;
    • NumibPrescaler.PercentPass       = 20;
  • Ran the “nearline_software_sync.sh” script in all Nearline Machines to get the update
    • mnvnearline1
    • mnvnearline2
    • mnvnearline3
    • mnvnearline4
  • Informed Current Shifter about the update and started GMBrowser at Tufts UROC
    • Will investigate the behavior for some time, until we make this change permanent.

Nearline File Management Problems

We still have problems for nearline file management and I listed the ones I found. Here is the list of folders need to be managed.

  1. Synchronize /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/
  2. Synchronize /scratch/nearonline/var/gmplotter/plotter/ with /minerva/data/users/nearonline/gmbrowser/plotter/
  3. Synchronize /scratch/nearonline/var/gmplotter/www/ with /minerva/data/users/nearonline/gmbrowser/www/
  4. Copy Files from /scratch/nearonline/var/gmplotter/www to minerva@minerva-wbm.fnal.gov:/opt/if-wbm/htdoc/minerva/echecklist/gmb_hists

Here is the status of each section

1) I modified the script to use rsync command to sync between /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/ For now, we have a stable synchronization between two folders however,this method copies the .log files also, which is unnecessary.

2,3) No USER “nearonline” under /minerva/data/users Setup script assigns the following export NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/data/users/nearonline/gmbrowser There is a “nearonline” user under /minerva/app, however we should not copy any data file to the /minerva/app.

4) e-Checklist works, I conclude this section works. I did not checked the details.

We should organize a plan to solve all the problems in nearline file management. I propose the following,

  • Lets use rsync command for 1,2,3
  • We need to create a folder “/minerva/data/users/nearonline” and let other systems know where we are copying the files.
  • If there is a folder I forgot to sync between nearline and bluearc, that folder also needs to be added to the script.

Control Room Computer Update to v10r9p1

  • Updated minerva-cr-01 and minerva-cr-02 to v10r9p1
  • Modified setupFiles for initiating setup with v10r9p1
    • tempSetup.sh
    • .bashrc
  • Modified setup.min.soft.sh for v10r9p1
    • Python was a problem for v10r9p1 setup script
    • Created an alias to use the local version python instead of Framework Version
      • alias python=”/usr/bin/python”
  • Tested and Tagged ControlRoomTools under v10r9p1 as “stable_v10r9p1″
  • Updated the .bashrc for using the setup file under v10r9p1
  • Linked .k5login with “cmtuser/Minerva_v10r9p1/Tools/ControlRoomTools/authenticate/k5login-master”
  • Updated the documentation on wiki: “Control_Room_Setup_Manual”

Software Update

  • mnvonlinelogger updated
  • Slave Nodes will receive update automatically
    • Updated Packages under cmtuser area
      • Tools/DaqRecv [croce_v3]
        • cvs co -r croce_v3 Tools/DaqRecv
    • Installed Packages under cmtuser area
      • Event/MinervaKernel [croce_v3]
        • This package required for Event/MinervaEvent
        • getpack -u Event/MinervaKernel
      • Event/MinervaEvent [croce_v3]
        • cvs co -r croce_v3 Event/MinervaEvent
    • Built All Packages in the following order
      1. Tools/DaqRecv
      2. Event/MinervaKernel
      3. Event/MinervaEvent
    • Building Commands:
      • cmt config
      • cmt make
      • source setup.sh

Problem after restarting Nearline Machines

Problem:

The automated “nearline_bluearc_copy.sh” script on mnvnearline1 fails to copy necessary files from local_dump_area to online_processing/swap_area
(from /scratch/nearonline/var/job_dump to /minerva/data/online_processing/swap_area)

Investigations:

  • Investigating mnvnearline1:scripts/nearline_bluearc_copy.sh
  • Script runs automatically every 5 minutes.
  • Log file for the script: /scratch/var/nearonline/logs
  • Local copy from HEAD to following folders WORKS
    • NEARLINE_DUMP_AREA /scratch/nearonline/var/job_dump
    • NEARLINE_LOCAL_GMPLOTTER_LOCATION /scratch/nearonline/var/gmplotter
  • The problem is with the python script “filechecklist.py”
  • It does not generate the file list for files from the following folders:
    • $NEARLINE_DUMP_AREA
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/plotter
  • It works for the following folder
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/www
  • Since there is no file list generated by the python script “filechecklist.py”, NO files copied to the swap_area

Temporary Solution:

  • I modified the script to use rsync command.
    • Now it synchronizes the local_dump_area and online_processing/swap_area
  • Inside the script Jeremy notes that, “using rsync for this stage incurs a lot of overhead on the BlueArc disk”, therefor,  he writes a more efficient script “filechecklist.py” for this task

 Permanent Solution:

  • The Problem is confirmed.
    •  .fileindex under /scratch/nearonline/var/job_dump got corrupted and causing “file checklist.py” to crash for that folder
  • Using rsync manually fixed the .fileindex
  • Software sync between mnvonlinelogger and mnvnearline1 updates the nearline_bluearc_copy.sh script to the original version
  • Now everything works as before. The near ine_bluearc_copy.sh script copies the changed files to bluearc area using “file checklist.py”