Problem after restarting Nearline Machines

Problem:

The automated “nearline_bluearc_copy.sh” script on mnvnearline1 fails to copy necessary files from local_dump_area to online_processing/swap_area
(from /scratch/nearonline/var/job_dump to /minerva/data/online_processing/swap_area)

Investigations:

  • Investigating mnvnearline1:scripts/nearline_bluearc_copy.sh
  • Script runs automatically every 5 minutes.
  • Log file for the script: /scratch/nearonline/var/logs
  • Local copy from HEAD to the following folders WORKS
    • NEARLINE_DUMP_AREA /scratch/nearonline/var/job_dump
    • NEARLINE_LOCAL_GMPLOTTER_LOCATION /scratch/nearonline/var/gmplotter
  • The problem is with the python script “filechecklist.py”
  • It does not generate the file list for files from the following folders:
    • $NEARLINE_DUMP_AREA
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/plotter
  • It works for the following folder
    • $NEARLINE_LOCAL_GMPLOTTER_LOCATION/www
  • Since the python script “filechecklist.py” generates no file list, NO files are copied to the swap_area

Temporary Solution:

  • I modified the script to use the rsync command.
    • Now it synchronizes the local_dump_area and online_processing/swap_area
  • Inside the script, Jeremy notes that “using rsync for this stage incurs a lot of overhead on the BlueArc disk”; therefore, he wrote the more efficient script “filechecklist.py” for this task

Permanent Solution:

  • The problem is confirmed.
    • The .fileindex under /scratch/nearonline/var/job_dump got corrupted, causing “filechecklist.py” to crash for that folder
  • Running rsync manually fixed the .fileindex
  • The software sync between mnvonlinelogger and mnvnearline1 restored the nearline_bluearc_copy.sh script to the original version
  • Now everything works as before. The nearline_bluearc_copy.sh script copies the changed files to the BlueArc area using “filechecklist.py”
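The failure mode above (a corrupted index crashing the file-list step) can be illustrated with a small sketch. This is not the actual “filechecklist.py”, whose internals are not shown in this log. It assumes a JSON index of modification times and sizes, and the function names load_index and changed_files are hypothetical. The point is only that an unreadable index can be treated as empty instead of stopping the copy.

```python
import json
import os


def load_index(index_path):
    """Return the saved {relative_path: [mtime, size]} map, or {} if unusable."""
    try:
        with open(index_path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # Missing or corrupted index: fall back to an empty one (assumed policy)
        return {}
    return data if isinstance(data, dict) else {}


def changed_files(root, index_name=".fileindex"):
    """List files under root that are new or modified since the last index."""
    index_path = os.path.join(root, index_name)
    old = load_index(index_path)
    new = {}
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name == index_name:
                continue  # never report the index file itself
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            stat = os.stat(full)
            new[rel] = [stat.st_mtime, stat.st_size]
            if old.get(rel) != new[rel]:
                changed.append(rel)
    # Rewrite the index so the next call only reports later changes
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(new, handle)
    return sorted(changed)
```

With this policy a damaged index costs one full re-copy of the folder, after which the index is valid again.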

 

GMBrowser Problem

  • GMBrowser Live works, but shifters cannot access previous runs and subruns
    • GMBrowser -r xx -s xxx does not work
  • The files are not copied automatically to the /minerva/data/online_processing/swap_area
  • I copied the files manually:
    • Connect to mnvnearline1, which has the BlueArc /minerva/ mount
    • The necessary files are located in /scratch/nearonline/var/job_dump
  • This solved the issue for the files that had not been copied.
  • I checked the log file for the nearline_bluearc_copy.sh script under /scratch/nearonline/var/logs
    • After some time, the automatic script seems to be working again.
  • We are not taking any runs at the moment; I will check the status again tomorrow
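The manual copy described above could be scripted roughly as follows. This is a minimal sketch, assuming the goal is only to copy files that are absent from the destination. The function name copy_missing is hypothetical, and the source and destination directories are passed in as arguments rather than hard-coded.

```python
import os
import shutil


def copy_missing(src_dir, dst_dir):
    """Copy files present in src_dir but absent from dst_dir; return their names."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_dir):
        rel_dir = os.path.relpath(dirpath, src_dir)
        target_dir = os.path.normpath(os.path.join(dst_dir, rel_dir))
        for name in sorted(files):
            target = os.path.join(target_dir, name)
            if os.path.exists(target):
                continue  # already on the destination, leave it alone
            os.makedirs(target_dir, exist_ok=True)
            # copy2 keeps timestamps, which helps later change detection
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Existing files are never overwritten, so running it twice is harmless: the second call returns an empty list.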

Minerva Software Installed on new CR Machines (ROC-West)

MINERvA Software Installation on minerva-cr-01 and minerva-cr-02

  • Firefox configured for Shifter Bookmarks
  • Special Kerberos Principal Installed
  • ROOT 5.34/21 installed and tested
  • ControlRoomTools Installed and Configured
    • setup.sh file and .k5login file configured with ControlRoomTools
    • GMBrowser installed and tested
  • mnvdaqrunscripts installed and configured
    • registered new hostnames (minerva-cr-01 and minerva-cr-02) in “mnvdaqrunscripts/install.sh”
    • committed and tagged changes as “oaltinok_2014_09_17”
  • mnvruncontrol installed and configured
    • I tried connecting using run control but could not connect. Will test again tomorrow.
  • MINOS Software is NOT installed

Major Update to RunControl Software (v6r1)

  1. Killed all processes
  2. Jeremy updated the following machines
    • mnvonline0.fnal.gov
    • mnvonline1.fnal.gov
    • mnvonlinelogger.fnal.gov
    • minerva-rc.fnal.gov
  3. I updated the remaining Control Room Computers
    • minerva-evd.fnal.gov
    • minerva-bm.fnal.gov
    • minerva-om-02.fnal.gov
  4. Testing the Updates
    1. Successful Test on Control Room Computers
    2. Successful Test on Rochester UROC
    3. Successful Test on Tufts UROCs
  5. I updated the UROC_sw_manager.py script and notified UROC Users

v1_05

  • Options file Ana_CCProtonPi0.opts Improved
    • More Control for CCProtonPi0 Flow
  • Debugging Messages from older versions removed
  • New NTuple Data Added to track all of the CUTs
  • New Function: setVertexData()
  • Class: AngleScan
    • Styling Modified to match Package Styling
    • Comments Added
  • Class: ClusterVectorInfo
    • Styling Modified to match Package Styling
    • Comments Added
  • Prong Colors Added for Scanning Sessions
  • New Documentation: ProcessAna_Scripts.txt

Fermilab Power Failure

  • On Sunday at 03:30 am there was a power failure affecting the MINOS and MINERvA underground machines
  • Control Room Computers lost the network mount to /minerva/data/
    • GMBrowser needs /minerva/data mounted, so it was not working
    • minerva-evd is used by UROCs to mount /minerva/data, so they were also affected.
    • Carrie opened a service ticket asking the Computer Division for help with the Control Room Computers
    • The Computer Division resolved the incident, and all machines and UROCs are working properly.
  • The mnvonlinebck1.fnal.gov machine is still down and we have no access to Veto Wall HV Monitoring.
  • e-Checklist can be used from either of the following servers (minerva-wbm was down due to the power glitch):
    • http://minerva-wbm.fnal.gov/minerva/echecklist/mininfo.php
    • http://nusoft.fnal.gov/minerva/echecklist/mininfo.php

minerva-om Update

  • minerva-om no longer supports MINERvA Software
  • minerva-om has the latest MINOS RMS installed for the om near check
  • .bashrc script modified to prompt users to use the “start_MinosOm.sh” command to start the MINOS om GUI
  • Nothing was removed; all files and software are recoverable.

Validation

  • The University of Minnesota Duluth group is updating a fairly old UROC
  • They installed all MINERvA and MINOS Software and have some problems with ValidationTools, GMBrowser and RunVetoHVMonitor
  • Other than these 3, everything is working and they started their shadow shift.

ValidationTools

  • UROC_sw_manager.py fails to “make clean” and “make” the package
  • I removed the ValidationTools folder and ran UROC_sw_manager again – It worked!

GMBrowser

  • GMBrowser crashes (not responding)

Attempts

  • Update ControlRoomTools/gmbrowser folder to HEAD
    • Did not work!
  • Remove gmbrowser folder and run UROC_sw_manager.py
    • Did not work!

After these failed attempts, I concluded that the problem is probably related to the ROOT version they are using.

  • Rick built and installed ROOT 5.34, then tried running GMBrowser
    • It worked!

RunVetoHVMonitor

  • Script fails to open the GUI
  • The problem is the missing “-X” option (X11 forwarding) in the SSH command; they modified the script locally.
  • I need to modify and upload the correct version to CVS
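For reference, in OpenSSH X11 forwarding is enabled with the uppercase -X option (lowercase -x disables it), which is what a remote GUI needs in order to open on the local display. The sketch below only builds such a command line. The helper name build_ssh_gui_command, the host and the remote command are hypothetical placeholders, not the contents of the real RunVetoHVMonitor script.

```python
import shlex


def build_ssh_gui_command(host, remote_command, user=None):
    """Return the argv list for running a GUI program over SSH with X11 forwarding."""
    target = f"{user}@{host}" if user else host
    # -X asks OpenSSH to forward X11 so the remote GUI can open locally
    return ["ssh", "-X", target, remote_command]


if __name__ == "__main__":
    # Hypothetical host and command, for illustration only
    argv = build_ssh_gui_command("example-host", "./start_gui.sh")
    print(" ".join(shlex.quote(part) for part in argv))
```

Keeping the command as a list makes it safe to hand to subprocess.run without shell quoting issues.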

 

Status Update: Fake Pi0 Reconstruction

  • At first we suspected the fake Pi0 reconstruction was due to proton hits
  • I did some tests on Cluster History at different stages of reconstruction
    • Before Everything
    • Before Muon
    • Before Michel
    • Before Proton
    • Before Pi0 Reconstruction
    • etc…
  • Found out that Pi0 Reconstruction DOES NOT touch proton prongs.
    • This is good. We are not having extra hits from proton/pion hits
    • However, this means that our first assumption is wrong! There is some other problem!
  • Prof. Mann and I decided that the best way to figure out what is going on is a scanning session on two samples of events
    • Signal with FAILED Reconstruction
    • Background with SUCCESSFUL Reconstruction

Cluster History Test

Test 1: Proton Prong Cluster History

  • Checking Cluster History of a single prong
    • Before: Used
    • After: Used

Test 2: Cluster History in Different Stages

  • Got all clusters and checked their history before each stage
    • Example Result:
      Stage            N(Unused)   N(Used)
      Before Reco      41          120
      Before Muon      41          120
      Before Michel    41          120
      Before Proton    41          120
      Before Pi0       41          120
      After Reco       0           161
  • Findings
    • Prong Clusters are already USED
    • The additional clusters that are still unused are the ones used in Pi0 Reconstruction
    • In 2 samples the Michel stage also changed the cluster history. See full list: NClusters Sheet1
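The bookkeeping behind Test 2 can be sketched in a few lines. This is a toy model, not the CCProtonPi0 code: clusters are plain dictionaries with a "history" field, the stages are stand-in functions, and the names tally_history and snapshot_stages are hypothetical. It shows the idea of counting Unused and Used clusters before each stage and once more at the end.

```python
from collections import Counter


def tally_history(clusters):
    """Count clusters by history flag; returns (n_unused, n_used)."""
    counts = Counter(cluster["history"] for cluster in clusters)
    return counts["Unused"], counts["Used"]


def snapshot_stages(clusters, stages):
    """Record (n_unused, n_used) before each stage runs, then once after all."""
    table = []
    for name, stage in stages:
        table.append(("Before " + name, tally_history(clusters)))
        stage(clusters)  # a stage may flip some clusters to "Used"
    table.append(("After Reco", tally_history(clusters)))
    return table


def no_change(clusters):
    """Toy stage that leaves every cluster untouched."""


def use_everything(clusters):
    """Toy stage that marks every remaining cluster as used."""
    for cluster in clusters:
        cluster["history"] = "Used"


if __name__ == "__main__":
    # Same starting counts as the example result: 41 unused, 120 used
    sample = [{"history": "Unused"} for _ in range(41)]
    sample += [{"history": "Used"} for _ in range(120)]
    stages = [("Muon", no_change), ("Michel", no_change),
              ("Proton", no_change), ("Pi0", use_everything)]
    for label, (unused, used) in snapshot_stages(sample, stages):
        print(label, unused, used)
```

A stage that changes the counts shows up as a difference between two consecutive rows, which is how the Michel stage was spotted in the two samples mentioned above.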

v4_7

  • Modified for CCProtonPi0 v1_04 Output
  • All Histogram Types Changed
    • Float to Double
    • TH1F to TH1D
    • TH2F to TH2D
  • Cut Table Statistics
    • Cut_Other: Represents all other cuts which are not tracked by CCProtonPi0 Package
  • Stacked Histogram for Pi0 Invariant Mass
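One plausible motivation for the Float to Double change above is bin-content precision. The log does not state the reason, so treat this as an assumption. A TH1F stores each bin as a single-precision float, which can no longer count in steps of one beyond 2^24 = 16777216, while a TH1D stores doubles. The sketch emulates the two storage types in plain Python with no ROOT calls; to_float32 and fill_once are hypothetical helper names.

```python
import struct


def to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def fill_once(content, single_precision):
    """Add one entry to a bin whose content is stored in the given precision."""
    updated = content + 1.0
    return to_float32(updated) if single_precision else updated


# 2^24: the last value from which a float32 bin can no longer step by one
LIMIT = float(2 ** 24)

if __name__ == "__main__":
    print("float32 bin:", fill_once(LIMIT, single_precision=True))
    print("float64 bin:", fill_once(LIMIT, single_precision=False))
```

The single-precision bin stays at the limit after the extra fill, while the double-precision bin advances by one.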

v1_04

  • Revised Pi0 Reconstruction
    • Global Variables Used in all functions
    • Removed unused functions and variables
    • ConeBlobs()
      • Variable Naming Match
      • Return type changed: StatusCode to bool
        • Returns false if setPi0ParticleData() fails
      • ConeBlobs() is the main function that controls Pi0 Reconstruction.
        • If it fails, the reconstructEvent() for that event stops.
    • VtxBlob()
      • Return type changed: StatusCode to bool
        • Always returns true (return type reserved for future implementation)
    • processBlobs()
      • Removed unused variables
      • Return type changed: StatusCode to void
  • Options File Modifications
    • New Options files for DEBUG
    • Original options file set to INFO

v1_03

  • correctProtonProngEnergy()
    • Fixed a bug causing P4(Proton) = (nan,nan,nan,nan)
    • Now returns a bool: false if the energy correction fails
  • setProtonParticleData()
    • double vertexZ no longer input parameter
    • Using Global Variable m_PrimaryVertex

Bug: correctProtonProngEnergy()

Problem:

  • For some events the 4-Momentum is zero
  • Two failure modes observed:
    1. kinked tracks – if kinked track has 0 energy
    2. If particle already has 0 energy

Attempts

  1. kinked tracks
    • Currently correctProtonProngEnergy() returns the final track's energy as the prong energy
    • Total Energy should be used
    • Summing 4-Momentum for each track in a prong
  2. If particle already has 0 energy
    • Check Particle energy before correctProtonProngEnergy()
    • if(E == 0) proton_isRecoGood = -1
    • else proton_isRecoGood = 1
  3. correctProtonProngEnergy() returns bool
    • returns false in case the function fails
    • in this case use the default particle 4-Momentum

UROC configure_runcontrol.sh Error

Problem

  • When you try to run “source configure_runcontrol.sh”
    • ImportError: No module named wx

Attempts

  1. Tried importing the wx module in python as a test
    • python
    • import wx
    • That works! You can import wx into python
  2. Removed and re-installed package: python-wxgtk2.8
    • sudo apt-get remove python-wxgtk2.8
    • sudo apt-get install python-wxgtk2.8
    • Still getting the same error when I run configure_runcontrol.sh
  3. Script: configure_runcontrol.sh runs the python script
    • mnvruncontrol/frontend/RunControlConfiguration.py
    • Running that script directly using “python RunControlConfiguration.py” works!
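Since the import works interactively and when the script is run directly, one possible explanation (an assumption, not a confirmed diagnosis) is that configure_runcontrol.sh ends up using a different python interpreter or a different PYTHONPATH. The sketch below collects the facts needed to compare the two environments. The function name describe_import_environment is hypothetical, and it assumes Python 3.4 or newer for importlib.util.find_spec.

```python
import importlib.util
import os
import sys


def describe_import_environment(module_name):
    """Collect the facts that decide whether `import module_name` can succeed."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,          # which interpreter is running
        "version": sys.version.split()[0],
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_import_environment("wx").items():
        print(f"{key}: {value}")
```

Running it once from the interactive shell and once from inside the environment set up by the shell script would show whether the interpreter or the search path differs between the two cases.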