- Previously, the GMBrowser that shifters look at used only a fraction of the gates, because the early processing stages (particularly DecodeRawEvent) were slow.
- Now that we have a faster version of DecodeRawEvent, we modified GMBrowser to use all gates.
- Modified the following parameters in NearlineCurrent.opts (in Tools/DaqRecv/options on mnvonlinelogger) to be 100 percent:
- PdstlPrescaler.PercentPass = 25;
- LinjcPrescaler.PercentPass = 25;
- NumibPrescaler.PercentPass = 20;
- Ran the “nearline_software_sync.sh” script on all nearline machines to pick up the update
- Informed the current shifter about the update and started GMBrowser at Tufts UROC
- Will monitor the behavior for some time before making this change permanent.
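The parameter change above can be scripted. The following is a minimal sketch, assuming the options file contains lines in exactly the `Name.PercentPass = N;` form listed above (the values 25 and 20 shown there appear to be the earlier settings). The function name `set_percent_pass` is hypothetical and is not part of the nearline tools.

```shell
# Sketch only: set every "<Name>Prescaler.PercentPass = N;" line in an
# options file to the given percentage. set_percent_pass is a hypothetical helper.
set_percent_pass() {
    local opts_file="$1"
    local percent="$2"
    # -i.bak edits in place and keeps a backup copy so the change can be reverted.
    sed -i.bak -E \
        "s/^([[:space:]]*[A-Za-z]+Prescaler\.PercentPass[[:space:]]*=[[:space:]]*)[0-9]+;/\1${percent};/" \
        "$opts_file"
}
```

Usage would look like `set_percent_pass NearlineCurrent.opts 100`, run from Tools/DaqRecv/options; lines that do not match the Prescaler pattern are left untouched.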
We still have problems with nearline file management; I have listed the ones I found. The following folders need to be managed:
- Synchronize /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/
- Synchronize /scratch/nearonline/var/gmplotter/plotter/ with /minerva/data/users/nearonline/gmbrowser/plotter/
- Synchronize /scratch/nearonline/var/gmplotter/www/ with /minerva/data/users/nearonline/gmbrowser/www/
- Copy Files from /scratch/nearonline/var/gmplotter/www to email@example.com:/opt/if-wbm/htdoc/minerva/echecklist/gmb_hists
Here is the status of each item:
1) I modified the script to use the rsync command to sync /scratch/nearonline/var/job_dump/ with /minerva/data/online_processing/swap_area/. For now, we have a stable synchronization between the two folders; however, this method also copies the .log files, which is unnecessary.
2,3) There is no user “nearonline” under /minerva/data/users. The setup script assigns the following: export NEARLINE_BLUEARC_GMPLOTTER_AREA=/minerva/data/users/nearonline/gmbrowser. There is a “nearonline” user under /minerva/app; however, we should not copy any data files to /minerva/app.
4) e-Checklist works, so I conclude this item works. I did not check the details.
We should organize a plan to solve all the problems in nearline file management. I propose the following:
- Let's use the rsync command for 1, 2, and 3
- We need to create a folder “/minerva/data/users/nearonline” and let other systems know where we are copying the files.
- If there is a folder I forgot to sync between nearline and bluearc, that folder also needs to be added to the script.
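The rsync proposal for items 1, 2, and 3 could look like the sketch below. It is not the actual nearline_bluearc_copy.sh; the function names and the `RSYNC` override variable are hypothetical, and the `--exclude='*.log'` option is added only for job_dump, where copying .log files was noted as unnecessary. The destination under /minerva/data/users/nearonline is assumed to exist already.

```shell
# Sketch only: sync the contents of one directory into another.
# Usage: sync_dir <src> <dst> [extra rsync options...]
# RSYNC can be overridden (e.g. RSYNC=echo) to print the arguments instead of copying.
sync_dir() {
    local src="$1"
    local dst="$2"
    shift 2
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    "${RSYNC:-rsync}" -a "$@" "${src%/}/" "${dst%/}/"
}

# The three folder pairs listed above.
sync_nearline_areas() {
    sync_dir /scratch/nearonline/var/job_dump /minerva/data/online_processing/swap_area --exclude='*.log'
    sync_dir /scratch/nearonline/var/gmplotter/plotter /minerva/data/users/nearonline/gmbrowser/plotter
    sync_dir /scratch/nearonline/var/gmplotter/www /minerva/data/users/nearonline/gmbrowser/www
}
```

Running `RSYNC=echo sync_nearline_areas` prints the three argument lists without touching any files, which is a cheap way to review them before a real run.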
- New server for Veto HV Monitoring GUI is mnvonlinemaster
- The script logs in to mnvonlinemaster as mnvonline and runs the necessary commands
- Updated UROC Software and notified users of the update
- Updated Control Room computers with the new script
- For debugging, Carrie changed the verbosity of the log files, which made them grow to around 10 GB
- The rsync command froze while copying these huge files
- Killed all rsync processes and removed all GB-sized log files
- Killing rsync Processes
- ps aux | grep rsync
- kill <PID>
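Before removing oversized log files, it helps to list them first. A minimal sketch, assuming the logs end in `.log`; the function name `find_large_logs` and the default threshold of 1G are illustrative choices, not values from the nearline scripts.

```shell
# Sketch only: list *.log files under a directory that exceed a size threshold.
# Usage: find_large_logs <dir> [min_size]   (min_size uses find's units: k, M, G)
find_large_logs() {
    local dir="$1"
    local min_size="${2:-1G}"
    # find rounds file sizes up to whole units, so "+1G" means larger than 1 GiB.
    find "$dir" -type f -name '*.log' -size +"$min_size"
}
```

For example, `find_large_logs /scratch/nearonline/var/job_dump` prints the candidates, which can then be reviewed and removed by hand.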
- e-Checklist had not updated for 2 days
- The problem was that the disk /mnvonline0/work/data had been unmounted from mnvnearline1
- Mounted the disk as root
- mount /mnvonline0/work/data
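A missing mount like this one can be detected before it stalls e-Checklist for days. The sketch below reads the kernel mount table on Linux; the function name `is_mounted` is hypothetical, and the optional second argument exists only so the check can be exercised against a sample file.

```shell
# Sketch only: succeed if <path> appears as a mount point in the mount table.
# Usage: is_mounted <path> [mounts_file]   (mounts_file defaults to /proc/mounts)
is_mounted() {
    local target="$1"
    local mounts_file="${2:-/proc/mounts}"
    local _dev mount_point _rest
    # Each line is "<device> <mount point> <fstype> <options> ...".
    while read -r _dev mount_point _rest; do
        [ "$mount_point" = "$target" ] && return 0
    done < "$mounts_file"
    return 1
}
```

A periodic check such as `is_mounted /mnvonline0/work/data || echo "not mounted"` would flag the problem; the path must match the mount table entry exactly, without a trailing slash.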
- Donatella requested to resubmit an old job with the new croce_v3 firmware
- Submitted the following jobs manually
- Data: August 8 – ME Data
- Run: 10599
- Subruns: 40, 41, 42, 43, 44, 45
- Copied RAW Data files from /minerva/data
- /minerva/data is only accessible from mnvnearline1
- The detailed procedure is described on the wiki page.
- Updated minerva-cr-01 and minerva-cr-02 to v10r9p1
- Modified setupFiles for initiating setup with v10r9p1
- Modified setup.min.soft.sh for v10r9p1
- Python was a problem for the v10r9p1 setup script
- Created an alias to use the local Python instead of the framework version
- alias python="/usr/bin/python"
- Tested and tagged ControlRoomTools under v10r9p1 as “stable_v10r9p1”
- Updated the .bashrc for using the setup file under v10r9p1
- Linked .k5login with “cmtuser/Minerva_v10r9p1/Tools/ControlRoomTools/authenticate/k5login-master”
- Updated the documentation on wiki: “Control_Room_Setup_Manual”
- New mount point of /minerva disk is
- Updated the following script
- AnalysisFramework / Tools / ControlRoomTools / authenticate / MountRemoteFS
- Informed UROC users of the update
- mnvonlinelogger updated
- Slave Nodes will receive update automatically
- Updated Packages under cmtuser area
- Tools/DaqRecv [croce_v3]
- cvs co -r croce_v3 Tools/DaqRecv
- Tools/DaqRecv [croce_v3]
- Installed Packages under cmtuser area
- Event/MinervaKernel [croce_v3]
- This package is required for Event/MinervaEvent
- getpack -u Event/MinervaKernel
- Event/MinervaEvent [croce_v3]
- cvs co -r croce_v3 Event/MinervaEvent
- Event/MinervaKernel [croce_v3]
- Built All Packages in the following order
- Building Commands:
- cmt config
- cmt make
- source setup.sh
- Updated Packages under cmtuser area
The automated “nearline_bluearc_copy.sh” script on mnvnearline1 fails to copy necessary files from local_dump_area to online_processing/swap_area
(from /scratch/nearonline/var/job_dump to /minerva/data/online_processing/swap_area)
- Investigating mnvnearline1:scripts/nearline_bluearc_copy.sh
- Script runs automatically every 5 minutes.
- Log file for the script: /scratch/nearonline/var/logs
- Local copy from HEAD to following folders WORKS
- NEARLINE_DUMP_AREA /scratch/nearonline/var/job_dump
- NEARLINE_LOCAL_GMPLOTTER_LOCATION /scratch/nearonline/var/gmplotter
- The problem is with the python script “filechecklist.py”
- It does not generate the file list for files from the following folders:
- It works for the following folder
- Since the python script “filechecklist.py” generates no file list, no files are copied to the swap_area
- I modified the script to use rsync command.
- Now it synchronizes the local_dump_area and online_processing/swap_area
- Inside the script, Jeremy notes that “using rsync for this stage incurs a lot of overhead on the BlueArc disk”; therefore, he wrote a more efficient script, “filechecklist.py”, for this task
- The Problem is confirmed.
- The .fileindex under /scratch/nearonline/var/job_dump was corrupted, causing “filechecklist.py” to crash for that folder
- Using rsync manually fixed the .fileindex
- Software sync between mnvonlinelogger and mnvnearline1 restored the nearline_bluearc_copy.sh script to the original version
- Now everything works as before. The nearline_bluearc_copy.sh script copies the changed files to the BlueArc area using “filechecklist.py”
- GMBrowser Live works, but shifters cannot access previous runs and subruns
- GMBrowser -r xx -s xxx does not work
- The files are not copied automatically to the /minerva/data/online_processing/swap_area
- I copied the files manually:
- Connect to the mnvnearline1 – it has the BlueArc /minerva/ mount
- Necessary files are located in /scratch/nearonline/var/job_dump
- This solved the issue for the files that had not been copied.
- I checked the log file for the nearline_bluearc_copy.sh script under /scratch/nearonline/var/logs
- After some time, the automatic script seems to be working.
- We are not currently taking any runs; I will check the status again tomorrow
MINERvA Software Installation on minerva-cr-01 and minerva-cr-02
- Firefox configured for Shifter Bookmarks
- Special Kerberos Principal Installed
- ROOT 5.34/21 installed and tested
- ControlRoomTools Installed and Configured
- setup.sh file and .k5login file configured with ControlRoomTools
- GMBrowser installed and tested
- mnvdaqrunscripts installed and configured
- registered new hostnames (minerva-cr-01 and minerva-cr-02) in “mnvdaqrunscripts/install.sh”
- committed and tagged changes as “oaltinok_2014_09_17”
- mnvruncontrol installed and configured
- I tried connecting using run control but could not connect. Will test again tomorrow.
- MINOS Software is NOT installed
- When I tried to scp the setupFiles from the old computers, it gave me the following error
- protocol error: unexpected <newline>
- The reason for this error is that .bashrc runs the setup script, which echoes to the terminal on every login.
- Comment out the setup script line in .bashrc and scp will work.
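An alternative to commenting out the setup line is to run it only in interactive shells, since scp starts a non-interactive shell on the remote side and any output from .bashrc corrupts its protocol. The following is a minimal sketch of such a guard for .bashrc; the helper name `is_interactive` is hypothetical, and the setup line itself is left as a placeholder.

```shell
# Sketch only: guard for ~/.bashrc so that noisy setup runs in interactive shells only.
is_interactive() {
    # The shell's flag string ($-) contains "i" only when the shell is interactive.
    # An explicit argument may be passed instead of $- for checking the logic.
    case "${1:-$-}" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive; then
    # Placeholder: put the existing setup line here, e.g. "source <setup file>".
    :
fi
```

With this guard in place, interactive logins still get the full setup while scp and other non-interactive sessions see a silent .bashrc.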
- Added new Control Room Machines
- Tagged version as “oaltinok_2014_09_17”
- Shifter required my assistance for the UROC Software Update
- Killed all processes
- Jeremy updated the
- I updated the remaining Control Room Computers
- Testing the Updates
- Successful Test on Control Room Computers
- Successful Test on Rochester UROC
- Successful Test on Tufts UROCs
- I updated the UROC_sw_manager.py script and notified UROC Users
- On Sunday 03:30 am there was a power failure affecting MINOS and MINERvA underground machines
- Control Room Computers lost network mount to /minerva/data/
- GMBrowser needs /minerva/data mounted and it was not working
- minerva-evd is used by UROCs to mount /minerva/data, so they were also affected.
- Carrie opened a service ticket to ask the Computer Division for help with the Control Room computers
- The Computer Division resolved the incident, and all machines and UROCs are working properly.
- mnvonlinebck1.fnal.gov machine is still down and we have no access to Veto Wall HV Monitoring.
- e-Checklist can be used from either of the following servers: (minerva-wbm was down due to the power glitch)
- minerva-om no longer supports MINERvA Software
- minerva-om has latest MINOS RMS Installed for om near check
- .bashrc script modified to prompt users to use the “start_MinosOm.sh” command to start the minos om GUI
- Nothing was removed, and all files and software are recoverable.
- The version(v9r1) used by minerva-om is removed
- Tried to update it but its 32 bit architecture is very old
- Need to find a way to install rms om near on other computers.
- The University of Minnesota Duluth group is updating a fairly old UROC
- They installed all MINERvA and MINOS software and have some problems with ValidationTools, GMBrowser and RunVetoHVMonitor
- Other than these three, everything is working and they have started their shadow shift.
- UROC_sw_manager.py fails to “make clean” and “make” the package
- I removed the ValidationTools folder and ran UROC_sw_manager again – It worked!
- GMBrowser crashes (not responding)
- Update ControlRoomTools/gmbrowser folder to HEAD
- Did not work!
- Remove gmbrowser folder and run UROC_sw_manager.py
- Did not work!
After these failed attempts, I concluded that the problem seems to be related to the ROOT version they are using.
- Rick built and installed ROOT 5.34 and tried running GMBrowser
- It worked!
- The script fails to open the GUI
- The problem is the “-x” option missing from the SSH command; they had modified the script locally.
- I need to modify and upload the correct version to CVS
- Changed ROOT Password
- Added Travis as user “Nova”
- Group e938
- Backup /home/*
- All unnecessary files removed from both of the UROCs
- Old Shift Summary Plots
- Obsolete Log Files
- When you try to run “source configure_runcontrol.sh”
- ImportError: No module named wx
- Tried to import wx module to python for testing
- import wx
- That works! You can import wx into python
- Removed and re-installed package: python-wxgtk2.8
- sudo apt-get remove python-wxgtk2.8
- sudo apt-get install python-wxgtk2.8
- Still getting the same error when I try to run the configure_runcontrol.sh
- Script: configure_runcontrol.sh runs the python script
- Running that script directly using “python RunControlConfiguration.py” works!
- Dead SSH Connections between runcontrol server.
Warning: remote port forwarding failed for listen port 3012
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
  File "/home/minerva/mnvruncontrol/backend/Threads.py", line 236, in run
  File "RunControl.py", line 1413, in ConnectDAQ
    success = self.PrepareSSHTunnels(ssh_user=ssh_user, remote_host=remote_host, remote_port=remote_port)
  File "RunControl.py", line 1621, in PrepareSSHTunnels
    deliveries = self.postoffice.SendTo(message=test_msg, recipient_list=recipients, timeout=3.0, with_exception=True)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1298, in SendTo
    responses = self.SendWithConfirmation(message, timeout, with_exception)
  File "/home/minerva/mnvruncontrol/backend/PostOffice.py", line 1325, in SendWithConfirmation
    raise AlreadyWaitingError("SendWithConfirmation can't wait for multiple messages simultaneously.")
AlreadyWaitingError: SendWithConfirmation can't wait for multiple messages simultaneously.
- Restarted the computer
- It did not work!
- Tried to remove all SSH connections
- ps -ef | grep ssh
- kill <PID>
- There were no idle SSH connections
- Checked Tufts UROC-02 listener port
- It was 3012 (same as UROC)
- Changed UROC listener port to 3017
- It worked!
- Updated the UROC_User_Details wiki page for the new listener ports
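Since the failure came from two stations sharing listener port 3012, a quick check of the assigned ports can catch such collisions before a new UROC is configured. The sketch below assumes a plain text file with one `<station> <port>` pair per line; that file format and the function name `find_duplicate_ports` are hypothetical and do not describe how the wiki page stores the assignments.

```shell
# Sketch only: print every listener port that is assigned to more than one station.
# Input file format (assumed): "<station-name> <port>" per line.
find_duplicate_ports() {
    # Take the port column, sort numerically, and keep only repeated values.
    awk '{ print $2 }' "$1" | sort -n | uniq -d
}
```

An empty result means every station in the file has a unique listener port.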
- 3 New wiki-pages for UROC Expert Documentation under https://cdcvs.fnal.gov/redmine/projects/minerva-ops/wiki/Ops
- UROC Expert Documentation
- UROC User Details
- New UROC Authorization
- UROC Software Details