Tonight, at the end of the afternoon shift, we planned to launch a YAG absorption measurement. In the end we managed to, though with some bumps along the way. As reported in #59271 and #59274, after locking in CARM_NULL_1F we set up the automation to perform the measurement on the DET HWS (NI mirror).
During the first step of the process, i.e. the timeout where the lock is kept before doing anything, I noticed that the PyHWS process (the one which launches the hwsAna.py acquisition script, see #58035, #57352 and others) had a suspiciously low Process Uptime, only a few hours. It turned out that it had crashed today and had been restarted from VPM. Unfortunately this is not what we want for PyHWS, but since the process is in a very peculiar (hopefully temporary) state, no one is at fault.
The thing is that, as explained in #59016, we currently run PyHWS launched manually from a human user account (formerly fcarbogn), to try to avoid unexpected interruptions of the HWS acquisition.
So I tried to relaunch it manually from my shell (bersanet@servertcs1, ssh-ed from ctrl23), using the same command and configuration file as the VPM instance, with Metatron and its timers already running in ABSORPTION_MEAS_INIT.
Unfortunately, the process launched, but none of the cm commands sent to start the HWS acquisition succeeded; so both TCS_MAIN and ITF_LOCK were paused while the issue was investigated.
Thanks to the recently introduced logfile, I saw that the hwsAna.py script (the one living in /virgoDev/TCS_HWS/Python/HWS_ANA/v3r0/) was crashing due to Python indentation errors. Since I could not write to either the file or the directory it lives in, I duplicated it as hwsAna_DB.py and edited the copy as user optics; PyHWS was updated in VPM to use this new script. After fixing the indentation errors, I then got errors about print statements written in Python2 syntax, my shell being Python3.
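For illustration only (a minimal sketch, not actual hwsAna.py code): a Python2-style print statement like `print "acquiring"` is a SyntaxError under Python3, and one common way to make a script run under both interpreters is the `__future__` import:

```python
# Hypothetical snippet, not taken from hwsAna.py: making print calls
# work under both a Python2 and a Python3 interpreter.
from __future__ import print_function


def report(n_wavefronts):
    # A Python2-style statement such as  print n_wavefronts  would be a
    # SyntaxError in Python3; the function form below works in both.
    print("acquired", n_wavefronts, "wavefronts")
    return n_wavefronts


report(6)
```

With the `__future__` import at the top of a module, `print` behaves as a function even under Python2, so the same file can be run from either shell.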
Long story short, PyHWS now runs as user bersanet, in a Python2 shell on servertcs1, spawned from ctrl23 via ssh.
I tested the acquisition of the reference images, which went fine, and then of the live ones (consistent with the desired measurement), with no issues.
After 6 wavefronts I un-paused both ITF_LOCK and TCS_MAIN, but the timer for the HWS acquisition with the ITF locked had already expired (I guess because the node itself, and thus its timers, keeps running while only the execution of its run() method is paused). So the automation had already moved on to unlocking the ITF and proceeding with the cold measurement (2 h, PR/SR aligned, SINGLE_BOUNCE_NI, CH off). Wavefront 7 may still be valid for the "locked" measurement; this is to be checked by the TCS experts.
At this point the acquisition should be ongoing, and the remaining 4 h of measurement have started (counting from the unlock).
It is better to wait until approximately 5 LT before attempting to relock.
Bottom line: the peculiar way PyHWS is currently started should be made more robust, and/or hwsAna.py should be made Python3-compliant.