AdV-COM (AdV commissioning (1st part))
bersanetti, ruggi, boschi - 20:07 Wednesday 23 May 2018 (41539)
LOW_NOISE_3 restored, solved issue with the CoilsSbNE process

Yesterday we observed systematic unlocks while trying to reach LOW_NOISE_2. After another unsuccessful attempt today, we tried to reach that state manually and it worked, so everything was fine at the SUS level.

After some digging, we found that the problem was in the CoilsSbNE process, which had recently been killed and restarted; unfortunately, some leftover processes were still alive, as the process list showed (only two processes per box should exist):

bersanet@olserver121:~  $ ps aux | grep CoilsSbNE
168:virgorun 14825  6.3  0.1 656892 14688 ?        SNl  May17 617:38 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
188:virgorun 26379  0.0  0.2 656892 16520 ?        SN   May21   0:00 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
194:virgorun 27485  8.7  0.3 656636 31784 ?        SNl  May21 231:20 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
195:virgorun 27495  0.0 13.4 1490948 1099496 ?     SN   May21   0:43 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
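
The check above (more than two processes per box means leftovers) can be automated. The following is only a minimal sketch in Python, assuming `ps aux`-style output; the function name `matching_pids` and the constant `EXPECTED_PER_BOX` are hypothetical and not part of PySb or VPM. The sample text reuses the four lines listed above.

```python
import os

EXPECTED_PER_BOX = 2  # assumption taken from the report: two processes per box

SAMPLE = """\
168:virgorun 14825  6.3  0.1 656892 14688 ?        SNl  May17 617:38 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
188:virgorun 26379  0.0  0.2 656892 16520 ?        SN   May21   0:00 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
194:virgorun 27485  8.7  0.3 656636 31784 ?        SNl  May21 231:20 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
195:virgorun 27495  0.0 13.4 1490948 1099496 ?     SN   May21   0:43 python /virgoApp/PySb/v3r1p0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbNE.cfg CoilsSbNE
"""


def matching_pids(ps_output, process_name):
    """Return the PIDs of python processes whose last argument is process_name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed line
        command = fields[10:]
        # Keep only python interpreters, so that "grep <name>" is not counted.
        if not os.path.basename(command[0]).startswith("python"):
            continue
        if command[-1] == process_name:
            pids.append(int(fields[1]))
    return pids


pids = matching_pids(SAMPLE, "CoilsSbNE")
leftovers_suspected = len(pids) > EXPECTED_PER_BOX
```

With the listing above, four PIDs are found against two expected, which is the situation described in this report.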
 

What possibly happened, and partly hid the problem, is that ITF_LOCK had been started before this operation, so it was talking to the old cm name of the process, while my terminal was launched afterwards and talked to the process that was actually alive. We stopped the process from VPM, killed the zombies and restarted the process without issues. The automatic switch to LOW_NOISE_2 (and _3) was achieved again.

Comments to this report:
carbognani - 12:07 Thursday 24 May 2018 (41548)

To prevent (hopefully once and for all...) the recurrence of such an event, I have wrapped the PySb.py command in a shell script that takes care of killing all still-hanging instances of the process via:

pkill -9 -f "<unique identifier>"

In tests where a CoilsSb process was frozen via ctrl-Z, both manually and from VPM, this workaround seems to work fine. The same protection could be generalized at the level of the VPM start method, to prevent such occurrences for all VPM-driven processes.
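
For illustration, the kill-before-start idea can be sketched in Python as follows. This is not the actual wrapper (which is a shell script) nor the VPM start method; the function names `kill_stale` and `start_server` are hypothetical, and the `run` parameter exists only so the logic can be exercised without really calling pkill. The exit statuses used (0: at least one process matched, 1: none matched) are those documented for pkill.

```python
import subprocess


def kill_stale(identifier, run=subprocess.run):
    """Send SIGKILL to all processes whose full command line matches identifier.

    Returns True if pkill matched something, False if nothing was running.
    Caveat: with -f the pattern is matched against the whole command line, so
    the wrapper's own command line must not contain the identifier, otherwise
    the wrapper would be killed as well.
    """
    result = run(["pkill", "-9", "-f", identifier])
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"pkill failed with exit status {result.returncode}")


def start_server(identifier, command, run=subprocess.run):
    """Clean up hanging instances first, then launch the server command."""
    kill_stale(identifier, run=run)
    return run(command)
```

Because SIGKILL cannot be caught, a process frozen with ctrl-Z (stopped state) is removed as well, which matches the tests described above.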

In parallel, I am finalizing the completely new version of PySb, based on the python3 and asyncio template that has been developed, and I will soon put it under test.

carbognani - 15:14 Friday 01 June 2018 (41651)

Early this morning I tested the new version of the Coil Switchbox server (PySb v4r0).

The new version was put in operation from VPM, and during lock attempts unlocks were regularly observed on the transitions to LOW_NOISE_2. From the sequence of events in the logfiles, this seems to be due to a delay (a few seconds) in the execution of the cm commands that change the relay status.

I then restored the previous version (PySb v3r1p1), after which LOW_NOISE_3 was immediately achieved.

I will keep investigating the issue offline on the CoilsSbTest instance.

carbognani - 9:30 Friday 08 June 2018 (41749)

Early on Wednesday morning the new version of PySb (v4r0) was put in operation again after several modifications. Immediately after the installation LOW_NOISE_3 was achieved once, and it has been achieved a few times as of now.

This new version of PySb uses the Python3-based pyserver template. It is no longer multiprocess (so ps shows only one process); instead it runs asynchronously and concurrently in a single process on top of the asyncio library.
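
As a generic illustration of this single-process model (not the pyserver template itself, whose code is not shown here), the sketch below runs three hypothetical command handlers concurrently on one asyncio event loop. The handler name `handle_command` and the command names are invented for the example, and `asyncio.sleep` stands in for waiting on I/O.

```python
import asyncio
import os


async def handle_command(name, delay):
    """Pretend to execute one command; the sleep stands in for an I/O wait."""
    await asyncio.sleep(delay)
    return name, os.getpid()


async def main():
    # All three handlers wait at the same time on a single event loop,
    # so no extra process (or thread) is created.
    return await asyncio.gather(
        handle_command("relay_a", 0.02),
        handle_command("relay_b", 0.01),
        handle_command("relay_c", 0.03),
    )


results = asyncio.run(main())
```

Every handler reports the same PID, which is why ps shows a single entry for a server built this way.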

Compared with the previous multiprocess version:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
virgorun 27823  8.6  1.0 590844 84980 ?        SNl  08:56   0:32 python /virgoDev/PySb/v3r1p1/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbTest.cfg CoilsSbTest
virgorun 27834  0.0  0.8 369648 67440 ?        SN   08:56   0:00  \_ python /virgoDev/PySb/v3r1p1/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbTest.cfg CoilsSbTest

it requires roughly 1/4 of the CPU and 1/3 of the memory:

virgorun 21816  2.1  0.6 484632 51892 ?        SNl  Jun06  62:57 python3 /virgoDev/PySb/v4r0/scripts/PySb.py /virgoData/VirgoOnline/CoilsSbWI.cfg CoilsSbWI
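
The quoted fractions can be checked from the %CPU and %MEM columns of the two listings. The short calculation below is only a sanity check, under the assumption that the two v3r1p1 processes are summed and compared with the single v4r0 process. Note that the listings refer to different instances (CoilsSbTest and CoilsSbWI) with very different uptimes, and that ps reports %CPU averaged over the process lifetime, so the figures are indicative only.

```python
# (%CPU, %MEM) pairs copied from the ps listings in this entry.
old_version = [(8.6, 1.0), (0.0, 0.8)]  # PySb v3r1p1: parent and child process
new_version = [(2.1, 0.6)]              # PySb v4r0: single process


def totals(rows):
    """Sum %CPU and %MEM over all processes of one server instance."""
    return sum(cpu for cpu, _ in rows), sum(mem for _, mem in rows)


old_cpu, old_mem = totals(old_version)
new_cpu, new_mem = totals(new_version)

cpu_ratio = new_cpu / old_cpu  # about 0.24, i.e. roughly 1/4
mem_ratio = new_mem / old_mem  # about 0.33, i.e. roughly 1/3
```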

The five instances of the CoilsSb process (WI, NI, WE, NE, BS) have been running for two days so far without crashes; the situation will continue to be monitored.
