Virgo Runs (O4c)
menzione - 6:43 Friday 20 June 2025 (67058)
Operator Report - Night shift

ITF found locked at LOW_NOISE_3 in SCIENCE mode.

At 21:37 UTC the ITF unlocked due to an instability in the LN2 NI_MAR -> NE_MIR reallocation filter, or to an end TM (MIR or MAR) correction saturation. Unfortunately the unlock caused the opening of the WI ID loop, which was then properly re-closed.
Relocked at first attempt. SCIENCE mode set at 22:35 UTC.
At 00:19 UTC the ITF unlocked again, due either to the ASC DIFFp TY loop diverging or to an end TM (MIR or MAR) correction saturation.
Autorelocked at first attempt. SCIENCE mode at 01:08 UTC.
The ITF unlocked again at 02:09 UTC (TBC). Unfortunately INJ_MAIN was not able to complete FmodErr: the node was stuck looping between IMC_RESTORED and FMODERR_TUNED due to a problem with the LNFS.
At the suggestion of the ISC expert I temporarily bypassed FmodErr by setting "fmoderr_skip = True" in ITF_LOCK.ini.
Relocked after two cross-alignments in ACQUIRE_DRMI. SCIENCE mode at 04:07 UTC.
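
The bypass relies on a single boolean flag read from ITF_LOCK.ini. As a minimal sketch of how such a flag can be parsed (only the `fmoderr_skip` key comes from this report; the `[LOCK]` section name and the function are hypothetical):

```python
import configparser

# Minimal excerpt of an ITF_LOCK.ini-style file; the [LOCK] section
# name is an assumption, only the fmoderr_skip key appears in the report.
INI_TEXT = """
[LOCK]
fmoderr_skip = True
"""

def fmoderr_skipped(ini_text: str) -> bool:
    """Return True when the FmodErr step should be bypassed."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    return cfg.getboolean("LOCK", "fmoderr_skip", fallback=False)

if fmoderr_skipped(INI_TEXT):
    print("FmodErr bypassed: proceeding past FMODERR_TUNED")
else:
    print("Running FmodErr as usual")
```

With `fallback=False` the node would keep its normal behaviour whenever the flag is absent, so the bypass only acts when explicitly enabled.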

Guard Tour (UTC)
20:56 - 21:35
23:03 - 23:41
01:06 - 01:46
03:08 - 03:50

Sub-system reports

DAQ
01:27 UTC - Lnfs100 crashed. Killed via shell, restarted via VPM.

Pending actions

DAQ
(20-06-2025 03:00 - ) LNFS not responding. INJ_MAIN metatron node stuck before FMODERR_TUNED.

Oncall events

ISC
(20-06-2025 03:00 - 20-06-2025 03:10) Operator on site with expert from remote
Status: Ended
Description: FmodErr
Actions undertaken: INJ_MAIN Metatron node stuck while trying to reach FMODERR_TUNED due to a problem with LNFS.
FmodErr disabled via "fmoderr_skip = True" in ITF_LOCK.ini.

Images attached to this report
Comments to this report:
bersanetti - 8:31 Friday 20 June 2025 (67061)

Looking at the log files of INJ_MAIN, I could find several of these errors:

138:2025-06-20-02h20m22-UTC>WARNING-[FMODERR_CHECK.run] USERMSG 0: EZCA CONNECTION ERROR: Any Other Error: Could not get value from channel: INJ_LNFS_AMPL2

So the node could not read INJ_LNFS_AMPL2 online, as is in fact also the case with dataDisplay (see Figure).

We should investigate why the channels are not available anymore.
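
A node-side mitigation would be to treat a failed channel read explicitly instead of looping on it. A sketch with a stubbed reader standing in for the real EZCA getter (the function names and retry parameters are assumptions, not the Metatron code):

```python
import time

class ChannelReadError(RuntimeError):
    """Raised when a channel value cannot be obtained."""

def read_channel_with_retry(read_fn, channel, attempts=3, delay=0.0):
    """Try read_fn(channel) a few times; raise a clear error on failure.

    read_fn stands in for the real EZCA getter, not available here.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return read_fn(channel)
        except Exception as exc:
            last_exc = exc
            time.sleep(delay)
    raise ChannelReadError(
        f"Could not get value from channel: {channel}") from last_exc

# Stub that always fails, mimicking the unavailable INJ_LNFS_AMPL2 channel.
def dead_reader(channel):
    raise ConnectionError("EZCA CONNECTION ERROR")

try:
    read_channel_with_retry(dead_reader, "INJ_LNFS_AMPL2")
except ChannelReadError as err:
    print(err)  # Could not get value from channel: INJ_LNFS_AMPL2
```

Raising a dedicated error after a bounded number of attempts would let the node report the dead channel instead of bouncing between IMC_RESTORED and FMODERR_TUNED.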

Images attached to this comment
masserot - 9:08 Friday 20 June 2025 (67068)

The Lnfs SMS data are collected by the FbsISC slow frame builder. According to the operator report, the Lnfs server was stopped and restarted at 01:27 UTC.

At the same time period one can find in the FbsISC logfile the following lines:

  • 2025-06-20-00h57m16-UTC>ERROR..-FbsCmSmsData> GPS:1434416254 No data extracted for Lnfs100  : get answer but without data (see plot)
  • 2025-06-20-01h20m34-UTC>INFO...-FbsFrameSmsQuery> GPS:1434417652, sms Lnfs100 pending 4/3 : no answer from the Lnfs100 server to the FbsISC requests, so stop sending requests
  • 2025-06-20-01h20m34-UTC>INFO...-CfgReachState> Active(Active) Ok
  • 2025-06-20-01h26m35-UTC>INFO...-Cm> CheckMasksPoll> Lnfs100 - POLLERR  : Cm detects an error but not the server disconnection
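
The "pending 4/3" line suggests the frame builder counts unanswered requests and stops querying a server once that count exceeds a limit. A sketch of that bookkeeping (the class and method names are assumptions; only the 4-over-3 behaviour comes from the log):

```python
class SmsQueryThrottle:
    """Stop sending requests to a server once too many go unanswered."""

    def __init__(self, name, pending_limit=3):
        self.name = name
        self.pending_limit = pending_limit
        self.pending = 0

    def should_send(self):
        return self.pending <= self.pending_limit

    def sent(self):
        self.pending += 1

    def answered(self):
        self.pending = 0  # any reply clears the backlog

throttle = SmsQueryThrottle("Lnfs100")
for _ in range(4):          # four requests go out, none is answered
    if throttle.should_send():
        throttle.sent()
# pending is now 4/3: further requests are suppressed
print(throttle.pending, throttle.should_send())  # 4 False
```

Under this logic the throttle stays latched until the server answers again, which matches the need to remove and re-add the SMS to resume the data collection.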

To restore the Lnfs100 SMS data collection the following actions were performed :

  • 2025-06-20-06h35m52-UTC    'Remove SMS [SMS server=Lnfs100]' sent to FbsISC
  • 2025-06-20-06h35m55-UTC    'Reload Configuration' sent to FbsISC

After these operations, the Lnfs SMS channels are again available in the DAQ (see plot).

Images attached to this comment
bersanetti - 16:55 Friday 20 June 2025 (67072)

Later today the issue happened again. The FbsISC framebuilder lost connection with the Lnfs100 process:

FbsISC:
2025-06-20-09h49m05-UTC>ERROR..-FbsCmSmsData> GPS:1434448163 No data extracted for Lnfs100
2025-06-20-10h54m40-UTC>WARNING-Main> local - gps 1434452098-130968000, prev 1434452097-002215000(000000000) - frDt 1, dt 1.13097 - nb 1 - tmo 0.004522

Lnfs100:

2025-06-20-07h46m00-UTC>INFO...-Sent AMPL 2 -6 1 command to LNFS1
2025-06-20-09h49m04-UTC>INFO...-CfgReachState> Error(Error) Ok
2025-06-20-09h49m04-UTC>WARNING-Timeout from worker process!
2025-06-20-09h50m09-UTC>WARNING-Timeout from worker process!

However, the problem became evident only later, once we actually tried to access the LNFS data, i.e. during FmodErr at the beginning of the lock acquisition (Figure 1).

This time Lnfs100 appeared dead from the VPM, but after connecting to olserver129 I could see that both processes were actually still alive:

virgorun@olserver129[~]: ps aux | grep Lnfs100
virgorun 32242  0.0  0.0 113424  1692 ?        S    04:24   0:00 bash /virgoApp/PyLnfs100/v4r1p1/Linux-x86_64-CL7/bin/PyLnfs100-conda /virgoData/VirgoOnline/Lnfs100.cfg Lnfs100
virgorun 32714  1.5  0.9 741672 75620 ?        SNl  04:24   9:11 python3 /virgoApp/PyLnfs100/v4r1p1/scripts/PyLnfs100.py /virgoData/VirgoOnline/Lnfs100.cfg Lnfs100
virgorun 33491  0.0  0.0 112824   992 pts/8    R+   14:15   0:00 grep --color=auto Lnfs100
virgorun@olserver129[~]: kill -9 32242 32714
virgorun@olserver129[~]: ps aux | grep Lnfs100
virgorun 33582  0.0  0.0 112820   988 pts/8    S+   14:16   0:00 grep --color=auto Lnfs100

After killing the processes (as virgorun) I could restart them from the VPM; then the same steps done by Alain (remove the SMS from Lnfs100 and ReloadConfig, both on FbsISC) restored proper communication.
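
The manual ps/kill step above can be scripted. A sketch that extracts the PIDs to kill from `ps aux` output, skipping the `grep` line itself (the pattern and column layout follow the transcript above; the function name is hypothetical):

```python
def pids_to_kill(ps_output: str, pattern: str):
    """Return PIDs of lines matching pattern, ignoring the grep process."""
    pids = []
    for line in ps_output.splitlines():
        if pattern in line and "grep" not in line:
            pids.append(int(line.split()[1]))  # PID is the second column
    return pids

# Abridged copy of the ps aux output from the transcript above.
PS_OUTPUT = """\
virgorun 32242  0.0  0.0 113424  1692 ?     S   04:24  0:00 bash PyLnfs100-conda Lnfs100.cfg Lnfs100
virgorun 32714  1.5  0.9 741672 75620 ?     SNl 04:24  9:11 python3 PyLnfs100.py Lnfs100.cfg Lnfs100
virgorun 33491  0.0  0.0 112824   992 pts/8 R+  14:15  0:00 grep --color=auto Lnfs100
"""

print(pids_to_kill(PS_OUTPUT, "Lnfs100"))  # [32242, 32714]
# The resulting list could then be passed to os.kill(pid, signal.SIGKILL),
# reproducing the manual `kill -9 32242 32714`.
```
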

If this happens again the same procedure can be followed; however, it will most probably not be sufficient: the issue arises when the INJ_MAIN node resets the modulation amplitude in the Lnfs and cannot read it back, but the command is not actually received, because the communication was already lost beforehand.

So, if INJ_MAIN keeps notifying "Waiting for 8MHz mod ampl going to default", one should re-send the command from a standard iPython shell:

cm_send('Lnfs100','SET8AMPL',15)
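
When re-issuing the command, a small send-and-verify loop avoids repeating it blindly. The cm interface is stubbed here, since the real `cm_send` and the amplitude readback are Virgo-specific; everything except `cm_send('Lnfs100','SET8AMPL',15)` is an assumption:

```python
def restore_8mhz_ampl(send_fn, read_fn, target=15, attempts=3):
    """Send SET8AMPL and verify the readback, retrying a few times.

    send_fn/read_fn stand in for cm_send('Lnfs100', 'SET8AMPL', target)
    and for reading back the 8 MHz modulation amplitude channel.
    """
    for _ in range(attempts):
        send_fn("Lnfs100", "SET8AMPL", target)
        if read_fn() == target:
            return True
    return False

# Stubbed server state for illustration.
state = {"ampl": 0}
ok = restore_8mhz_ampl(
    lambda srv, cmd, val: state.__setitem__("ampl", val),
    lambda: state["ampl"],
)
print(ok)  # True
```

A False return after the retries would indicate the communication is still broken, i.e. the kill/restart and FbsISC steps above need to be repeated first.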

Images attached to this comment