Virgo Runs (O4c)
masserot - 11:07 Monday 16 June 2025 (67005)
DAQ : from 2025-06-15-17h08m12-UTC during 120s data missing

On 2025-06-15, starting at 17h08m12-UTC and for 98s, the main channels were missing from the streams.

Below are the reports from some DAQ servers:

  • RAW streams: the FmStol01 and FmStol02 servers report the same messages:
    • 2025-06-15-17h10m28-UTC>ERROR..-CHANNEL> At Jun15,25-17:08:42 1434042540, V1:SDB2_B1p_PD1_Blended missing 30, V1:SDB2_B1p_PD2_Blended missing 30, V1:SDB2_B1_PD1_Blended missing 30, V1:SDB2_B1_PD2_Blended missing 30, V1:SNEB_B7_DC missing 30, V1:SWEB_B8_DC missing 30, V1:SPRB_B4_56MHz_Q missing 30, V1:SIB2_B2_8MHz_I missing 30, V1:CAL_NE_MIR_Z_NOISE missing 30, V1:CAL_WE_MIR_Z_NOISE missing 30, V1:Sc_BS_MIR_Z_CORR missing 30, V1:Sc_PR_MIR_Z_CORR missing 30, V1:Sc_NE_MIR_Z_CORR missing 30, V1:Sc_WE_MIR_Z_CORR missing 30, V1:Sc_BS_MAR_Z_CORR missing 30, V1:Sc_PR_MAR_Z_CORR missing 30, V1:Sc_NE_MAR_Z_CORR missing 30, V1:Sc_WE_MAR_Z_CORR missing 30,  are back

    • 2025-06-15-17h12m40-UTC>ERROR..-CHANNEL> At Jun15,25-17:09:42 1434042600, V1:Sc_NI_MIR_Z_CORR missing 90, V1:Sc_WI_MIR_Z_CORR missing 90, V1:Sc_NI_MAR_Z_CORR missing 90, V1:Sc_WI_MAR_Z_CORR missing 90,  are back

       

  • All the TolmFrameBuilders and the Imaging servers complain of being unable to send their frames during this time period

  • All the slow frame builder servers complain of being unable to send their requests
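
The ERROR..-CHANNEL messages quoted above can be summarised automatically. The following is a minimal sketch, not the DAQ's own tooling: the function names and regular expressions are hypothetical and were written only against the two message shapes shown in this report. It extracts the GPS stamp and the per-channel number that follows `missing` (whose unit the message does not state), and converts the GPS stamp to UTC using the 18 s GPS-UTC offset in force since 2017.

```python
import re
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from 1980-01-06 00:00:00 UTC without leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_MINUS_UTC = 18  # seconds; offset valid since 2017-01-01

# Hypothetical patterns, inferred from the messages quoted in this report.
HEADER_RE = re.compile(r"At \w+,\d+-[\d:]+ (?P<gps>\d+),")
CHANNEL_RE = re.compile(r"(?P<name>V1:\w+) missing (?P<count>\d+)")


def gps_to_utc(gps_seconds):
    """Convert a GPS second count to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_MINUS_UTC)


def parse_missing(message):
    """Return (gps_stamp, {channel_name: count}) for one log message."""
    header = HEADER_RE.search(message)
    if header is None:
        raise ValueError("no 'At <date> <gps>,' header found in message")
    gps = int(header.group("gps"))
    missing = {m.group("name"): int(m.group("count"))
               for m in CHANNEL_RE.finditer(message)}
    return gps, missing


sample = (
    "2025-06-15-17h12m40-UTC>ERROR..-CHANNEL> At Jun15,25-17:09:42 1434042600, "
    "V1:Sc_NI_MIR_Z_CORR missing 90, V1:Sc_WI_MIR_Z_CORR missing 90, "
    "V1:Sc_NI_MAR_Z_CORR missing 90, V1:Sc_WI_MAR_Z_CORR missing 90,  are back"
)

gps, missing = parse_missing(sample)
print(gps_to_utc(gps).isoformat(), missing)
```

If the number after `missing` is read as a duration in seconds (an assumption), subtracting it from the GPS stamp gives 17:08:12 UTC for both quoted messages, which agrees with the start time stated in this report.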

 

To be confirmed by the ET department: it sounds like an Ethernet switch restart

Comments to this report:
cortese, kraja - 10:49 Tuesday 17 June 2025 (67014)

No events have occurred on the network switches.

At that time there was an iSCSI path rearrangement on the storage array used by the fs01 file server, which exports the /virgoData and /virgoLog areas (besides /olusers and /virgoDev). This caused a temporary increase in I/O latency, which is normally transparent to NFS clients except for the RTPC FdIO processes.

This problem has been observed many times and is due to the combination of multiple factors:

  1. the load on the file server is periodically increased by client processes that are started or reconfigured to access the above-mentioned disk areas more heavily.
    This Ganglia graph shows that the read load has risen to 20-40MB/s since the run break in May/June, whereas it was below 10MB/s in the previous months, and it has not yet returned to that level.
    Note that this kind of load onset was already recorded during past missing-frame periods ( see this logbook entry );
    at that time the heavy I/O was coming from the olserver119 and olserver114 hosts.
    This is still the case, with the addition of olserver53;
     
  2. the FdIO servers are particularly sensitive to I/O latency on /virgoLog, since the logging about raw frames is currently handled in the same thread as the raw frame transmission itself.
    This causes the DAQ raw frame tx buffers to overflow when a write to /virgoLog is delayed;
     
  3. the current version of the underlying VMware hypervisor software, unlike the one used in previous runs, has changed the way Fault Tolerance is provided, with the result that storage I/O latency can no longer be improved beyond its present level.
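
The mechanism in point 2 can be illustrated with a toy model. This is only a sketch under stated assumptions and does not reproduce the FdIO code: the function name, the one-frame-per-tick rate, the buffer size and the stall parameters are all hypothetical. It counts the frames lost when a log write stalls, first with the sender blocked by the log write (logging and transmission in the same thread) and then, for comparison, with a sender that is not blocked by it.

```python
def count_dropped_frames(n_ticks, buffer_size, stall_start, stall_len, same_thread):
    """Toy discrete-time model of a bounded tx buffer.

    One frame is produced per tick. The sender empties the buffer on every
    tick unless it is blocked. With same_thread=True the sender is blocked
    for as long as the log write stalls. All values are hypothetical.
    """
    queued = 0
    dropped = 0
    for tick in range(n_ticks):
        # Producer: enqueue the new frame, or drop it if the buffer is full.
        if queued < buffer_size:
            queued += 1
        else:
            dropped += 1
        # Sender: blocked only if it shares a thread with the stalled log write.
        log_write_stalled = stall_start <= tick < stall_start + stall_len
        sender_blocked = same_thread and log_write_stalled
        if not sender_blocked:
            queued = 0
    return dropped


# Hypothetical numbers: a 10-frame buffer and a 98-tick stall.
coupled = count_dropped_frames(200, 10, 5, 98, same_thread=True)
decoupled = count_dropped_frames(200, 10, 5, 98, same_thread=False)
print(coupled, decoupled)
```

In this model frames are lost only when the stall lasts longer than the buffer can absorb, and only when the sender waits on the log write. It says nothing about the actual buffer sizes or frame rates of the DAQ.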
Images attached to this comment