AdV-DAQ (Data collection)
masserot - 23:21 Friday 14 June 2024 (64509)
DAQ : missing frames - network issues ?

Today, 2024-06-14, from 16h33m-UTC to 16h40m-UTC, the DAQ chain experienced troubles, probably due to a network outage; this remains to be confirmed by the IT department.

The attached plots (plot and zoom), made with the trend stream, show that:

  • the troubles lasted from around 16h33m10-UTC to 16h39m40-UTC;
  • according to the trend stream, most of the data were lost during this period.

Looking at the raw stream, however, the data appear to be lost from 16h36m18-UTC to 16h37m50-UTC (plot and zoom).
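To put the two intervals side by side, here is a minimal Python sketch that computes their durations from the times quoted in this entry. The helper names `parse_hms` and `duration_s` are only illustrative and are not part of any DAQ tool.

```python
from datetime import datetime


def parse_hms(stamp: str) -> datetime:
    """Parse a time of day written as in this entry, e.g. '16h33m10'."""
    return datetime.strptime(stamp, "%Hh%Mm%S")


def duration_s(start: str, end: str) -> int:
    """Duration in seconds between two times of the same day."""
    return int((parse_hms(end) - parse_hms(start)).total_seconds())


# Period flagged by the trend stream
trend_gap_s = duration_s("16h33m10", "16h39m40")
# Period with lost data according to the raw stream
raw_gap_s = duration_s("16h36m18", "16h37m50")

print(f"trend stream: {trend_gap_s} s, raw stream: {raw_gap_s} s")
```

With the times above, the trend stream flags about 390 s while the raw stream misses 92 s.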

Below are some details related to the troubles:

  • all the rtpc frame builders, TolmFrameBuilder servers and Imaging servers were unable to transmit their frames
    • ISC_Fb:
      • 2024-06-14-16h36m28-UTC>WARNING-[TolmFrameBuilder::ControlMerging] frame: 1402418206.200000000: merging is triggered on timeout (internal 1402418206.591010100 > max 1402418206.560000000)
      • 2024-06-14-16h37m11-UTC>ERROR..-frame 1402418248 not send to FbmFFE: sending queue is full
      • 2024-06-14-16h37m14-UTC>INFO...-output queue to FbmFFE has been flushed; Sending frame 1402418252
    • SUSP_Fb
      • 2024-06-14-16h36m38-UTC>ERROR..-frame 1402418216 not send to FbmFFE: sending queue is full
      • 2024-06-14-16h36m41-UTC>WARNING-[TolmFrameBuilder::ControlMerging] frame: 1402418219.200000000: merging is triggered on timeout (internal 1402418219.562315050 > max 1402418219.560000000)
      • 2024-06-14-16h37m38-UTC>INFO...-output queue to FbmFFE has been flushed; Sending frame 1402418276
    • INJ_Fb
      • 2024-06-14-16h36m38-UTC>ERROR..-frame 1402418216 not send to FbmFFE: sending queue is full
      • 2024-06-14-16h36m42-UTC>WARNING-[TolmFrameBuilder::ControlMerging] frame: 1402418219.800000000: merging is triggered on timeout (internal 1402418220.161820050 > max 1402418220.160000000)
      • 2024-06-14-16h37m14-UTC>INFO...-output queue to FbmFFE has been flushed; Sending frame 1402418252
  • Their related FrameMerger servers were complaining that they were waiting for input frames
    • FbmFFE
      • 2024-06-14-16h36m34-UTC>WARNING-FdFrMrgr: Could not wait longer for frame parts, output 1402418203.8, nSources=15 first:SSFS_Fb
      • 2024-06-14-16h37m39-UTC>INFO...-CfgReachState> Golden(Golden) Ok
    • FbmFE
      • 2024-06-14-16h36m34-UTC>WARNING-FdFrMrgr: Could not wait longer for frame parts, output 1402418203.0, nSources=4 first:FbsISC
      • 2024-06-14-16h37m37-UTC>INFO...-CfgReachState> Golden(Golden) Ok
    • FbmMain
      • 2024-06-14-16h37m10-UTC>WARNING-No input frame since at least 44 seconds
      • 2024-06-14-16h37m18-UTC>INFO...-CfgReachState> Golden(Golden) Ok
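To cross-check the frame numbers in the messages above against the log timestamps, here is a minimal Python sketch. It assumes the frame numbers are GPS seconds and that the GPS-UTC offset is 18 s (the value in force since 2017). The helper names `gps_to_utc` and `log_stamp` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# GPS time started on 1980-01-06 00:00:00 UTC
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: 18 leap seconds separate GPS from UTC (true since 2017-01-01)
GPS_UTC_OFFSET_S = 18


def gps_to_utc(gps_seconds: float) -> datetime:
    """Convert GPS seconds to a UTC datetime, under the offset assumed above."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_UTC_OFFSET_S)


def log_stamp(t: datetime) -> str:
    """Format a datetime like the timestamps of the server logs."""
    return t.strftime("%Y-%m-%d-%Hh%Mm%S-UTC")


# Frame numbers taken from the ISC_Fb and SUSP_Fb messages above
for frame in (1402418206, 1402418216):
    print(frame, "->", log_stamp(gps_to_utc(frame)))
```

Under these assumptions, frame 1402418206 maps to 2024-06-14-16h36m28-UTC and frame 1402418216 to 2024-06-14-16h36m38-UTC, which are the timestamps of the corresponding log messages.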

Note that these troubles occurred in the same time period as the latency glitches between 16h-UTC and 17h-UTC.

Comments to this report:
rolland - 12:19 Tuesday 18 June 2024 (64538)

This morning, I found the process PCal_NE_Ampli complaining about "Broken pipe". Indeed, the NetCom switch ethcali1 (which converts RS232 to Ethernet) no longer responds to ping. It is located in the NE building, in the calibration rack (leftmost rack). This started on Friday, 14 June at 18h31m UTC. Could this be related to the network issue seen a bit earlier, or is it independent?

The process has been stopped, and cannot be restarted as long as ethcali1 is not available on the network.

This process is used to monitor the PCal laser amplifier, and mainly to switch the pump diode on and off remotely. It is not critical and we can run without it for some time, but it would be useful to get it back soon.

 

2024-06-14-18h31m23-UTC>ERROR..-[Errno 11] Resource temporarily unavailable
[....]
2024-06-14-18h47m04-UTC>ERROR..-[Errno 11] Resource temporarily unavailable
2024-06-14-18h47m05-UTC>ERROR..-[Errno 110] Connection timed out
2024-06-14-18h47m06-UTC>ERROR..-[Errno 32] Broken pipe
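To extract the time span covered by such an excerpt, here is a minimal Python sketch that parses lines in the format shown above (timestamp, level padded with dots, message). The regular expression and the names `LogEntry` and `parse_line` are assumptions made for illustration, not part of the server software.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    time: datetime
    level: str
    message: str


# Assumed layout: <date>-<HHhMMmSS>-UTC><LEVEL padded with dots>-<message>
LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}h\d{2}m\d{2})-UTC>"
    r"(?P<level>[A-Z]+)\.*-(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that are not log messages."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m["stamp"], "%Y-%m-%d-%Hh%Mm%S")
    return LogEntry(when, m["level"], m["message"])


excerpt = """\
2024-06-14-18h31m23-UTC>ERROR..-[Errno 11] Resource temporarily unavailable
[....]
2024-06-14-18h47m04-UTC>ERROR..-[Errno 11] Resource temporarily unavailable
2024-06-14-18h47m05-UTC>ERROR..-[Errno 110] Connection timed out
2024-06-14-18h47m06-UTC>ERROR..-[Errno 32] Broken pipe
"""

# The elision marker "[....]" does not match and is skipped
entries = [e for e in map(parse_line, excerpt.splitlines()) if e is not None]
span_s = int((entries[-1].time - entries[0].time).total_seconds())
print(f"{len(entries)} messages over {span_s} s, last: {entries[-1].message}")
```

For the excerpt above, the first error and the final "Broken pipe" are 943 s apart, that is about 15 min 43 s.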
 
melo, rolland - 10:47 Tuesday 25 June 2024 (64587)

This morning I went to the NE building to restart the NetCom Ethernet bridge ethcali1, which had not been responding since June 14th. I switched it off and on again while Loic verified remotely that it returned to normal operation. The action took place around 08:15 UTC.
