AdV-DAQ (Data collection)
masserot, pacaud - 16:18 Monday 21 October 2024 (65361)
DAQ: latency jumps since 2024-10-14-16h57m-UTC

The first attached plot shows the latency

  • for some TolmFrameBuilders running on the rtpcs and building their frames at 0.2 s, and the related frame merger FbmFFE running on olserver52 (first line)
  • for some Slow frame builders running on olserver52 and the related frame merger FbmFE, which works with frames at 1 s and also runs on olserver52 (second line)
  • for the Imaging servers running on the rtpcs and building their frames at 1 s, and the related frame mergers running on olserver52 (third line)

The red rectangle marks a complete loss of data that occurred between 2024-10-14-10h56m38s and 2024-10-14-10h57m25s. The attached .txt file shows, for the TolmFrameBuilders and the Imaging servers, the FdIO error report related to this event.

After this event there are some latency jumps (purple rectangle) on all the frame providers: TolmFrameBuilders, Slow frame builders and Imaging servers (see the last attached plot).
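
For anyone wanting to flag such jumps automatically rather than by eye, here is a minimal sketch. It is not the DAQ monitoring code: the function name, the input lists and the threshold factor are all hypothetical, and it simply marks samples whose latency exceeds a multiple of the median latency of the series.

```python
from statistics import median

def find_latency_jumps(times, latencies, factor=3.0):
    """Return (time, latency) pairs whose latency exceeds factor * median.

    times, latencies: equal-length sequences (hypothetical inputs, e.g. one
    latency sample per frame). factor is an arbitrary threshold chosen for
    illustration, not a value used by the DAQ.
    """
    if len(times) != len(latencies):
        raise ValueError("times and latencies must have the same length")
    baseline = median(latencies)  # robust estimate of the normal latency
    return [(t, lat) for t, lat in zip(times, latencies) if lat > factor * baseline]

# Illustrative values only, not measured data.
example_times = [0, 1, 2, 3, 4]
example_latencies = [1.0, 1.1, 0.9, 5.0, 1.0]
jumps = find_latency_jumps(example_times, example_latencies)
```

The median is used as the baseline so that a few large jumps do not shift the reference level, as a mean would.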

Images attached to this report
Non-image files attached to this report
Comments to this report:
cortese - 16:56 Monday 21 October 2024 (65363)

The DAQ glitches can be explained by the fact that, since 4 October, the input flux to the fs01 NFS file server has increased again after the pause in September.

We have understood that the FdIOServer thread that sends frames via Cm is the same one that writes logs to /virgoLog (served by fs01); an overload of fs01 therefore causes the whole thread to lag.
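
To illustrate the mechanism, here is a toy model. It is not the FdIOServer code: the function and all parameter names are hypothetical and the timings are made up. It assumes a single thread that, for each frame, first sends it and then performs a blocking log write, and it computes how long each frame waits between being ready and being sent.

```python
def simulate_latencies(frame_period, send_cost, log_costs):
    """Per-frame send latency when one thread both sends frames and writes logs.

    Frame i is assumed ready at i * frame_period. The thread handles frames
    in order: it sends the frame (send_cost seconds), then writes a log
    entry (log_costs[i] seconds) before it can touch the next frame.
    All values are illustrative assumptions, not DAQ measurements.
    """
    latencies = []
    thread_free_at = 0.0
    for i, log_cost in enumerate(log_costs):
        ready = i * frame_period
        start = max(ready, thread_free_at)   # wait for the frame or for the thread
        sent = start + send_cost
        latencies.append(sent - ready)
        thread_free_at = sent + log_cost     # the log write blocks the same thread
    return latencies

# One slow log write (3 s, e.g. an overloaded file server) among fast ones.
latencies = simulate_latencies(frame_period=1.0, send_cost=0.1,
                               log_costs=[0.0, 3.0, 0.0, 0.0, 0.0])
```

In this model a single slow log write raises the latency of the frames queued behind it, and the latency only returns to its normal value once the thread has caught up.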

 

The increase of the flux (to one or more of the /virgoLog, /virgoData, /olusers or /virgoDev filesystems) again coincided with a restart of some Virgo processes, triggered by the unavailability of the /data/archive area on 4 October.

An analysis of the traffic on fs01 shows that the major writers are currently olserver119 and olserver114, in that order.

I see that both servers run some Fm processes which, as far as I understand, write to /virgoData, which does reside on fs01; this explains at least part of the overload.

It is not possible from the OS side to measure which of the processes on those two servers are responsible, because of the risk of impacting performance.

It should also be explained why the write flux was lower throughout September.

Images attached to this comment
masserot - 17:28 Monday 21 October 2024 (65364)

As a trial, the FmRawBack server was stopped at 2024-10-21-16h28-UTC

masserot - 9:21 Tuesday 22 October 2024 (65367)

The FmRawBack server was stopped at 15h28m-UTC (red line in the following plots)

The attached plots show

 

As a new trial, the following servers were restarted

Images attached to this comment