AdV-DAQ (Data collection)
masserot, pacaud - 16:18 Monday 21 October 2024 (65361)
DAQ: latency jumps since 2024-10-14-16h57m-UTC

The first attached plot shows the latency

  • for some TolmFrameBuilders running on the rtpcs and building their frames at 0.2 s, and the related frame merger FbmFFE running on olserver52 (first line)
  • for some Slow frame builders running on olserver52 and the related frame merger FbmFE, working with frames at 1 s and also running on olserver52 (second line)
  • for the Imaging servers running on the rtpcs and building their frames at 1 s, and the related frame mergers running on olserver52 (third line)

The red rectangle marks a complete loss of data that occurred between 2024-10-14-10h56m38s and 2024-10-14-10h57m25s. The attached .txt file shows, for the TolmFrameBuilders and the Imaging servers, the FdIO error report related to this event.
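
As a small worked example, the duration of the data loss can be computed directly from the two timestamps quoted above. The sketch below is only an illustration: the regular expression is an assumed description of the timestamp style used in this entry, `parse_stamp` is a hypothetical helper name, and the times are treated as UTC (an assumption that does not affect the computed duration).

```python
import re
from datetime import datetime, timezone

# Assumed pattern for the timestamp style used in this entry, e.g.
# "2024-10-14-10h56m38s"; minute-only forms and a trailing "-UTC" also match.
_STAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})-(\d{2})h(\d{2})m?(?:(\d{2})s?)?(?:-UTC)?$"
)


def parse_stamp(text: str) -> datetime:
    """Parse a logbook-style timestamp into a timezone-aware UTC datetime."""
    match = _STAMP.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {text!r}")
    year, month, day, hour, minute, second = match.groups()
    # Missing seconds are taken as zero.
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    int(second or 0), tzinfo=timezone.utc)


start = parse_stamp("2024-10-14-10h56m38s")
end = parse_stamp("2024-10-14-10h57m25s")
gap_seconds = (end - start).total_seconds()
print(f"data loss lasted {gap_seconds:.0f} s")  # prints "data loss lasted 47 s"
```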

After this event there are latency jumps (purple rectangle) on all the frame providers: TolmFrameBuilders, Slow frame builders and Imaging servers (see the last attached plot).

Images attached to this report
Non-image files attached to this report
Comments to this report:
cortese - 16:56 Monday 21 October 2024 (65363)

The DAQ glitches can be explained by the fact that, since 4 October, the input flux to the fs01 NFS file server has increased again after the pause in September.

We have understood that the FdIOServer thread that sends frames via Cm is the same one that writes logs to /virgoLog (served by fs01); an overload of fs01 therefore causes the whole thread to lag.
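
The coupling described above can be illustrated with a minimal Python sketch. It is not the FdIOServer code: `SlowHandler` is a hypothetical stand-in for a log write to an overloaded file server, `send_frames` only pretends to send frames, and the 20 ms delay is an arbitrary value. In the first run the sending thread performs the slow write itself, so every log line delays the next frame. The second run shows one common way to decouple the two, a queue drained by a separate thread; it is included only for comparison and is not something this report says was done.

```python
import logging
import logging.handlers
import queue
import time


class SlowHandler(logging.Handler):
    """Hypothetical stand-in for a log write to an overloaded file server."""

    def __init__(self, delay_s: float):
        super().__init__()
        self.delay_s = delay_s
        self.records = 0

    def emit(self, record):
        time.sleep(self.delay_s)  # simulated slow write
        self.records += 1


def make_logger(name: str, handler: logging.Handler) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


def send_frames(logger: logging.Logger, n_frames: int) -> float:
    """Pretend to send n_frames, logging once per frame; return elapsed seconds."""
    t0 = time.monotonic()
    for i in range(n_frames):
        # A real sender would push the frame to the network here.
        logger.info("frame %d sent", i)
    return time.monotonic() - t0


slow = SlowHandler(delay_s=0.02)

# Coupled: the sending thread does the slow log write itself.
coupled_s = send_frames(make_logger("coupled", slow), 10)

# Decoupled: the sending thread only enqueues records; a listener thread writes them.
log_queue = queue.Queue()
listener = logging.handlers.QueueListener(log_queue, slow)
listener.start()
decoupled_s = send_frames(
    make_logger("decoupled", logging.handlers.QueueHandler(log_queue)), 10
)
listener.stop()  # waits until the queued records have been written

print(f"coupled: {coupled_s:.3f} s, decoupled: {decoupled_s:.3f} s")
```

In the coupled run the loop cannot finish faster than the sum of the simulated write delays, which is the lag mechanism described above; in the decoupled run the slow writes still happen, but off the sending thread.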

 

The increase of the flux (to some of the /virgoLog, /virgoData, /olusers or /virgoDev filesystems) happened again in correspondence with a restart of some Virgo processes triggered by the unavailability of the /data/archive area on 4 October.

From an analysis of the traffic on fs01, it turns out that the major writers are currently olserver119 and olserver114, in this order.

I see that both servers run some Fm processes which, as far as I understand, write to /virgoData, which indeed resides on fs01; this explains at least part of the overload.

It is not possible from the OS side to measure which of the processes on those 2 servers play a role, because of the risk of impacting performance.

It should also be explained why the writing flux was lower throughout September.

Images attached to this comment
masserot - 17:28 Monday 21 October 2024 (65364)

As a trial, the FmRawBack server was stopped at 2024-10-21-16h28-UTC.

masserot - 9:21 Tuesday 22 October 2024 (65367)

The FmRawBack server has been stopped at 15h28m-UTC (red line in the following plots)

The attached plots show

 

As a new trial, the following servers were restarted

Images attached to this comment
masserot - 11:35 Tuesday 05 November 2024 (65460)
Images attached to this comment
masserot - 9:19 Thursday 14 November 2024 (65528)

Since 2024-10-21-16h28-UTC the FmRawBack server has been stopped, and no improvement has been observed in the DAQ latency jumps.

The FmRawBack server, running on olserver119 and providing the raw_bck.ffl file, was restarted at 2024-10-05-07h52m34-UTC.

Images attached to this comment
masserot - 6:39 Saturday 16 November 2024 (65545)

The FFL monitoring of raw_bck.ffl was restored on the FFLMoni3 server at 2024-11-16-05h37m10-UTC.

masserot - 16:59 Tuesday 26 November 2024 (65623)

The attached plots show the latency fluctuations for the TolmFrameBuilder servers running on the rtpc hosts and the first 2 Frame Merging servers running on the olserver52 host:

In the meantime the FmRawBack server (2024-10-05-07h52m34-UTC) and the FFL monitoring server (2024-11-16-05h37m10-UTC) have been restored, without any obvious reduction in the glitch rate.

The origin of these glitches is not yet understood, but they have disappeared as they appeared.

Images attached to this comment
kraja, cortese - 10:02 Friday 29 November 2024 (65657)

During the maintenance window on Nov 19th we performed some upgrades on the file servers' underlying infrastructure (Entry 65561).
This action may have helped reduce the number of glitches, since it consisted of upgrading the drivers for the network cards and changing the storage access policy, with the purpose of improving latency.
Please note that, even though this action coincides with the decrease of the glitches, no actions on the infrastructure were performed when the glitches started to appear on 14 October.
