The disappearance of the data from the SDB1 suspension is one event in a long list of 'similar' problems that occurred on the suspension electronics in the last period. For most of them, the common source appeared to be bad data received from outside. In some cases the trigger was known. In others, like the events that occurred on SDB1, PR and WI last week, they were considered simply a side effect of the break for the installations and the consequent increased activity on the DAQ.
This is the point of view of a user, ignorant of electronics, who recovered some of those pathological situations by removing the NAN with a download. A deeper analysis is recommended.
As the SDB2 tower was opened, glitches occurred on the 100 MHz timing channel used to slave the SDB2 Daq boxes, triggering timing errors. To recover the running conditions (timing errors at zero), the SDB2 Daq boxes were reconfigured at 2021-05-04-15h52m20-UTC and the rtpc5 DAC driver servers were restarted.
In my previous comment I wrote 'bad data received from outside'. Again, for this specific event this mechanism has not been demonstrated (it was for other events with a similar final effect). But it is possible, given the architecture of the suspension controls. The SDB1 top stage control is part of the complex network of the Global Inverted Pendulum Control (GIPC). It receives data 'from outside', namely from the PR top stage master board via the TOLM system. PR in turn receives data from several boards, in particular data connected to the global control output towards the suspensions (GIPC is the control architecture that puts the locking force in loop for the top stage controls).
If a NAN enters a board at one point of this network, it can in principle reach boards quite far away, even if GIPC is not turned on, because a switch set to zero cannot stop it (NAN*0=NAN in our code). This is not just a theoretical possibility: it has happened several times.
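The NAN*0=NAN behaviour is a general property of IEEE 754 floating-point arithmetic, and it can be reproduced in a few lines. The sketch below is only an illustration in Python, not the DSP code: the function names gate_by_multiply and gate_by_select are hypothetical. It compares a switch implemented as a multiplication by a gain with a switch implemented as a selection that also rejects non-finite input.

```python
import math


def gate_by_multiply(signal: float, gain: float) -> float:
    """Switch implemented as a multiplication: gain 0.0 is meant as 'off'."""
    return signal * gain


def gate_by_select(signal: float, enabled: bool) -> float:
    """Switch implemented as a selection (hypothetical alternative).

    Returns 0.0 when the switch is off or when the input is NaN/inf.
    """
    if not enabled or not math.isfinite(signal):
        return 0.0
    return signal


bad_sample = math.nan  # stands for a corrupted value received from another board

# A zero gain does not stop the NaN: the product is still NaN.
print(math.isnan(gate_by_multiply(bad_sample, 0.0)))

# A selection-based switch outputs a clean zero instead.
print(gate_by_select(bad_sample, False))
```

With the multiplicative switch the NaN goes through even when the gain is zero, which is the propagation path described above. Whether a selection-based gate is practical on the real boards is a separate question.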
Now I must say again that there is no evidence that this is the mechanism that put the SDB1 suspension control board in a state that required a reboot. I'm just saying that, if somebody wants to investigate that event, bad data received from outside should be considered a trail to follow.
The SUSP_SBE_LC server propagates some Sa channels to some SBE controls (SNEB, SWEB and SDB2). Here is the logfile content of this server when
the Sa_OB data were lost: 2021-04-27-03h33m44-UTC>WARNING-AcAdcChCheck> Sa_OB_F0_Z - start delayed or missing at GPS1303529641-970704850
the Sa_OB data were back: 2021-04-28-14h24m08-UTC>WARNING-AcAdcChCheck> Sa_OB_F0_Z - Sa_OB_F0_Z delayed or missing from GPS1303529641-970704850 for more or less 125424(s) - nLoop 1254230294@10000Hz
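As a quick consistency check of the two log lines, the outage duration can be recomputed from the UTC timestamps and from the loop counter. The sketch below uses only values copied from the log. The variable names are illustrative, and reading nLoop as a number of cycles at 10000 Hz is an assumption.

```python
from datetime import datetime, timezone

# Timestamp layout used in the log lines, e.g. 2021-04-27-03h33m44 (UTC).
FMT = "%Y-%m-%d-%Hh%Mm%S"

lost = datetime.strptime("2021-04-27-03h33m44", FMT).replace(tzinfo=timezone.utc)
back = datetime.strptime("2021-04-28-14h24m08", FMT).replace(tzinfo=timezone.utc)

# Duration between the two warnings, in seconds.
elapsed_s = (back - lost).total_seconds()

# Duration implied by the loop counter, assuming one loop per cycle at 10 kHz.
n_loop = 1254230294
rate_hz = 10000
loop_s = n_loop / rate_hz

print(elapsed_s)  # 125424.0
print(loop_s)     # 125423.0294
```

Both values agree with the 'more or less 125424(s)' reported by the server, i.e. the Sa_OB data were missing for about 34 hours and 50 minutes.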
The following plots show the LSC_{PR,BS,NI,WI,NE,WE}_CORR channels sent by the LSC_Acl server to each ITF tower and the forwarded ones, if any, as SC_{PR,BS,NE,WE}_MIR_LSC_CORR. The same data sent to the Sc DSP are also sent to the DAQ.
During these events the ITF was unlocked; as a consequence the corrections were at zero and the corrections forwarded by the Sc DSP remained at zero, so no NAN was forwarded by the Sc DSPs to the DAQ.
It could be useful to analyse also another event, which occurred on April 28 and affected the PR and WI top stage controls at the same time. In this case the data were not permanently lost as for SDB1, so it could be something totally different. In that case, the PR and WI loops stopped working at the same time, and their re-activation required the removal of a NAN from some variable. The source is not known, but I think we can say that, at least for one of the two, the problem arrived from outside. This doesn't mean 'global control' (LSC_CORR appeared to be zero in that case too), or DAQ. Thinking about the data distribution was a simple answer I gave myself, given that something similar (in its effect) had happened days before, in coincidence with a timing problem triggered by an ADC failure at WE (in that case Alain gave an advance warning about possible drawbacks on GIPC data). The second hint was the fact that a DAQ activity was ongoing (according to the operator).
I'm not making any diagnosis, just giving information that I forgot to write in the logbook at the right moment.