Fix for dc-phased Pin Never Going HIGH with syncToRefClock="true"

I have the same issue as TAKUYA — the dc-phased pin never goes high. I managed to fix this using Claude AI and now the dc-phased pin goes high and the system seems stable. I have attached a detailed description of what was changed in the code as well as some screenshots from before and after. Note, this was done entirely with the help of AI and I am not qualified enough to say whether it is correct or not. Grandixximo, could you please review the notes and share your thoughts?


System Configuration

Component | Version/Details
OS | Debian 13 (Trixie)
Kernel | 6.12.63+deb13-rt-amd64 (Preempt-RT)
LinuxCNC | 2.9.8 (uspace)
IgH EtherCAT Master | 1.6.8
linuxcnc-ethercat | grandixximo's fork
Hardware | Beckhoff EK1100, EL1008, EL2008, 3× StepperOnline A6 servo drives
ethercat-conf.xml | syncToRefClock="true" refClockSyncCycles="5"

Screenshots

Image 1 — BEFORE fix: HALscope
Image 2 — AFTER fix: HALscope
Image 3 — BEFORE fix: HALshow
Image 4 — AFTER fix: HALshow

Starting Position

After installing grandixximo's fork of linuxcnc-ethercat and configuring syncToRefClock="true" and refClockSyncCycles="5" as recommended, the lcec.0.dc-phased pin was always FALSE.

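Initial symptoms (the "before" values summarized under Result After Fix): dc-phased stayed FALSE, pll-err was stuck at 0, pll-out and pll-final were 0, and app-phase was random.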


Investigation

The investigation was done by tracing through the source code of lcec_main.c step by step.

Step 1 — Confirmed RTAPI_TASK_PLL_SUPPORT is present:

grep -r "RTAPI_TASK_PLL_SUPPORT" /usr/include/linuxcnc/
# Result: #define RTAPI_TASK_PLL_SUPPORT

Step 2 — Identified that pll-err was always 0:

With syncToRefClock="true", dc-phased goes TRUE only when:

int32_t lock_threshold = master->app_time_period / 10;  // 125,000 ns
if (abs(pll_err) < lock_threshold) {
    dc_phased = TRUE;
}

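pll-err never moved from exactly 0. A value of 0 would pass this threshold check, yet dc-phased stayed FALSE, which suggests that the block containing the check (guarded by dc_time_valid) was never executing.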
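To make the lock condition concrete, here is a minimal sketch of the threshold check as a standalone function. The helper name is_phase_locked is hypothetical and not part of linuxcnc-ethercat, and the 1,250,000 ns period is only inferred from the 125,000 ns comment in the snippet.

```c
#include <stdint.h>

/* Hypothetical helper, not part of linuxcnc-ethercat.
 * Returns 1 when |pll_err| is below one tenth of the application period,
 * mirroring the lock_threshold test quoted from lcec_main.c. */
int is_phase_locked(int32_t pll_err, uint32_t app_time_period_ns)
{
    int64_t threshold = (int64_t)(app_time_period_ns / 10);
    int64_t err = pll_err;      /* widen first so negating cannot overflow */

    if (err < 0) {
        err = -err;
    }
    return err < threshold;
}
```

With a period of 1,250,000 ns, is_phase_locked(0, 1250000) returns 1, so a pll-err that is genuinely 0 would count as locked if this check were ever reached. The post-fix pll-err of about 15505 ns also passes, while an error of exactly 125,000 ns does not because the comparison is strict.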

Step 3 — Traced why pll-err was always 0:

The code sets pll-err inside this block:

if (dc_time_valid && master->dc_time_valid_last) {
    *(hal_data->pll_err) = raw_offset + drift;
    ...
}

Since pll-err never changed, this block was never entered, so dc_time_valid was presumably FALSE on every cycle, meaning ecrt_master_reference_clock_time() was failing every time it was called.
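The gating can be sketched in isolation to show why pll-err stays at 0 when the read never succeeds. This is an illustration only: pll_state_t and pll_update are hypothetical names, and storing the current validity at the end of each cycle is my assumption about the surrounding code.

```c
#include <stdint.h>

/* Hypothetical state; field names loosely follow the snippet above. */
typedef struct {
    int dc_time_valid_last;  /* validity of the previous cycle's read */
    int32_t pll_err;         /* keeps its initial 0 until first update */
} pll_state_t;

/* One cycle of the gate: pll_err is written only when both the current
 * and the previous reference clock reads were valid. */
void pll_update(pll_state_t *s, int dc_time_valid,
                int32_t raw_offset, int32_t drift)
{
    if (dc_time_valid && s->dc_time_valid_last) {
        s->pll_err = raw_offset + drift;
    }
    /* Assumption: the real code remembers validity for the next cycle. */
    s->dc_time_valid_last = dc_time_valid;
}
```

If dc_time_valid is 0 on every call, pll_err never leaves 0 no matter what raw_offset and drift are. With valid reads, the first cycle only arms the gate and the second one writes the error.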

Step 4 — Found the root cause: wrong cycle timing

Looking at the call sequence in lcec_write_master():

ecrt_master_sync_monitor_queue(master->master);  // queues monitoring datagram
ecrt_master_send(master->master);                // sends it
// IMMEDIATELY after send in the SAME cycle:
uint32_t dc_time = 0;
int dc_time_valid = (ecrt_master_reference_clock_time(master->master, &dc_time) == 0);

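The problem appears to be that ecrt_master_reference_clock_time() was called in the same cycle as ecrt_master_sync_monitor_queue() and ecrt_master_send(), before any reply could have come back. The sequence needs two separate cycles: in cycle N, lcec_write_master() queues and sends the datagram; in cycle N+1, lcec_read_master() calls ecrt_master_receive(), and only then can the reference clock time be read.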
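The ordering problem can be illustrated with a toy model. This is not the IgH API: mock_master_t and the mock_* functions are made up for the sketch, and the only behaviour assumed is that a read succeeds solely when the datagram's reply has been received and the datagram has not been queued again since.

```c
#include <stdint.h>

/* Toy stand-in for one datagram inside the master; NOT the IgH API. */
typedef enum { DG_INIT, DG_QUEUED, DG_SENT, DG_RECEIVED } dg_state_t;

typedef struct {
    dg_state_t state;
    uint32_t ref_time;   /* simulated reference clock value */
} mock_master_t;

void mock_queue(mock_master_t *m) { m->state = DG_QUEUED; }

void mock_send(mock_master_t *m)
{
    if (m->state == DG_QUEUED) m->state = DG_SENT;
}

void mock_receive(mock_master_t *m)
{
    if (m->state == DG_SENT) {
        m->ref_time += 1000;     /* pretend the clock advanced */
        m->state = DG_RECEIVED;
    }
}

/* Returns 0 on success, -1 when no received reply is available. */
int mock_ref_clock_time(const mock_master_t *m, uint32_t *t)
{
    if (m->state != DG_RECEIVED) return -1;
    *t = m->ref_time;
    return 0;
}

/* Runs `cycles` read/write cycles and counts successful time reads.
 * read_after_receive = 0: read right after send, in the write phase.
 * read_after_receive = 1: read after receive, in the read phase. */
int count_valid_reads(int cycles, int read_after_receive)
{
    mock_master_t m = { DG_INIT, 0 };
    int valid = 0;

    for (int i = 0; i < cycles; i++) {
        uint32_t t = 0;

        mock_receive(&m);                          /* read phase */
        if (read_after_receive && mock_ref_clock_time(&m, &t) == 0) valid++;

        mock_queue(&m);                            /* write phase */
        mock_send(&m);
        if (!read_after_receive && mock_ref_clock_time(&m, &t) == 0) valid++;
    }
    return valid;
}
```

In this model count_valid_reads(10, 0) gives 0, because the datagram is always in the sent state at the moment of the read. count_valid_reads(10, 1) gives 9: only the very first cycle fails, since nothing was sent before it.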

Step 5 — Found a second issue: sync_monitor_queue was missing entirely

Before the fix, ecrt_master_sync_monitor_queue() was not being called at all. Without it, there is no datagram to receive, so ecrt_master_reference_clock_time() always returns -EIO.

Step 6 — Found a third issue: ecrt_master_sync_reference_clock skipped with syncToRefClock="true"

The original code only called ecrt_master_sync_reference_clock() when sync_to_ref_clock = false:

if (!master->sync_to_ref_clock) {
    ecrt_master_sync_reference_clock(master->master);
}

This meant that with syncToRefClock="true", the reference clock was never being synchronized to the master's time.


The Fix

Three changes were made:

1. Add dc_time and dc_time_valid fields to the master struct in lcec.h:

Inside the #ifdef RTAPI_TASK_PLL_SUPPORT block (around line 241), add:

uint32_t dc_time;              // DC reference clock time read in lcec_read_master
int dc_time_valid;             // Whether dc_time was successfully read this cycle

2. Move ecrt_master_reference_clock_time() to lcec_read_master() in lcec_main.c:

After ecrt_master_receive() and ecrt_domain_process() (around line 1139), add:

#ifdef RTAPI_TASK_PLL_SUPPORT
  // Read DC reference clock time here, after receive, so the monitor datagram
  // queued in the previous write cycle is already received
  master->dc_time = 0;
  master->dc_time_valid = (ecrt_master_reference_clock_time(
      master->master, &master->dc_time) == 0);
#endif

3. Update lcec_write_master() in lcec_main.c:

Fix the sync reference clock to run in both modes, add sync_monitor_queue, and use the struct fields for dc_time (around line 1221).

Replace:

  // sync ref clock to master
  if (!master->sync_to_ref_clock) {
    if (master->sync_ref_cnt == 0) {
      master->sync_ref_cnt = master->sync_ref_cycles;
      ecrt_master_sync_reference_clock(master->master);
    }
    master->sync_ref_cnt--;
  }
  // sync slaves to ref clock
  ecrt_master_sync_slave_clocks(master->master);
  // send domain data
  ecrt_master_send(master->master);

With:

  // sync ref clock to master (runs in both syncToRefClock modes)
  if (master->sync_ref_cnt == 0) {
    master->sync_ref_cnt = master->sync_ref_cycles;
    ecrt_master_sync_reference_clock(master->master);
  }
  master->sync_ref_cnt--;
  // sync slaves to ref clock
  ecrt_master_sync_slave_clocks(master->master);
  // queue DC monitoring datagram for reference clock time reads
  ecrt_master_sync_monitor_queue(master->master);
  // send domain data
  ecrt_master_send(master->master);
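For anyone checking the countdown logic, here is a small sketch of how often the sync call would be issued. sync_state_t, sync_due and count_syncs are hypothetical names, and I am assuming sync_ref_cycles holds the refClockSyncCycles value from ethercat-conf.xml.

```c
/* Hypothetical stand-in for the two master fields used by the countdown. */
typedef struct {
    int sync_ref_cnt;
    int sync_ref_cycles;
} sync_state_t;

/* Same countdown as in the snippet above; returns 1 on the cycles where
 * ecrt_master_sync_reference_clock() would be called. */
int sync_due(sync_state_t *s)
{
    int due = 0;

    if (s->sync_ref_cnt == 0) {
        s->sync_ref_cnt = s->sync_ref_cycles;
        due = 1;
    }
    s->sync_ref_cnt--;
    return due;
}

/* Counts how many sync calls happen over a number of cycles. */
int count_syncs(int cycles, int sync_ref_cycles)
{
    sync_state_t s = { 0, sync_ref_cycles };
    int n = 0;

    for (int i = 0; i < cycles; i++) {
        n += sync_due(&s);
    }
    return n;
}
```

With sync_ref_cycles = 5 the call lands on cycles 0, 5, 10 and so on, so count_syncs(10, 5) is 2 and count_syncs(11, 5) is 3. The slave clock sync and the send still run every cycle.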

And replace the local dc_time variables (around line 1348):

Replace:

  // Read DC reference clock time (local variable)
  uint32_t dc_time = 0;
  int dc_time_valid = (ecrt_master_reference_clock_time(
      master->master, &dc_time) == 0);

With:

  // DC reference clock time was read in lcec_read_master after receive
  uint32_t dc_time = master->dc_time;
  int dc_time_valid = master->dc_time_valid;

Result After Fix

Pin | Before | After
dc-phased | FALSE | TRUE ✅
pll-err | 0 (stuck) | ~15505 ns (actively tracking) ✅
pll-out | 0 | 1250 ns ✅
pll-final | 0 | 1250 ns ✅
app-phase | random | ~114684 ns (stable) ✅