Add peer-based time synchronization for nodes without hardware RTC #1443
base: dev
MESH_DEBUG_PRINTLN("RadioLibWrapper: noise_floor = %d", (int)_noise_floor);
// Only log if noise floor changed
if (_noise_floor != old_noise_floor) {
This log entry was rather chatty; it is now only logged when the noise floor changes.
@@ -0,0 +1,524 @@
/*
 * PEER-BASED TIME SYNCHRONIZATION SYSTEM
The peer-based time synchronization logic is also explained in detail here, in the inline documentation.
#include <helpers/AutoDiscoverRTCClock.h>
/**
 * \brief Helper macro to setup RTC clock with automatic peer synchronization
This is the new helper macro to simplify RTC use in the various hardware variants.
Thanks for trying to solve clock sync. I’m not convinced that this relatively minor issue warrants the additional complexity introduced by this PR. A simpler and safer approach might be to let repeater owners configure a whitelist of nodes whose adverts they trust. There’s also a non-trivial risk of clock poisoning when relying on anonymous advert timestamps, and a future-dated clock can be a more annoying problem for ESP32 repeaters, which can't simply be restarted to reset their clock.
Context
Problem statement
After a cold start, repeaters without a Real Time Clock (RTC) module default to May 15, 2024 10:52:31 UTC (the base date used throughout the firmware, i.e. 1715770351). The time on these repeaters is often not synchronized properly afterwards, so it stays invalid for long periods.
In more detail
PeerSyncRTCClock is a time synchronization wrapper that lets MeshCore nodes without a hardware RTC module keep accurate time by synchronizing with the timestamps in nearby peers' advertisement packets. This matters most for repeaters and low-cost variants that lack dedicated RTC hardware but still need accurate timekeeping for message timestamps, telemetry data, and network coordination. If a hardware RTC is detected via I2C auto-discovery during initialization, it takes absolute priority and peer sync is disabled entirely; otherwise, peer-based synchronization is activated automatically as a fallback.

The synchronization process collects timestamps from received advertisement packets and compensates for multi-hop transmission delay by adding the estimated total airtime, calculated as (hop_count + 1) × airtime_per_hop_ms, where the per-hop airtime is computed from the current radio settings (spreading factor, bandwidth, packet length). This compensation avoids circular logic: we cannot use a potentially incorrect local clock to measure elapsed time while trying to fix that same clock, so each timestamp is instead moved forward by the estimated transmission duration before it is stored.

The system then applies statistical outlier filtering based on MAD (Median Absolute Deviation), rejecting any timestamp more than 3×MAD from the median, and only trusts the result if roughly 70% of the samples survive (at least 15 of 21 by default). After filtering, it calculates a weighted median in which closer peers get linearly more influence based on hop count (weight = MAX_HOP_COUNT + 1 - hop_count): a 1-hop peer gets weight 20, while a 20-hop peer gets weight 1. This ensures that nearby sources dominate the consensus rather than distant or potentially compromised nodes.

Validation is adaptive. Before the first successful sync it is lenient (any time between May 2024 and May 2034 is accepted) so the node can bootstrap from a cold start; afterwards it becomes strict (±24 hours), so samples more than a day away from the current clock are rejected. Once the clock is accurate (offset below 2 minutes), peer sync pauses for a configurable duration (default 24 hours) to reduce CPU overhead, then resumes periodically to verify that the clock is still accurate. All timing uses RTC timestamps rather than millis(), which keeps the system compatible with deep sleep modes (for example, see #1353).

This is a fundamental improvement over the previous behavior, which had no peer-based synchronization at all: nodes and repeaters without a hardware RTC drifted indefinitely with no way to self-correct, making timestamps on messages, telemetry readings, and log entries increasingly meaningless over time. These variants can now maintain accurate time automatically through distributed consensus, with statistical validation and proximity-based trust weighting providing robustness against both accidental clock errors and intentional time-spoofing attacks.
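The airtime compensation step can be sketched in C++ as follows. This is a minimal sketch, not the PR's code. It assumes the standard Semtech LoRa time-on-air formula for SF7 to SF12. The function names, the preamble length, the coding-rate and low-data-rate-optimization parameters, and the rounding to whole seconds (assuming advert timestamps are epoch seconds) are all illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: LoRa time on air in milliseconds (Semtech formula, SF7..SF12).
// cr is the coding-rate index (1..4 for 4/5..4/8); an explicit header is assumed
// unless implicit_header is set.
double loraAirtimeMs(int sf, double bw_hz, int cr, int payload_bytes,
                     int preamble_symbols = 8, bool crc_on = true,
                     bool implicit_header = false, bool low_dr_opt = false) {
  assert(sf >= 7 && sf <= 12 && bw_hz > 0 && cr >= 1 && cr <= 4);
  const double t_sym_ms = 1000.0 * static_cast<double>(1 << sf) / bw_hz;
  const double t_preamble_ms = (preamble_symbols + 4.25) * t_sym_ms;
  const double num = 8.0 * payload_bytes - 4.0 * sf + 28 + 16 * (crc_on ? 1 : 0) -
                     20 * (implicit_header ? 1 : 0);
  const double den = 4.0 * (sf - 2 * (low_dr_opt ? 1 : 0));
  const double payload_symbols = 8 + std::max(std::ceil(num / den) * (cr + 4), 0.0);
  return t_preamble_ms + payload_symbols * t_sym_ms;
}

// Moves a peer's advert timestamp (seconds) forward by the estimated total airtime,
// (hop_count + 1) * airtime_per_hop_ms, rounded to the nearest whole second.
// No local clock reading is involved, which avoids the circular dependency.
uint32_t compensateAdvertTimestamp(uint32_t advert_timestamp, int hop_count,
                                   double airtime_per_hop_ms) {
  assert(hop_count >= 0 && airtime_per_hop_ms >= 0);
  const double total_ms = (hop_count + 1) * airtime_per_hop_ms;
  return advert_timestamp + static_cast<uint32_t>(std::lround(total_ms / 1000.0));
}
```

For example, with SF7, 125 kHz and CR 4/5, a 50-byte packet takes about 97.5 ms per hop, so the compensation rounds to zero seconds for short paths. At SF12 (with low-data-rate optimization) the same packet takes about 2.3 s per hop, and a 2-hop advert is moved forward by about 7 seconds.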
Features
Priority System:
Timestamp Collection:
Statistical Outlier Filtering + Weighted Median:
Weight = (MAX_HOP_COUNT + 1 - actual_hop_count)
Clock Update Logic:
Configuration Options:
PEER_SYNC_MAX_HOP_COUNT: Maximum hop count of adverts used for time sync (default: 20)
PEER_SYNC_MIN_OFFSET_SECONDS: Minimum clock offset that triggers a clock update (default: 120 s / 2 min)
PEER_SYNC_SAMPLE_SIZE: Number of timestamps to collect before outlier filtering and median timestamp determination (default: 21)
PEER_SYNC_MIN_SAMPLES_AFTER_FILTERING: Minimum number of samples that must remain after outlier filtering (default: 15)
PEER_SYNC_MIN_SYNCS_BEFORE_STRICT_VALIDATION: Number of successful syncs before switching to strict ±24 h validation (default: 1)
PEER_SYNC_PAUSE_DURATION_SECONDS: Pause duration once the clock is accurate (default: 86400 s / 24 h)
Scope
I have implemented the changes for the RAK4631 variant to make it easier to review. I'll add the changes for the other variants when this has been reviewed.
Relevant issues
Disclaimer
I have used Claude Code to assist with this implementation.
Examples
Cold start (the 1st clock sync)
After the initial cold start, when the first advert is heard:
After we have 20 samples since cold start:
After the 21st sample (i.e. PEER_SYNC_SAMPLE_SIZE), once the pool of timestamps is large enough to reliably determine a median timestamp, outlier detection discards the samples that deviate too much:
In this example, MAD (Median Absolute Deviation) based outlier detection discarded 10 outliers, leaving a pool of 11 samples. As this is fewer than 15 (i.e. PEER_SYNC_MIN_SAMPLES_AFTER_FILTERING), collection continues until there is again a pool of 21 samples (i.e. PEER_SYNC_SAMPLE_SIZE):
This time, outlier detection removed only a single outlier, leaving a pool of 20 timestamps. As this is more than 15 (i.e. PEER_SYNC_MIN_SAMPLES_AFTER_FILTERING), the pool is used to determine the weighted median timestamp and synchronize the clock.
In this particular case, the peer-synced clock was about 40 seconds off from the actual UTC time.
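The filtering and weighted-median steps walked through above can be sketched as follows. This is an illustrative C++17 sketch, not the PR's implementation. The PeerSample struct and estimatePeerTime() function are hypothetical, the median of an even-sized set is taken as the mean of the two middle values, and samples are assumed to be already airtime-compensated and limited to PEER_SYNC_MAX_HOP_COUNT hops.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sample record; the PR's real data structures may differ.
struct PeerSample {
  int64_t timestamp;  // airtime-compensated peer timestamp (seconds)
  int hops;           // hop count of the advert (1 = direct neighbour)
};

// Plain median; averages the two middle values for even-sized inputs.
static double median(std::vector<double> v) {
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Returns the weighted-median timestamp, or std::nullopt if too few samples
// survive the MAD filter (the caller then keeps collecting).
std::optional<int64_t> estimatePeerTime(const std::vector<PeerSample>& samples,
                                        std::size_t min_after_filtering = 15,
                                        int max_hops = 20, double mad_factor = 3.0) {
  if (samples.empty()) return std::nullopt;

  // 1. MAD-based outlier filter around the plain (unweighted) median.
  std::vector<double> ts;
  for (const auto& s : samples) ts.push_back(static_cast<double>(s.timestamp));
  const double med = median(ts);
  std::vector<double> dev;
  for (double t : ts) dev.push_back(std::fabs(t - med));
  const double mad = median(dev);

  std::vector<PeerSample> kept;
  for (const auto& s : samples) {
    // If MAD is 0, only samples exactly at the median survive.
    if (std::fabs(static_cast<double>(s.timestamp) - med) <= mad_factor * mad) {
      kept.push_back(s);
    }
  }
  if (kept.size() < min_after_filtering) return std::nullopt;

  // 2. Weighted median with weight = max_hops + 1 - hops (closer peers count more).
  std::sort(kept.begin(), kept.end(),
            [](const PeerSample& a, const PeerSample& b) { return a.timestamp < b.timestamp; });
  int64_t total = 0;
  for (const auto& s : kept) {
    assert(s.hops >= 0 && s.hops <= max_hops);
    total += max_hops + 1 - s.hops;
  }
  int64_t cumulative = 0;
  for (const auto& s : kept) {
    cumulative += max_hops + 1 - s.hops;
    if (2 * cumulative >= total) return s.timestamp;  // first point past half the weight
  }
  return kept.back().timestamp;  // not reached: every weight is at least 1
}
```

For example, with timestamps 998 to 1002 from peers 1 to 5 hops away plus one outlier at 5000, the outlier lies far beyond 3×MAD (MAD = 1.5 here) and is discarded, and the weighted median of the remaining five samples is 1000.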
After the first clock sync
The first clock sync after a cold start is lenient: it accepts any time between May 15, 2024 10:52:31 UTC (the base date used throughout the firmware: 1715770351) and ~May 2034 (base + 10 years).
After the first clock sync, the peer-based synchronization logic becomes stricter and only accepts timestamps that lie within 24 hours of the current clock.
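The two validation regimes can be expressed as a single plausibility check. This is a hedged sketch: isPlausibleTimestamp() and the constants are hypothetical names, the 10-year span ignores leap days, and the ±24 h window is assumed to be measured against the current local clock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the values in the PR description.
constexpr uint32_t kFirmwareBaseTime = 1715770351u;                // May 15, 2024 10:52:31 UTC
constexpr uint32_t kLenientSpanSeconds = 10u * 365u * 24u * 3600u;  // ~10 years (no leap days)
constexpr uint32_t kStrictWindowSeconds = 24u * 3600u;              // ±24 hours

// Returns true if a candidate peer timestamp may enter the sample pool.
bool isPlausibleTimestamp(uint32_t candidate, uint32_t local_now, int successful_syncs,
                          int min_syncs_before_strict = 1) {
  assert(successful_syncs >= 0 && min_syncs_before_strict >= 0);
  if (successful_syncs < min_syncs_before_strict) {
    // Lenient bootstrap window: the local clock is still unreliable, so ignore it.
    return candidate >= kFirmwareBaseTime &&
           candidate <= kFirmwareBaseTime + kLenientSpanSeconds;
  }
  // Strict window: accept only timestamps within ±24 h of the local clock.
  const uint32_t diff = candidate > local_now ? candidate - local_now : local_now - candidate;
  return diff <= kStrictWindowSeconds;
}
```

During bootstrap, an advert claiming January 1, 2025 is accepted even though the local clock still reads the base date. Once one sync has succeeded, the same check rejects any timestamp that is more than a day away from the local clock.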
After another 21 samples are collected (i.e. PEER_SYNC_SAMPLE_SIZE), the computed drift is much smaller:
The weighted median for this run shows that the clock offset is now 3 seconds. As this is below the 120-second threshold (i.e. PEER_SYNC_MIN_OFFSET_SECONDS), the clock is not updated.
To reduce CPU usage, peer-based clock synchronization is then paused for 24 hours (i.e. PEER_SYNC_PAUSE_DURATION_SECONDS).
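The update-or-pause decision at the end of each round might look like the sketch below. PeerSyncState, applyEstimate() and isPaused() are hypothetical. The sketch assumes that only rounds which actually step the clock count as successful syncs. In line with the description, the pause is tracked in RTC seconds rather than millis().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-node peer-sync state (names are illustrative, not from the PR).
struct PeerSyncState {
  uint32_t paused_until = 0;  // RTC time in seconds; 0 means "not paused"
  int successful_syncs = 0;   // rounds that actually stepped the clock
};

// Applies one weighted-median estimate and returns the clock value to use afterwards.
uint32_t applyEstimate(PeerSyncState& st, uint32_t local_now, uint32_t estimate,
                       uint32_t min_offset_s = 120, uint32_t pause_s = 86400) {
  assert(pause_s > 0);
  const int64_t offset = static_cast<int64_t>(estimate) - static_cast<int64_t>(local_now);
  const int64_t abs_offset = offset < 0 ? -offset : offset;
  if (abs_offset >= static_cast<int64_t>(min_offset_s)) {
    // Large offset: step the clock and keep collecting samples.
    ++st.successful_syncs;
    return estimate;
  }
  // Small offset: the clock is already accurate, so pause peer sync.
  st.paused_until = local_now + pause_s;
  return local_now;
}

// Peer sync resumes once the RTC reaches paused_until (RTC seconds, not millis()).
bool isPaused(const PeerSyncState& st, uint32_t rtc_now) { return rtc_now < st.paused_until; }
```

This mirrors the example above. With a 3-second offset the clock is left alone and sync is paused for 86400 seconds. An estimate 120 seconds or more away from the local clock steps the clock instead, and sampling continues.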