Getting the actual playback start time from ONB RTSP after keyframe snapping?

Hey everyone,

I’m working on a custom frontend that overlays AI bounding boxes onto historical video. Our AI backend generates exact UTC detection timestamps down to the millisecond.

My current pipeline is: XProtect → Open Network Bridge → MediaMTX → React frontend (hls.js). The ONB has “Prefer absolute time” and “Skip gaps” checked.

Here’s the issue I’m hitting. When my frontend requests a specific historical time via RTSP (e.g., Range: clock=20260508T120253.812Z-), the ONB obviously has to snap to the nearest I-Frame. That part is totally expected. The problem is I have no way of knowing what time the ONB actually snapped to.
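
For reference, the clock value in that Range header is a UTC timestamp in the RTSP absolute-time format (RFC 2326: date, `T`, time, optional fraction, `Z`). A minimal Python sketch of building it from a detection time follows; the helper name `rtsp_clock_range` is made up for illustration and is not part of any ONB or MediaMTX API.

```python
from datetime import datetime, timezone


def rtsp_clock_range(start: datetime) -> str:
    """Build an open-ended RTSP Range value (clock=<start>-) from an aware datetime."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    utc = start.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate to millisecond precision
    return f"clock={utc.strftime('%Y%m%dT%H%M%S')}.{millis:03d}Z-"


# Example detection time, matching the request shown above.
detection = datetime(2026, 5, 8, 12, 2, 53, 812000, tzinfo=timezone.utc)
print("Range: " + rtsp_clock_range(detection))
```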

The RTSP PLAY response just echoes my requested time back to me instead of giving me the actual snapped time. And because MediaMTX sits in the middle and drops the ONVIF 0xABAC RTP header extension, my frontend is completely blind to the true starting timestamp of the video it receives. Without that timestamp, there is no way to sync the bounding boxes accurately.

Right now, I’m forcing all my cameras to a 1-second keyframe interval to minimize this offset, but for storage reasons I really need to move back to a standard 2-second interval. I don’t really care whether the ONB snaps forward or backward; I just need to know the exact time it snapped to so my frontend math works.
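
To make the "frontend math" concrete: once the true UTC time of the first delivered frame is known, placing a detection on the media timeline is a subtraction. The sketch below assumes there are no recording gaps between the first frame and the detection (with "Skip gaps" enabled that assumption can break) and that media time starts at zero on the first delivered frame. The function name and the snapped time are hypothetical values chosen for illustration.

```python
from datetime import datetime, timezone


def detection_to_media_seconds(first_frame_utc: datetime, detection_utc: datetime) -> float:
    """Seconds of media time between the first delivered frame and a detection."""
    return (detection_utc - first_frame_utc).total_seconds()


requested = datetime(2026, 5, 8, 12, 2, 53, 812000, tzinfo=timezone.utc)
# Assumed for illustration only: the server snapped back to a keyframe at 12:02:52.000.
snapped = datetime(2026, 5, 8, 12, 2, 52, 0, tzinfo=timezone.utc)

offset = detection_to_media_seconds(snapped, requested)
print(f"draw the box {offset:.3f} s after the first frame")
```

A negative result would mean the stream started after the detection, i.e. the server snapped forward.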

A few questions:

Can I get the exact timestamp of each frame, or at least the true recording start time of the session?

Is there a way to force the RTSP PLAY response to return the actual snapped time instead of just echoing the request?

Is there a REST API or ONVIF command where I can query the session to find out the absolute time of the first frame it served?

Assuming I can get this timestamp issue solved, is there any technical reason I shouldn’t go back to a 2-second keyframe interval? Or is a 1-second interval generally considered a hard requirement when dealing with frame-accurate AI metadata overlays?

Any advice or workarounds would be hugely appreciated. Thanks!

Q1: Can I get the exact timestamp of each frame, or at least the true recording start time of the session?
A:
The recommended approach for mapping session time to absolute UTC time is through the 0xABAC RTP header extension. Outside of this mechanism, there is currently no supported way to retrieve exact per-frame timestamps or the true recording start time directly via the ONB RTSP interface.
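
As a rough sketch of what reading that extension involves: per the ONVIF Streaming Specification, the extension uses profile ID 0xABAC and carries a 64-bit NTP timestamp in its first two 32-bit words. The Python below parses a single raw RTP packet and converts that timestamp to UTC. It would have to run at a point that still sees the original RTP packets, before any component that strips the extension, and any RTSP interleaved framing must already be removed. The function name and the synthetic packet are illustrative and not part of any Milestone or MediaMTX API.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Optional

ONVIF_EXT_PROFILE = 0xABAC
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def onvif_frame_time(packet: bytes) -> Optional[datetime]:
    """Return the UTC time from the 0xABAC RTP header extension, or None if absent."""
    if len(packet) < 12:
        return None
    first_byte = packet[0]
    if first_byte >> 6 != 2:        # RTP version must be 2
        return None
    if not first_byte & 0x10:       # X bit: no header extension present
        return None
    csrc_count = first_byte & 0x0F
    ext_start = 12 + 4 * csrc_count  # fixed header plus CSRC list
    if len(packet) < ext_start + 4:
        return None
    profile, length_words = struct.unpack_from("!HH", packet, ext_start)
    if profile != ONVIF_EXT_PROFILE or length_words < 2:
        return None
    if len(packet) < ext_start + 4 + 4 * length_words:
        return None
    # NTP timestamp: 32-bit seconds since 1900 plus a 32-bit binary fraction.
    ntp_seconds, ntp_fraction = struct.unpack_from("!II", packet, ext_start + 4)
    micros = ntp_fraction * 1_000_000 // 2**32
    return NTP_EPOCH + timedelta(seconds=ntp_seconds, microseconds=micros)


# Synthetic packet for demonstration only: 12-byte RTP header with the X bit set,
# the 3-word 0xABAC extension, and a one-byte dummy payload.
wanted = datetime(2026, 5, 8, 12, 2, 53, tzinfo=timezone.utc)
ntp_secs = (wanted - NTP_EPOCH) // timedelta(seconds=1)
header = struct.pack("!BBHII", 0x90, 96, 1, 0, 0x1234)
extension = struct.pack("!HHIII", ONVIF_EXT_PROFILE, 3, ntp_secs, 0, 0)
print(onvif_frame_time(header + extension + b"\x00"))
```

The resulting UTC value would still need its own path to the frontend (for example a side channel keyed by session), since the HLS output does not carry the extension.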


Q2: Is there a way to force the RTSP PLAY response to return the actual snapped time instead of just echoing the request?
A:
This is not supported. The RTSP PLAY request is handled as a forwarded seek command via the MIP SDK, which operates in a fire-and-forget manner. The response does not wait for frame delivery or alignment, and therefore does not return a recalculated or snapped timestamp.


Q3: Is there a REST API or ONVIF command where I can query the session to find out the absolute time of the first frame it served?
A:
There is currently no supported REST API or ONVIF command that exposes the absolute timestamp of the first delivered frame for a session.


Q4: Keyframe interval considerations (1s vs 2s)
A:
The GOP (keyframe) interval is a trade-off between compression efficiency and startup/seek latency. Shorter intervals (e.g., 1 second) improve time-to-first-frame and narrow the gap between a requested seek time and the keyframe where playback actually starts, which helps when metadata has to be aligned without exact frame timestamps. Longer intervals (e.g., 2 seconds) reduce bitrate and storage and may be acceptable depending on latency requirements. The decision is use-case dependent: AI overlay accuracy on one side, bandwidth and storage on the other.
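
To put rough numbers on the alignment side of that trade-off: if the real start time is unknown and playback starts on a keyframe, the first frame can differ from the requested time by up to about one keyframe interval, or about half an interval if the server picks the nearest keyframe rather than the previous or next one. A small sketch of that arithmetic follows; it assumes a fixed GOP length, and the 25 fps frame rate is an assumption rather than something stated in this thread.

```python
def worst_case_offset_frames(gop_seconds: float, fps: float, nearest: bool = False) -> float:
    """Upper bound, in frames, on the requested-vs-actual start offset for a fixed GOP."""
    bound_seconds = gop_seconds / 2 if nearest else gop_seconds
    return bound_seconds * fps


for gop in (1.0, 2.0):
    one_sided = worst_case_offset_frames(gop, fps=25.0)
    nearest = worst_case_offset_frames(gop, fps=25.0, nearest=True)
    print(f"GOP {gop:.0f} s: up to {one_sided:.1f} frames (one-sided), {nearest:.1f} frames (nearest)")
```

If the actual per-frame timestamps are available, this bound no longer limits overlay accuracy; it only describes how far the delivered start can sit from the requested instant.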