Hey everyone,
I’m working on a custom frontend that overlays AI bounding boxes onto historical video. Our AI backend generates exact UTC detection timestamps down to the millisecond.
My current pipeline is: XProtect → Open Network Bridge → MediaMTX → React frontend (hls.js). The ONB has “Prefer absolute time” and “Skip gaps” checked.
Here’s the issue I’m hitting. When my frontend requests a specific historical time via RTSP (e.g., Range: clock=20260508T120253.812Z-), the ONB obviously has to snap to the nearest I-Frame. That part is totally expected. The problem is I have no way of knowing what time the ONB actually snapped to.
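For reference, here is a minimal sketch of how a Range value like the one above could be built from a detection timestamp. The helper name `toRtspClockRange` is hypothetical (not part of hls.js, MediaMTX, or any Milestone SDK), and it assumes an open-ended range with millisecond precision, matching the example request.

```typescript
// Hypothetical helper: formats a UTC instant as an open-ended RTSP
// "clock" range, e.g. clock=20260508T120253.812Z-
function toRtspClockRange(startUtc: Date): string {
  // toISOString() always returns UTC, e.g. 2026-05-08T12:02:53.812Z.
  // Removing "-" and ":" gives the compact form used in the Range header.
  const compact = startUtc.toISOString().replace(/[-:]/g, "");
  return `clock=${compact}-`;
}

// Usage: months are zero-based in Date.UTC, so 4 means May.
const requested = new Date(Date.UTC(2026, 4, 8, 12, 2, 53, 812));
console.log(toRtspClockRange(requested)); // clock=20260508T120253.812Z-
```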
The RTSP PLAY response just echoes my requested time back instead of reporting the actual snapped time. And because MediaMTX sits in the middle and drops the ONVIF RTP header extension (0xABAC) that carries each frame’s absolute timestamp, my frontend is completely blind to the true starting timestamp of the video it receives. Without that start time I can’t compute the offset between video time and UTC, so I can’t sync the bounding boxes accurately.
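To illustrate what gets lost, here is a rough sketch of reading the absolute time from a raw RTP packet that still carries the 0xABAC extension. It assumes the layout as I understand it from the ONVIF Streaming Specification (a 64-bit NTP timestamp immediately after the extension header), and it would have to run somewhere that sees the RTP stream before MediaMTX, not in the browser. The function name `readOnvifReplayTimeMs` is hypothetical.

```typescript
// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
const NTP_UNIX_OFFSET_SECONDS = 2208988800;
const TWO_POW_32 = 4294967296;

// Hypothetical helper: returns the frame's absolute time in Unix milliseconds,
// or null if the packet has no 0xABAC header extension.
// Assumed layout: 12-byte RTP header, CSRC list, then 0xABAC, a 16-bit length
// in 32-bit words, then NTP seconds (32 bits) and NTP fraction (32 bits).
function readOnvifReplayTimeMs(packet: Uint8Array): number | null {
  if (packet.length < 12) return null;
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);

  const firstByte = view.getUint8(0);
  const hasExtension = (firstByte & 0x10) !== 0; // X bit
  if (!hasExtension) return null;

  const csrcCount = firstByte & 0x0f;
  const extOffset = 12 + csrcCount * 4;
  if (packet.length < extOffset + 12) return null;

  if (view.getUint16(extOffset) !== 0xabac) return null;
  const lengthWords = view.getUint16(extOffset + 2);
  if (lengthWords < 2) return null; // need at least the 8-byte NTP timestamp

  const ntpSeconds = view.getUint32(extOffset + 4);
  const ntpFraction = view.getUint32(extOffset + 8);
  const unixMs =
    (ntpSeconds - NTP_UNIX_OFFSET_SECONDS) * 1000 +
    (ntpFraction / TWO_POW_32) * 1000;
  return Math.round(unixMs);
}
```

If a small relay in front of MediaMTX could read this value from the first packet of a session, it could hand the true start time to the frontend out of band.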
Right now, I’m forcing all my cameras to a 1-second keyframe interval to keep this offset error small, but for storage reasons I really need to move back to a standard 2-second interval. I don’t really care whether the ONB snaps forward or backward; I just need to know the exact time it snapped to so my frontend math works.
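To make the "frontend math" concrete, here is a small sketch of the mapping and of what goes wrong when only the echoed request time is available. The function names and the snapped start time in the example are hypothetical; the only real value is the requested time from the Range example above.

```typescript
// Position on the video timeline (seconds) at which a detection should be
// drawn, given the absolute time of the first frame actually served.
function detectionToMediaTimeSec(detectionUtcMs: number, streamStartUtcMs: number): number {
  return (detectionUtcMs - streamStartUtcMs) / 1000;
}

// Overlay error (seconds) when the requested time is wrongly used as the
// stream start. It equals the snap distance, independent of the detection.
function overlayErrorSec(requestedUtcMs: number, actualStartUtcMs: number): number {
  return (requestedUtcMs - actualStartUtcMs) / 1000;
}

// Example: request 12:02:53.812, and assume (hypothetically) the server
// snapped back to a keyframe at 12:02:52.000.
const requestedMs = Date.UTC(2026, 4, 8, 12, 2, 53, 812);
const actualStartMs = Date.UTC(2026, 4, 8, 12, 2, 52, 0);
const detectionMs = Date.UTC(2026, 4, 8, 12, 2, 55, 0);

const trueTime = detectionToMediaTimeSec(detectionMs, actualStartMs); // 3 s
const assumedTime = detectionToMediaTimeSec(detectionMs, requestedMs); // 1.188 s
const errorSec = overlayErrorSec(requestedMs, actualStartMs); // about 1.812 s
console.log({ trueTime, assumedTime, errorSec });
```

The error is bounded by the snap distance, which is less than one keyframe interval if the server snaps in a single direction. That is why a shorter interval only shrinks the problem: even a 1-second interval can leave the boxes many frames off unless the true start time is known.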
A few questions:
1. Can I get the exact timestamp of each frame, or at least the true recording start time of the session?
2. Is there a way to force the RTSP PLAY response to return the actual snapped time instead of just echoing the request?
3. Is there a REST API or ONVIF command where I can query the session to find out the absolute time of the first frame it served?
4. Assuming I can get this timestamp issue solved, is there any technical reason I shouldn’t go back to a 2-second keyframe interval? Or is a 1-second interval generally considered a hard requirement when dealing with frame-accurate AI metadata overlays?
Any advice or workarounds would be hugely appreciated. Thanks!