Hi everyone,
We’re prototyping a workflow where an NVIDIA DeepStream appliance does real-time AI detection, publishes the bounding-box metadata to Kafka, and Milestone XProtect keeps recording the video as usual. The goal is to see the AI boxes live in Smart Client (and in playback/export) without touching the video stream itself. Before we sink time into coding, we’d love a sanity check on the design below.
Proposed architecture (one line):
Kafka ➜ small C# “Metadata Bridge” using MIP SDK ➜ Milestone Recording Server ➜ Smart Client overlay channel
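To make the bridge’s job concrete, here’s a rough sketch of the per-message transform we have in mind (in Python for brevity; the real bridge would be C#). The Kafka message schema (`ts_ms`, `objects`, `bbox`) is our own hypothetical format, not anything DeepStream or Milestone mandates, and the ONVIF-style element names are our best guess at what the server side would want:

```python
import json
from datetime import datetime, timezone

def detection_to_onvif_frame(msg: bytes) -> str:
    """Convert one hypothetical Kafka detection message (JSON) into an
    ONVIF-style analytics <tt:Frame> element for the bridge to forward.
    Assumed message shape: {"ts_ms": <Unix epoch ms>,
                            "objects": [{"id": int, "bbox": [l, t, r, b]}]}"""
    d = json.loads(msg)
    # DeepStream timestamps are assumed here to be Unix epoch milliseconds (UTC).
    utc = datetime.fromtimestamp(d["ts_ms"] / 1000.0, tz=timezone.utc)
    objs = "".join(
        f'<tt:Object ObjectId="{o["id"]}"><tt:Appearance><tt:Shape>'
        f'<tt:BoundingBox left="{o["bbox"][0]}" top="{o["bbox"][1]}" '
        f'right="{o["bbox"][2]}" bottom="{o["bbox"][3]}"/>'
        f"</tt:Shape></tt:Appearance></tt:Object>"
        for o in d["objects"]
    )
    stamp = utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return f'<tt:Frame UtcTime="{stamp}">{objs}</tt:Frame>'
```

Kafka consumption and the actual MIP SDK hand-off are deliberately left out — the question is really whether a transform of roughly this shape is the right thing to feed the Recording Server.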
Key questions we’re unsure about
• Is a lightweight C#/MIP bridge the right pattern, or should we push the metadata via an external ONVIF RTP stream instead?
• Any gotchas with timestamp conversion (epoch vs. UTC ticks), frame-rate mismatches, or added latency in Smart Client?
• Has anyone measured how many bounding boxes per second the Smart Client can render before operators notice UI lag?
• Are there licensing or tier requirements when injecting server-side metadata (we’re on XProtect Expert)?
• Anything else we’re missing—security, failover, maintenance headaches?
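On the timestamp question specifically, our current plan is a plain epoch-milliseconds → .NET UTC ticks conversion, sketched below (the constant is .NET’s tick count at the Unix epoch; ticks are 100 ns units). What we can’t judge is whether this naive mapping survives frame-rate mismatch and network jitter, or whether Milestone re-stamps metadata on ingest:

```python
# .NET ticks (100 ns units since 0001-01-01T00:00:00 UTC) at the Unix epoch.
EPOCH_TICKS = 621_355_968_000_000_000
TICKS_PER_MS = 10_000

def unix_ms_to_dotnet_ticks(ms: int) -> int:
    """Convert Unix epoch milliseconds (UTC) to .NET UTC DateTime ticks."""
    return ms * TICKS_PER_MS + EPOCH_TICKS
```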
Why we think the bridge approach makes sense
• Proper time-sync: metadata follows Milestone’s clock, so playback is accurate.
• Overlays are stored with the recording—export retains graphics.
• No browser layers or client plug-ins; once at the server, all Smart Clients benefit.
• Re-uses VMS authentication/roles and keeps additional ports closed.
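For reference, this reasoning assumes the boxes end up in Milestone’s metadata channel as ONVIF-style analytics XML. Below is a minimal sketch of what we imagine a single frame’s payload looking like — the namespace, element layout, and coordinate convention are all assumptions on our part, so corrections are very welcome:

```xml
<!-- Hypothetical single-frame payload; exact schema/coordinate space unverified -->
<tt:MetadataStream xmlns:tt="http://www.onvif.org/ver10/schema">
  <tt:VideoAnalytics>
    <tt:Frame UtcTime="2024-01-01T12:00:00.000Z">
      <tt:Object ObjectId="42">
        <tt:Appearance>
          <tt:Shape>
            <tt:BoundingBox left="0.25" top="0.10" right="0.55" bottom="0.80"/>
          </tt:Shape>
          <tt:Class>
            <tt:Type Likelihood="0.92">Human</tt:Type>
          </tt:Class>
        </tt:Appearance>
      </tt:Object>
    </tt:Frame>
  </tt:VideoAnalytics>
</tt:MetadataStream>
```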
Would really appreciate any best-practice tips, horror stories, or performance numbers from devs who’ve done something similar. Thanks in advance!