Best way to ingest Kafka overlay metadata into XProtect Smart Client for real-time AI bounding-box drawing

Hi everyone,

We’re prototyping a workflow where an NVIDIA DeepStream appliance does real-time AI detection, publishes the bounding-box metadata to Kafka, and Milestone XProtect keeps recording the video as usual. The goal is to see the AI boxes live in Smart Client (and in playback/export) without touching the video stream. Before we sink time into coding, we’d love a sanity check on the design below.

Proposed architecture (one line):

Kafka ➜ small C# “Metadata Bridge” using the MIP SDK ➜ Milestone Recording Server ➜ Smart Client overlay channel

Key questions we’re unsure about

• Is a lightweight C#/MIP bridge the right pattern, or should we push the metadata via an external ONVIF RTP stream instead?

• Any gotchas with timestamp conversion (epoch vs. UTC ticks), frame-rate mismatches, or added latency in Smart Client?

• Has anyone measured how many bounding boxes per second the Smart Client can render before operators notice UI lag?

• Are there licensing or tier requirements when injecting server-side metadata (we’re on XProtect Expert)?

• Anything else we’re missing—security, failover, maintenance headaches?
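On the timestamp question above: if the DeepStream side stamps detections with Unix epoch milliseconds and the Milestone side expects UTC .NET ticks (100 ns intervals since 0001-01-01T00:00:00Z), the conversion is a fixed scale plus a fixed offset. A minimal sketch (the offset constant is .NET's `DateTime.UnixEpoch.Ticks`):

```python
from datetime import datetime, timezone

# Ticks (100 ns units) between 0001-01-01T00:00:00Z and the Unix epoch,
# i.e. .NET's DateTime.UnixEpoch.Ticks.
UNIX_EPOCH_TICKS = 621_355_968_000_000_000
TICKS_PER_MS = 10_000  # one tick = 100 ns, so 10,000 ticks per millisecond

def unix_ms_to_net_ticks(unix_ms: int) -> int:
    """Convert Unix epoch milliseconds to .NET UTC ticks."""
    return unix_ms * TICKS_PER_MS + UNIX_EPOCH_TICKS

def net_ticks_to_utc(ticks: int) -> datetime:
    """Convert .NET UTC ticks back to an aware datetime, for sanity checks."""
    unix_s = (ticks - UNIX_EPOCH_TICKS) / 10_000_000
    return datetime.fromtimestamp(unix_s, tz=timezone.utc)

# 2021-01-01T00:00:00Z expressed as Unix milliseconds
ms = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
print(unix_ms_to_net_ticks(ms))  # 637450560000000000
```

Doing the conversion in integer milliseconds (rather than floating-point seconds) avoids rounding drift when boxes are matched to frames.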

Why we think the bridge approach makes sense

• Proper time-sync: metadata follows Milestone’s clock, so playback is accurate.

• Overlays are stored with the recording—export retains graphics.

• No browser layers or client plug-ins; once at the server, all Smart Clients benefit.

• Re-uses VMS authentication/roles and keeps additional ports closed.

Would really appreciate any best-practice tips, horror stories, or performance numbers from devs who’ve done something similar. Thanks in advance!

Hi, first, I would recommend using the built-in metadata features in XProtect, which allow you to send metadata into XProtect for both live viewing and recorded playback. If that metadata conforms to ONVIF Profile M, XProtect can display the bounding boxes in live and playback (synchronized on common timestamps), and the detected objects can also be searched by classification.

  • There is no special licensing for sending metadata to XProtect
  • If metadata arrives at the Smart Client earlier than the corresponding live video (based on timestamps), it will be buffered so it is rendered in sync with the video
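For reference, a Profile M style bounding box travels as an ONVIF analytics metadata frame. The payload shape is sketched below with element names from the ONVIF ver10 schema; the coordinate values are made up (ONVIF uses a normalized coordinate space), and how the frame is actually transported is up to the device/driver, so treat this as an illustration only:

```python
import xml.etree.ElementTree as ET

TT = "http://www.onvif.org/ver10/schema"
ET.register_namespace("tt", TT)

def bbox_frame(utc_time: str, object_id: int, left: float, top: float,
               right: float, bottom: float) -> str:
    """Build a minimal ONVIF-style metadata frame holding one bounding box."""
    stream = ET.Element(f"{{{TT}}}MetadataStream")
    analytics = ET.SubElement(stream, f"{{{TT}}}VideoAnalytics")
    frame = ET.SubElement(analytics, f"{{{TT}}}Frame", UtcTime=utc_time)
    obj = ET.SubElement(frame, f"{{{TT}}}Object", ObjectId=str(object_id))
    appearance = ET.SubElement(obj, f"{{{TT}}}Appearance")
    shape = ET.SubElement(appearance, f"{{{TT}}}Shape")
    ET.SubElement(shape, f"{{{TT}}}BoundingBox",
                  left=str(left), top=str(top),
                  right=str(right), bottom=str(bottom))
    return ET.tostring(stream, encoding="unicode")

xml = bbox_frame("2021-01-01T00:00:00.000Z", 1, -0.5, 0.5, 0.5, -0.5)
print(xml)
```

The `UtcTime` attribute on the frame is what the client uses to line the box up with the video, which is why the timestamp conversion discussed above matters.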

There are different ways to send the metadata to XProtect, and since you mention DeepStream and Kafka, the Milestone AI Bridge might be the right option for you:

  • It is a fully Linux-based, containerized solution.
  • Available from AWS ECR: https://gallery.ecr.aws/milestonesys/aibridge
  • Connects to XProtect to:
    • Expose the VMS configuration data through a GraphQL endpoint.
    • Expose video feed using RTSP or gRPC.
    • Feed XProtect with:
      • Analytics Events (using REST, Kafka or gRPC)
      • Metadata (using REST, Kafka or gRPC) following the minimal ONVIF or DeepStream formats.
      • Video (using REST or gRPC) in H.264, H.265 or MJPEG codecs.
  • Docker Compose can also be used to deploy AI Bridge; you will need the files from the AI Bridge resources zip file, available here: https://www.milestonesys.com/my-milestone/download-software/?prod=2128&type=11&lang=27 where you will also find the Developer Reference.
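If you take the AI Bridge route with Kafka as the transport, the producer side on the DeepStream appliance boils down to serializing one message per frame of detections. The field names below are illustrative only (check the AI Bridge Developer Reference for the exact minimal DeepStream/ONVIF schema it accepts), and the Kafka send itself is only sketched in comments:

```python
import json
import time

def detection_message(camera_id: str, detections: list) -> bytes:
    """Serialize one frame's detections as JSON.
    Field names here are illustrative, not the AI Bridge's actual schema."""
    return json.dumps({
        "cameraId": camera_id,
        "timestampMs": int(time.time() * 1000),  # Unix epoch milliseconds
        "objects": [
            {"id": d["id"], "label": d["label"],
             "confidence": d["confidence"],
             "bbox": d["bbox"]}  # [left, top, right, bottom]
            for d in detections
        ],
    }).encode("utf-8")

payload = detection_message("camera-01", [
    {"id": 7, "label": "person", "confidence": 0.91,
     "bbox": [0.12, 0.08, 0.34, 0.61]},
])
print(payload.decode())

# Publishing would then be a plain Kafka produce, e.g. with confluent-kafka:
#   from confluent_kafka import Producer
#   p = Producer({"bootstrap.servers": "broker:9092"})
#   p.produce("detections", value=payload)
#   p.flush()
```

Keeping the message small (one frame, a handful of objects) also helps with the render-rate concern raised in the original post, since the client only has to draw what actually changed.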

Thanks, I will get back to you if I have any doubts.

Hi! I noticed this post about ingesting bounding-box metadata from NVIDIA > Kafka.

My company has a very similar need. We currently use a third-party platform that ingests our camera streams and outputs real-time AI detections with bounding boxes, and we're also looking for a way to make XProtect ingest that metadata.

I was just wondering: did the solution offered to you, the AI Bridge, work out?

Were there any issues with time delay? Were there any gotchas that only became apparent later?