Assistance Required for ONVIF Metadata Conversion and Bounding Box Rendering Issue

Hi there!

We are currently implementing an ONVIF-compliant video analytics solution (using Milestone AI Bridge) that involves object detection using our custom model and subsequent metadata conversion for integration with the camera’s metadata driver via a Kafka broker. However, we are encountering issues with the rendering of bounding boxes in the video frames.

Process Overview:

1. Object Detection:

  • The first step involves detecting objects using our model, producing detections in the following format:

```
on 0.0%: [{'id': '1', 'type': 'car', 'confidence': 0.48, 'xmin': 318.0, 'ymin': 197.0, 'xmax': 339.0, 'ymax': 207.0, 'timestamp': '1742233178795'}]
```
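For reference, each log line above is a plain Python list of dicts, so (assuming the leading `on 0.0%:` prefix is stripped first) it can be parsed with the standard library alone:

```python
import ast

# The example detection record as logged, with the "on 0.0%:" prefix removed.
# Field names match the keys consumed later by create_onvif_metadata.
raw = ("[{'id': '1', 'type': 'car', 'confidence': 0.48, "
       "'xmin': 318.0, 'ymin': 197.0, 'xmax': 339.0, 'ymax': 207.0, "
       "'timestamp': '1742233178795'}]")

# ast.literal_eval safely evaluates the literal list without running code.
detections = ast.literal_eval(raw)
for d in detections:
    print(d['type'], d['confidence'], (d['xmin'], d['ymin'], d['xmax'], d['ymax']))
```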

2. ONVIF Metadata Conversion:

  • We convert these detections into ONVIF-compatible metadata using the `create_onvif_metadata` function. The function generates XML metadata using bounding box coordinates, center of gravity, object classification, and transformation scaling. The conversion function is structured as follows:

```
import xml.etree.ElementTree as ET

def create_onvif_metadata(self, timestamp, detections, frame_width, frame_height):
    ns = "http://www.onvif.org/ver10/schema"
    ET.register_namespace("tt", ns)
    root = ET.Element("{%s}MetadataStream" % ns)
    video_analytics = ET.SubElement(root, "{%s}VideoAnalytics" % ns)
    frame = ET.SubElement(
        video_analytics, "{%s}Frame" % ns,
        UtcTime=timestamp,
        SourceStreamID=self.__global_manager.configs["GENERAL_DETAILS"]["UPLOAD_PREDICTIONS"]["KAFKA"]["SOURCE_STREAM_ID"],
    )
    scale_x = 1 / frame_width
    scale_y = 1 / frame_height

    for detection in detections:
        # obj = ET.SubElement(frame, "{%s}Object" % ns, ObjectId=detection["id"])
        obj = ET.SubElement(frame, "{%s}Object" % ns, ObjectId=detection["id"], Token=detection["id"])
        appearance = ET.SubElement(obj, "{%s}Appearance" % ns)
        transformation = ET.SubElement(appearance, "{%s}Transformation" % ns)
        ET.SubElement(transformation, "{%s}Translate" % ns, x="0.0", y="0.0")
        ET.SubElement(transformation, "{%s}Scale" % ns, x=f"{scale_x:.6f}", y=f"{scale_y:.6f}")
        shape = ET.SubElement(appearance, "{%s}Shape" % ns)
        ET.SubElement(shape, "{%s}BoundingBox" % ns,
                      left=f"{detection['xmin']}",
                      top=f"{detection['ymin']}",
                      right=f"{detection['xmax']}",
                      bottom=f"{detection['ymax']}")
        cx, cy = (detection['xmin'] + detection['xmax']) / 2, (detection['ymin'] + detection['ymax']) / 2
        cx *= scale_x
        cy *= scale_y
        ET.SubElement(shape, "{%s}CenterOfGravity" % ns, x=f"{cx:.6f}", y=f"{cy:.6f}")
        class_elem = ET.SubElement(appearance, "{%s}Class" % ns)
        class_candidate = ET.SubElement(class_elem, "{%s}ClassCandidate" % ns)
        ET.SubElement(class_candidate, "{%s}Type" % ns).text = detection["type"]
        ET.SubElement(class_candidate, "{%s}Likelihood" % ns).text = f"{detection['confidence']:.2f}"
        behaviour = ET.SubElement(obj, "{%s}Behaviour" % ns)
        ET.SubElement(behaviour, "{%s}Idle" % ns)

    extension = ET.SubElement(root, "{%s}Extension" % ns)
    ET.SubElement(extension, "OriginalData").text = "U29tZU9yaWdpbmFsRGF0YSBlbmNvZGVkIGluIEJBU0U2NA=="
    return ET.tostring(root, encoding="utf-8").decode("utf-8")
```

3. Metadata Transmission:

  • The generated metadata is sent to the camera’s metadata driver via a Kafka broker.
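As a minimal sketch of this step (the topic name, broker address, and the `kafka-python` client are assumptions, not our actual configuration), the XML string is simply encoded and published as the message value:

```python
def build_kafka_payload(metadata_xml: str) -> bytes:
    """Encode the ONVIF metadata XML string as a UTF-8 Kafka message value."""
    return metadata_xml.encode("utf-8")

def send_metadata(metadata_xml: str, topic: str = "onvif-metadata") -> None:
    # kafka-python is one possible client; broker address is a placeholder.
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(topic, build_kafka_payload(metadata_xml))
    producer.flush()
```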

Issue Encountered:

- The metadata transmission process is functioning correctly, and data is reaching the camera metadata driver as expected.

- However, the bounding boxes in the rendered video frames appear in incorrect positions, misaligned from the detected objects.

- We suspect that the issue might be related to the transformation scaling, bounding box normalization, or coordinate system used by the ONVIF metadata format.

Request for Assistance:

We would appreciate guidance on the following aspects:

1. Are there any known issues with how bounding boxes should be normalized for ONVIF compliance?

2. Are we missing any essential metadata elements required for correct rendering in the ONVIF metadata stream?

3. Could you confirm if the `BoundingBox`, `CenterOfGravity`, or `Scale` values require additional adjustments for compatibility with Milestone Smart Client or other ONVIF-compatible VMS systems?

We have attached an example of the incorrect bounding box rendering for reference.

Looking forward to your assistance.

Best regards,

João Brito | DeepNeuronic

Hi!

Reading your code, I noticed this snippet:

```
ET.SubElement(shape, "{%s}BoundingBox" % ns,
    left=f"{detection['xmin']}",
    top=f"{detection['ymin']}",
    right=f"{detection['xmax']}",
    bottom=f"{detection['ymax']}"
)
```

It seems to me that those coordinates are not normalized to the -1.0 to 1.0 range, which is what the ONVIF standard specifies and what we use in AI Bridge.

Could you try to adapt your code so the bounding box coordinates are normalized? Something like this may do the trick:

```
ET.SubElement(shape, "{%s}BoundingBox" % ns,
    left=f"{detection['xmin'] * scale_x}",
    top=f"{detection['ymin'] * scale_y}",
    right=f"{detection['xmax'] * scale_x}",
    bottom=f"{detection['ymax'] * scale_y}"
)
```
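For reference, the snippet above scales pixels into a [0, 1] range; if the renderer expects the full symmetric ONVIF frame coordinate system (x from -1 at the left to 1 at the right, y from -1 at the bottom to 1 at the top, so the y axis flips relative to image pixels), the mapping would look something like this sketch (the helper name is mine, not part of any API):

```python
def to_onvif_coords(x_px: float, y_px: float, width: int, height: int):
    """Map pixel coordinates (origin top-left, y pointing down) into the
    ONVIF default frame coordinate system (origin at the centre, both
    axes in [-1, 1], y pointing up)."""
    x_norm = 2.0 * x_px / width - 1.0
    y_norm = 1.0 - 2.0 * y_px / height
    return x_norm, y_norm

# Example box in a 640x360 frame; note that the pixel-space "top" edge
# becomes the larger y value after the axis flip.
left, top = to_onvif_coords(318.0, 197.0, 640, 360)
right, bottom = to_onvif_coords(339.0, 207.0, 640, 360)
```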

Let me know if this solves your issue.

Thanks and regards!

Hello José!

I truly appreciate your time and attention in helping me with this.

I have implemented the code you provided, but it now only draws a single point at the center of the frame instead of the expected bounding boxes.

Could this issue be related to the Scale, Translate, or the Center X and Center Y parameters?

Looking forward to your insights.

Best regards,

João Brito | DeepNeuronic