Hi there!
We are currently implementing an ONVIF-compliant video analytics solution (using Milestone AI Bridge) that involves object detection using our custom model and subsequent metadata conversion for integration with the camera’s metadata driver via a Kafka broker. However, we are encountering issues with the rendering of bounding boxes in the video frames.
Process Overview:
1. Object Detection:
- The first step involves detecting objects using our model, producing detections in the following format:
```
on 0.0%: [{'id': '1', 'type': 'car', 'confidence': 0.48, 'xmin': 318.0, 'ymin': 197.0, 'xmax': 339.0, 'ymax': 207.0, 'timestamp': '1742233178795'}]
```
2. ONVIF Metadata Conversion:
- We convert these detections into ONVIF-compatible metadata using the `create_onvif_metadata` function. The function generates XML metadata using bounding box coordinates, center of gravity, object classification, and transformation scaling. The conversion function is structured as follows:
```
def create_onvif_metadata(self, timestamp, detections, frame_width, frame_height):
    ns = "http://www.onvif.org/ver10/schema"
    ET.register_namespace("tt", ns)
    root = ET.Element("{%s}MetadataStream" % ns)
    video_analytics = ET.SubElement(root, "{%s}VideoAnalytics" % ns)
    frame = ET.SubElement(video_analytics, "{%s}Frame" % ns,
                          UtcTime=timestamp,
                          SourceStreamID=self.__global_manager.configs["GENERAL_DETAILS"]["UPLOAD_PREDICTIONS"]["KAFKA"]["SOURCE_STREAM_ID"],)
    scale_x = 1 / frame_width
    scale_y = 1 / frame_height
    for detection in detections:
        # obj = ET.SubElement(frame, "{%s}Object" % ns, ObjectId=detection["id"])
        obj = ET.SubElement(frame, "{%s}Object" % ns, ObjectId=detection["id"], Token=detection["id"])
        appearance = ET.SubElement(obj, "{%s}Appearance" % ns)
        transformation = ET.SubElement(appearance, "{%s}Transformation" % ns)
        ET.SubElement(transformation, "{%s}Translate" % ns, x="0.0", y="0.0")
        ET.SubElement(transformation, "{%s}Scale" % ns, x=f"{scale_x:.6f}", y=f"{scale_y:.6f}")
        shape = ET.SubElement(appearance, "{%s}Shape" % ns)
        ET.SubElement(shape, "{%s}BoundingBox" % ns,
                      left=f"{detection['xmin']}",
                      top=f"{detection['ymin']}",
                      right=f"{detection['xmax']}",
                      bottom=f"{detection['ymax']}")
        cx, cy = (detection['xmin'] + detection['xmax']) / 2, (detection['ymin'] + detection['ymax']) / 2
        cx *= scale_x
        cy *= scale_y
        ET.SubElement(shape, "{%s}CenterOfGravity" % ns, x=f"{cx:.6f}", y=f"{cy:.6f}")
        class_elem = ET.SubElement(appearance, "{%s}Class" % ns)
        class_candidate = ET.SubElement(class_elem, "{%s}ClassCandidate" % ns)
        ET.SubElement(class_candidate, "{%s}Type" % ns).text = detection["type"]
        ET.SubElement(class_candidate, "{%s}Likelihood" % ns).text = f"{detection['confidence']:.2f}"
        behaviour = ET.SubElement(obj, "{%s}Behaviour" % ns)
        ET.SubElement(behaviour, "{%s}Idle" % ns)
    extension = ET.SubElement(root, "{%s}Extension" % ns)
    ET.SubElement(extension, "OriginalData").text = "U29tZU9yaWdpbmFsRGF0YSBlbmNvZGVkIGluIEJBU0U2NA=="
    return ET.tostring(root, encoding="utf-8").decode("utf-8")
```
3. Metadata Transmission:
- The generated metadata is sent to the camera’s metadata driver via a Kafka broker.
Issue Encountered:
- The metadata transmission process is functioning correctly, and data is reaching the camera metadata driver as expected.
- However, the bounding boxes in the rendered video frames appear in incorrect positions, misaligned from the detected objects.
- We suspect that the issue might be related to the transformation scaling, bounding box normalization, or coordinate system used by the ONVIF metadata format.
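To make this suspicion concrete, here is a minimal sketch comparing the `Translate`/`Scale` pair we currently emit with an alternative. The alternative rests on an assumption taken from our reading of the ONVIF specification, which we have not confirmed against the Milestone driver: that the normalized frame spans -1 to 1 on both axes, with the origin at the center and the y-axis pointing up. The helper names and the 640x480 frame size are hypothetical and only serve the illustration.

```python
def to_onvif_normalized(x, y, translate, scale):
    """Apply a Transformation: normalized = pixel * scale + translate."""
    return (x * scale[0] + translate[0], y * scale[1] + translate[1])


def current_transform(frame_width, frame_height):
    # What create_onvif_metadata emits today: Translate (0, 0), Scale (1/w, 1/h).
    return (0.0, 0.0), (1.0 / frame_width, 1.0 / frame_height)


def candidate_transform(frame_width, frame_height):
    # Assumption: normalized frame spans [-1, 1] on both axes, origin at the
    # center, y pointing up, while our pixel input has its origin at the
    # top-left corner with y pointing down.
    return (-1.0, 1.0), (2.0 / frame_width, -2.0 / frame_height)


if __name__ == "__main__":
    width, height = 640, 480  # hypothetical frame size, not our real stream
    box = {"xmin": 318.0, "ymin": 197.0, "xmax": 339.0, "ymax": 207.0}
    for name, make in (("current", current_transform),
                       ("candidate", candidate_transform)):
        translate, scale = make(width, height)
        top_left = to_onvif_normalized(box["xmin"], box["ymin"], translate, scale)
        bottom_right = to_onvif_normalized(box["xmax"], box["ymax"], translate, scale)
        print(name, top_left, bottom_right)
```

If that assumption holds, our current values would map every box into the upper-right quadrant of the normalized frame, which would be consistent with the misplacement we observe.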
Request for Assistance:
We would appreciate guidance on the following aspects:
1. Are there any known issues with how bounding boxes should be normalized for ONVIF compliance?
2. Are we missing any essential metadata elements required for correct rendering in the ONVIF metadata stream?
3. Could you confirm whether the `BoundingBox`, `CenterOfGravity`, or `Scale` values require additional adjustments for compatibility with Milestone Smart Client or other ONVIF-compatible VMS platforms?
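Related to question 3: our function writes `BoundingBox` in pixel units but multiplies `CenterOfGravity` by the scale before writing it, so the two values end up in different coordinate spaces. The sketch below shows the variant we would try, assuming (unconfirmed on our side) that both elements are expected in the same coordinate system and that the `Transformation` is applied to both. `build_shape` is a hypothetical helper, not part of our current code.

```python
import xml.etree.ElementTree as ET

NS = "http://www.onvif.org/ver10/schema"


def build_shape(appearance, detection):
    """Add a Shape whose BoundingBox and CenterOfGravity share pixel units."""
    shape = ET.SubElement(appearance, f"{{{NS}}}Shape")
    ET.SubElement(shape, f"{{{NS}}}BoundingBox",
                  left=f"{detection['xmin']}",
                  top=f"{detection['ymin']}",
                  right=f"{detection['xmax']}",
                  bottom=f"{detection['ymax']}")
    # Center left unscaled, so it stays in the same space as the box.
    cx = (detection["xmin"] + detection["xmax"]) / 2
    cy = (detection["ymin"] + detection["ymax"]) / 2
    ET.SubElement(shape, f"{{{NS}}}CenterOfGravity",
                  x=f"{cx:.6f}", y=f"{cy:.6f}")
    return shape


if __name__ == "__main__":
    appearance = ET.Element(f"{{{NS}}}Appearance")
    detection = {"xmin": 318.0, "ymin": 197.0, "xmax": 339.0, "ymax": 207.0}
    build_shape(appearance, detection)
    print(ET.tostring(appearance, encoding="unicode"))
```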
We have attached an example of the incorrect bounding box rendering for reference.
Looking forward to your assistance.
Best regards,
João Brito | DeepNeuronic


