AI Bridge - Unable to run for a long period

Hi there,

I’m having an issue keeping my AI Bridge running for longer than about five days at a time.

I’m running version 2.0.4 with Helm. Everything starts out fine, but eventually Kafka / ZooKeeper / Fuseki appears to restart, and since there’s no persistent state, things break.

I have also triggered it in the past with a host restart.

I might be missing some important documentation, but I’m following https://doc.milestonesys.com/AIB/PDF/v2_0_5/en-US/MilestoneAIBridge_IntegratorManual_en-US.pdf and customising what I see in `values.yaml`.

I’m about to embark on installing v2.0.5, but this is frustrating. The failure isn’t picked up by health checks, and the services just complain continually about topics not existing. The only way I’ve found to recover reliably is to re-install the Helm chart.

I suspect this has something to do with persistence. The documentation has a brief config snippet covering this under `kafka`:

  broker:
    replicaCount: 1
    persistence:
      enabled: false
  controller:
    replicaCount: 1
    controllerOnly: true
    persistence:
      enabled: false

My full `values.yaml` looks like:

  bridge:
    description: Kubernetes cluster running Milestone AI Bridge
    id: XXXXXXXX
    name: Milestone AI Bridge Cluster
    webpage: ''
  gateway:
    id: XXXXXXX
    version: 1.0.0
  general:
    debug: true
    externalHostname: XXXX
    externalIP: X.X.X.X
    externalRootPath: /processing-server
    masterKey: encryptionKeyExample
    tag: 2.0.4
  ingress-nginx:
    controller:
      service:
        externalIPs:
          - X.X.X.X
    enabled: true
  kafka:
    logRetentionMs: 300000
    broker:
      persistence:
        enabled: true
    controller:
      persistence:
        enabled: true
  replicas:
    broker: 1
    connector: 1
    health: 1
    proxy: 1
    streaming: 1
    webservice: 1
  vms:
    url: http://milestone

I’ve set these persistence values to true, but I can’t find anything in the Helm chart that actually references them.
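For reference, this is roughly what I would have expected the chart to honour if it wraps a Bitnami-style Kafka sub-chart (key names assumed; I haven’t verified them against the chart’s templates):

```yaml
# Hypothetical override, assuming the chart wires Kafka persistence
# through Bitnami-style sub-chart values -- check the chart's
# templates/ directory before relying on these key names.
kafka:
  broker:
    persistence:
      enabled: true
      storageClass: local-path   # assumed provisioner name
      size: 8Gi
  controller:
    persistence:
      enabled: true
      storageClass: local-path
      size: 8Gi
```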

The logs (after it all goes pear-shaped) are below. Unfortunately I don’t have logs from the exact moment it dies, as it usually happens overnight, and the services spam the same message every 5 seconds.

aib-aibridge-broker:

2026-04-19T23:47:33Z Topic voyager.topics.daim.vmsbridge.stream_subscriptions does not yet exist; will wait 5 seconds and check again

2026-04-19T23:47:38Z Topic voyager.topics.daim.vmsbridge.stream_subscriptions does not yet exist; will wait 5 seconds and check again

(same message repeated every 5 seconds)

aib-aibridge-connector:

2026-04-20T23:47:34Z Topics [voyager.topics.daim.gateway.config_requests, voyager.topics.daim.gateway.config_responses, voyager.topics.daim.gateway.stream_requests, voyager.topics.daim.gateway.stream_responses, voyager.topics.daim.gateway.endpoint_status] do not yet exist; will wait 5 seconds and check again

2026-04-20T23:47:39Z Topics [voyager.topics.daim.gateway.config_requests, voyager.topics.daim.gateway.config_responses, voyager.topics.daim.gateway.stream_requests, voyager.topics.daim.gateway.stream_responses, voyager.topics.daim.gateway.endpoint_status] do not yet exist; will wait 5 seconds and check again

(same message repeated every 5 seconds)

aib-aibridge-fuseki: No logs

aib-aibridge-health: No logs

aib-aibridge-kafka-broker:

[2026-04-20 23:22:07,933] INFO [Controller id=1001] Processing automatic preferred replica leader election (kafka.controller.KafkaController)

[2026-04-20 23:22:07,933] TRACE [Controller id=1001] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)

[2026-04-20 23:27:07,933] INFO [Controller id=1001] Processing automatic preferred replica leader election (kafka.controller.KafkaController)

[2026-04-20 23:27:07,933] TRACE [Controller id=1001] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)

[2026-04-20 23:32:07,933] INFO [Controller id=1001] Processing automatic preferred replica leader election (kafka.controller.KafkaController)

[2026-04-20 23:32:07,933] TRACE [Controller id=1001] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)

aib-aibridge-kafka-zookeeper: No logs

aib-aibridge-proxy:

2026-04-20T23:48:09Z Topics [voyager.topics.daim.vmsbridge.stream_subscriptions, voyager.topics.daim.gateway.config_responses] do not yet exist; will wait 5 seconds and check again

2026-04-20T23:48:14Z Topics [voyager.topics.daim.vmsbridge.stream_subscriptions, voyager.topics.daim.gateway.config_responses] do not yet exist; will wait 5 seconds and check again

2026-04-20T23:48:19Z Topics [voyager.topics.daim.vmsbridge.stream_subscriptions, voyager.topics.daim.gateway.config_responses] do not yet exist; will wait 5 seconds and check again

aib-aibridge-webservice:

Not all topics available yet (will retry in 5 seconds)

(same line repeated every 5 seconds)

aib-ingress-nginx-controller: No logs

Is there something I can do to stop this from happening? It occurs every 4-7 days in my installation. I’m happy to troubleshoot further.


Hi @bevans_vc ,

Is this something that you also see on ‘docker compose’ installations?

As you have pointed out, we do not provide persistence for Kafka/Fuseki, which is why the data is no longer present after the node restarts.

This behaviour doesn’t happen with ‘docker compose’ setups because we get back the same container we had before the restart; on Kubernetes the pod is created from scratch.
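As an illustration (service and volume names are made up here), a compose setup survives container recreation as long as the data lives in a named volume rather than only in the container’s writable layer:

```yaml
# Hypothetical docker-compose fragment: the named volume keeps Kafka's
# data directory across container recreation, which is the behaviour
# a bare Kubernetes pod without a PersistentVolume does not get.
services:
  kafka:
    image: bitnami/kafka:3.6
    volumes:
      - kafka-data:/bitnami/kafka
volumes:
  kafka-data:
```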

We are now adjusting the AI Bridge templates to include StorageClass patterns so this can be avoided on both single-node and multi-node clusters.

This will be released in the next version of AI Bridge (3.0.0).
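As a rough sketch of the pattern (the exact keys in the released 3.0.0 chart may differ), persistence here means pointing the stateful components at a StorageClass with a dynamic provisioner, e.g. a local-path provisioner on a single-node cluster:

```yaml
# Illustrative only -- the final 3.0.0 values keys may differ.
kafka:
  persistence:
    enabled: true
    storageClass: local-path   # e.g. a local-path provisioner on single-node
    size: 8Gi
fuseki:
  persistence:
    enabled: true
    storageClass: local-path
    size: 4Gi
```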

Thanks for reporting this issue.

Amazing. Thank you.