Unable to Pull Kafka Broker for AI Bridge 1.6 from NGC

Using the AI Bridge 1.6 Helm chart, the Kafka Broker image refuses to pull. All other pods are coming through just fine. I noticed that the Kafka Broker, Init, and Proxy containers were all modified today (March 13, 2024) in NGC, so maybe that has something to do with it, but the Init and Proxy containers both pulled just fine.

Here’s part of what `kubectl describe pod` returns:

  Normal   Pulling    24s (x2 over 50s)  kubelet            Pulling image "nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0"
  Warning  Failed     23s (x2 over 37s)  kubelet            Failed to pull image "nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0": failed to resolve reference "nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0": nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0: not found
  Warning  Failed     23s (x2 over 37s)  kubelet            Error: ErrImagePull
  Normal   BackOff    8s (x2 over 36s)   kubelet            Back-off pulling image "nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0"
  Warning  Failed     8s (x2 over 36s)   kubelet            Error: ImagePullBackOff
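
For anyone triaging the same symptom across several pods, here is a rough Python sketch that scans `kubectl describe pod` event text for failed pulls and reports the status code the kubelet logged for each image. The function names `failed_pulls` and `split_reference` are hypothetical helpers made up for this illustration, and the regular expression assumes the event wording shown above; other container runtimes may phrase the message differently.

```python
import re

# Assumes the kubelet event wording seen in this thread, e.g.
#   Failed to pull image "registry/path/name:tag": rpc error: code = NotFound ...
FAILED_PULL = re.compile(
    r'Failed to pull image "(?P<image>[^"]+)": rpc error: code = (?P<code>\w+)'
)


def failed_pulls(describe_output: str) -> dict[str, str]:
    """Map each image that failed to pull to the status code in its event."""
    results: dict[str, str] = {}
    for line in describe_output.splitlines():
        match = FAILED_PULL.search(line)
        if match:
            results[match.group("image")] = match.group("code")
    return results


def split_reference(image: str) -> tuple[str, str, str]:
    """Split 'registry/repo/path:tag' into (registry, repository, tag)."""
    last_segment = image.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = image.rsplit(":", 1)
    else:
        # No explicit tag in the reference; the default tag is "latest".
        name, tag = image, "latest"
    registry, _, repository = name.partition("/")
    return registry, repository, tag


if __name__ == "__main__":
    sample = (
        'Warning  Failed  23s  kubelet  Failed to pull image '
        '"nvcr.io/isv-milestone/partners/aibridge-kafka-broker:v1.6.0": '
        'rpc error: code = NotFound desc = failed to pull and unpack image'
    )
    for image, code in failed_pulls(sample).items():
        print(code, split_reference(image))
```

A `NotFound` code like the one above means the registry could not resolve the tag for the client at all, which is a different failure from a credentials problem, so the fix usually has to come from whoever publishes the image rather than from the cluster side.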

Of course this is preventing the AI Bridge from coming up.

I don’t know if this is an NGC issue or a Milestone issue, but browsing the corresponding private registry’s containers shows that the Kafka Broker v1.6.0 tag is available (or at least visible).

I’ve also tried rebooting the machine to double check things.

The Helm chart was working just fine yesterday (March 12, 2024).

Any help would be appreciated. Thank you in advance.

This looks to have been resolved since my post as I can now pull the image in question and AI Bridge finishes starting up in my K8s cluster.

Hello Duncan,

Sorry to read you are having problems when pulling these images.

We have taken care of them, and you can pull them again as you did before; nothing needs to be changed on your side.

We are tracking down the cause of this problem so we can avoid it in the future.

Kind regards,

Fer

Hi Fernando,

Thank you for this, but just FYI, it’s happened again, this time with the AI Bridge Kafka Zookeeper image (v1.6.0). Can you please correct this?

I’m hoping you can track down the problem soon as it’s a very frustrating issue to keep running into.

(Also, I’m not sure if this is related, but the Kafka Broker image keeps crashing; i.e. I can’t start that pod either as it’s stuck in a CrashLoop.)

Thank you again.

Cheers,

Duncan

Hello Duncan,

We are really sorry that we are facing this situation again. We will take the same course of action as last time to get the images working again.

From NGC we have been informed that they are going through major updates and they may be causing issues. Hope they finish soon and we don’t face this again in the future.

I will let you know as soon as the issue is fixed, at least temporarily.

Regards,

Fer

Hello Duncan,

The issue with the images is now solved.

Hopefully after the Easter break we will have a more stable NGC.

Regards,

Fer

Thank you, Fernando.