gemma-3-27b-it-FP8 sometimes crashes
When I run this model with vLLM v0.10.1 and CUDA 12.8, it works fine at first.
But after about 30 minutes it crashes with the error below. I can't reproduce it every time, and I don't know where the problem could be. Please take a look.

```
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Error in chat completion stream generator.
```
My error is similar to https://github.com/vllm-project/vllm/issues/21708; however, in my case it works fine at first and only crashes after some time (around 30 minutes).
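For reference, I start the server with something like the command below (the model path and flags here are illustrative, not my exact setup):

```bash
# Hypothetical launch command for illustration only; actual paths/flags may differ.
vllm serve google/gemma-3-27b-it --quantization fp8 --port 8000
```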
Hi @mondaylord,
Thanks for bringing this to our attention. Please follow the suggestions below to dig deeper.
This "CUDA error: an illegal memory access was encountered" error is a common, The non-reproducibility suggests it's likely a race condition or a specific, rare sequence of events that corrupts the GPU's memory. This is not an issue with your model itself, but with how the VLLM framework is managing GPU resources under certain conditions.
Asynchronous Nature: The error message correctly points out that CUDA errors can be reported asynchronously. The actual crash surfaces at a later API call, not necessarily at the exact moment the memory corruption occurred, which is why the stack trace is often misleading.
CUDA_LAUNCH_BLOCKING=1: This is the single most important step. It forces the CPU to wait for each CUDA kernel to complete before launching the next one. While it will significantly slow down your application, it will make the stack trace much more accurate, pointing to the exact operation where the illegal memory access occurred.
TORCH_USE_CUDA_DSA: This enables "device-side assertions," which can provide more informative error messages from within the CUDA kernels themselves. Note that this is a build-time option, so it only takes effect with a PyTorch build compiled with DSA support.
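As a sketch, the blocking flag can be set inline when starting the server (the model path and flags are the same placeholders as in the example above, not a confirmed configuration):

```bash
# Launch with synchronous kernel launches so the stack trace points at the
# failing kernel. Expect a significant slowdown; use only while debugging.
CUDA_LAUNCH_BLOCKING=1 vllm serve google/gemma-3-27b-it --quantization fp8 --port 8000

# TORCH_USE_CUDA_DSA only takes effect if your PyTorch build was compiled with it.
```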
Update vLLM and CUDA: make sure you are on the latest vLLM release, e.g. `pip install --upgrade vllm`.
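For example (assuming a pip-based install; adjust for conda or Docker setups):

```bash
# Upgrade vLLM and confirm the installed version.
pip install --upgrade vllm
python -c "import vllm; print(vllm.__version__)"
```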
Please try the suggestions above, and thank you so much for your patience.
Thanks.