MedQA: Fine-Tuning a Clinical AI on AMD ROCm

MedQA is a clinical LLM, a 1.7-billion-parameter Qwen3 base model LoRA fine-tuned on AMD's MI300X GPUs using ROCm 6.1. It matches or exceeds NVIDIA A100 performance on medical QA benchmarks, delivering 92% accuracy on multiple-choice differential diagnosis while cutting inference latency to 18 ms per token. The shift to AMD's open-source ROCm stack could loosen NVIDIA's grip on medical AI training, cutting cloud costs by as much as 40% for health systems.

Overview

The MedQA project challenges the assumption that medical AI work requires NVIDIA GPUs. It uses the HuggingFace ecosystem (Transformers, PEFT, TRL, and Accelerate), all of which run on ROCm without modification. The entire training pipeline runs on an AMD Instinct MI300X with no CUDA dependencies.
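The "no CUDA dependencies" claim is easy to verify in practice: PyTorch's ROCm builds expose the HIP backend through the familiar torch.cuda namespace, so code written for NVIDIA GPUs runs unchanged. A minimal sanity check (a sketch, assuming a ROCm build of PyTorch):

```python
import torch

# On a ROCm build of PyTorch, torch.cuda is backed by HIP, so code written
# against the CUDA API runs unchanged on AMD hardware.
print(torch.cuda.is_available())      # True on a working ROCm install
print(torch.cuda.get_device_name(0))  # e.g. "AMD Instinct MI300X"
print(torch.version.hip)              # set on ROCm builds, None on CUDA builds
```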

What it does

MedQA is a LoRA fine-tuned clinical question-answering model: given a multiple-choice medical question, it returns both the correct answer letter and a clinical explanation of the reasoning. It builds on the Qwen3-1.7B base model, a 1.7-billion-parameter model loaded with trust_remote_code=True.
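A sketch of what inference with such a model could look like, assuming the LoRA adapter has been saved locally (the adapter path and the prompt wording are hypothetical; the article does not publish its exact prompt template):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-1.7B"
ADAPTER = "medqa-lora"  # hypothetical path to the saved LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the fine-tuned LoRA weights

prompt = (
    "Question: A 45-year-old presents with crushing chest pain radiating to the left arm. "
    "What is the most likely diagnosis?\n"
    "A. Pulmonary embolism\nB. Myocardial infarction\nC. Aortic dissection\nD. Pericarditis\n"
    "Answer with the correct letter, then explain your reasoning.\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```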

Tradeoffs

The project uses LoRA (Low-Rank Adaptation) via the PEFT library, which injects small trainable rank-decomposition matrices into the attention layers while leaving the base weights frozen. This keeps memory usage low and training fast. The model is trained with a per-device batch size of 4 and gradient accumulation of 4, for an effective batch size of 16, using a cosine learning-rate schedule with warmup.
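A minimal sketch of that setup with PEFT and Transformers follows. The batch size, gradient accumulation, and cosine schedule come from the article; the LoRA rank, alpha, target modules, learning rate, warmup ratio, and epoch count are illustrative assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B", trust_remote_code=True)

# Rank/alpha/targets are assumptions; the article reports only the outcome
# (~2.2M trainable parameters, a small fraction of the total).
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, per the text
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # prints the trainable/total breakdown

# From the article: per-device batch 4, effective batch 16 (so accumulation 4),
# cosine LR schedule with warmup. The remaining values are assumed.
args = TrainingArguments(
    output_dir="medqa-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,   # "with warmup"; the exact ratio is not stated
    learning_rate=2e-4,  # assumed, a common LoRA default
    num_train_epochs=1,  # assumed
    bf16=True,
)
# These arguments would then be handed to TRL's SFTTrainer along with the dataset.
```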

The results show that MedQA achieves 92% accuracy on the MedMCQA dataset, with a training time of approximately 5 minutes on the MI300X. The model has ~2.2 million trainable parameters, which is 0.15% of the total parameters.
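The article does not describe its evaluation harness, but scoring a multiple-choice model reduces to comparing the predicted letter against the gold option index. A sketch against the public MedMCQA dataset (generate_answer is a hypothetical wrapper that formats the question and calls model.generate):

```python
from datasets import load_dataset

ds = load_dataset("openlifescienceai/medmcqa", split="validation")

def first_letter(text: str) -> str:
    """Extract the first A-D answer letter from a model completion."""
    for ch in text.strip().upper():
        if ch in "ABCD":
            return ch
    return ""

sample = ds.select(range(100))  # small slice for illustration
correct = 0
for row in sample:
    completion = generate_answer(row)  # hypothetical inference wrapper
    gold = "ABCD"[row["cop"]]          # MedMCQA stores the answer as a 0-based option index
    correct += first_letter(completion) == gold
print(f"accuracy: {correct / len(sample):.1%}")
```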

The project also highlights the advantages of using AMD ROCm, including the ability to train without CUDA dependencies and the availability of 192 GB HBM3 memory on the MI300X. This removes the need for 4-bit quantization and allows for cleaner training with no quantization artifacts.
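Concretely, where a 24 GB or 48 GB card would push a fine-tune toward a bitsandbytes 4-bit load, the MI300X can hold the full bf16 weights of a 1.7B model (roughly 3.4 GB at 2 bytes per parameter) with enormous headroom. A sketch of the quantization-free load:

```python
import torch
from transformers import AutoModelForCausalLM

# No BitsAndBytesConfig needed: the full bf16 weights of a 1.7B model occupy
# only ~3.4 GB of the MI300X's 192 GB HBM3, leaving ample room for optimizer
# state, activations, and long sequences.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(f"{torch.cuda.memory_allocated() / 1e9:.1f} GB allocated")
```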

When to use it

MedQA can be used for medical question answering, and its ability to provide explanations for its answers makes it clinically useful. The project demonstrates that building a capable, explainable medical AI on open-source AMD hardware is possible and straightforward.

The next steps for the project include scaling and hardening the pipeline, including training on a larger dataset, adding confidence scoring, and integrating RAG (Retrieval-Augmented Generation) for real-time medical literature retrieval.

In conclusion, MedQA shows that the HuggingFace ecosystem's ROCm compatibility is genuinely good, and the MI300X's memory headroom removes an entire category of engineering problems. LoRA makes fine-tuning a 1.7B model a 5-minute job, making it an attractive option for medical AI applications.

Practical takeaway: MedQA demonstrates the feasibility of building clinical AI models on AMD ROCm, offering a promising alternative to NVIDIA-dominated medical AI training. By leveraging the HuggingFace ecosystem and LoRA fine-tuning, developers can create capable and explainable medical AI models with reduced training times and costs.
