NuFold Multimer

RNA-RNA complex structure prediction using diffusion-based deep learning

NuFold Multimer (Jan 2026 – Present)

Role: Lead Developer

NuFold Multimer is the first dedicated deep-learning predictor for RNA–RNA complexes, built on the lab’s single-chain NuFold model.

Key Contributions

  • Extended the lab’s single-chain NuFold model to multi-chain inputs with a diffusion-based structure module and multi-sample conformational-ensemble generation
  • Built a large-scale model-distillation pipeline to overcome scarce ground truth: ran teacher-model inference (OpenFold3, Protenix) over ~20k RNA–RNA complexes with confidence-based filtering
  • Curated a multi-source training set (RNAInter, RISE, snoDB) with tiered confidence stratification; debugged cross-database label errors to produce clean supervision
  • Scaled training crops to ~1,000 tokens (beyond AlphaFold3’s 768) on substantially fewer GPUs by integrating NVIDIA cuEquivariance kernels and a memory-efficient diffusion module
  • Benchmarked state-of-the-art predictors (AlphaFold3, Boltz-2) on RNA–RNA complexes and showed that neither handles them reliably, establishing the need for a dedicated model
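
The confidence-based filtering and tiered stratification mentioned above could look roughly like the sketch below. It is an illustration only, not the project's pipeline: the `TeacherPrediction` record, the scalar `confidence` field in [0, 1], the tier names, and the 0.8 / 0.6 cutoffs are all hypothetical. The sketch keeps the most confident teacher prediction per complex and drops anything below the lowest tier.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class TeacherPrediction:
    """One teacher-model prediction (hypothetical record layout)."""
    complex_id: str
    teacher: str       # e.g. "OpenFold3" or "Protenix"
    confidence: float  # assumed scalar score in [0, 1]


# Hypothetical tiers, ordered from strictest to most lenient cutoff.
TIERS: Tuple[Tuple[str, float], ...] = (("high", 0.8), ("medium", 0.6))


def assign_tier(confidence: float) -> Optional[str]:
    """Return the first tier whose cutoff is met, or None if all fail."""
    for name, cutoff in TIERS:
        if confidence >= cutoff:
            return name
    return None


def filter_predictions(
    preds: Iterable[TeacherPrediction],
) -> Dict[str, Tuple[TeacherPrediction, str]]:
    """Keep the most confident prediction per complex, tagged with its tier."""
    best: Dict[str, TeacherPrediction] = {}
    for p in preds:
        current = best.get(p.complex_id)
        if current is None or p.confidence > current.confidence:
            best[p.complex_id] = p

    kept: Dict[str, Tuple[TeacherPrediction, str]] = {}
    for complex_id, p in best.items():
        tier = assign_tier(p.confidence)
        if tier is not None:  # below the lowest cutoff: discard
            kept[complex_id] = (p, tier)
    return kept


if __name__ == "__main__":
    demo = [
        TeacherPrediction("c1", "OpenFold3", 0.85),
        TeacherPrediction("c1", "Protenix", 0.70),
        TeacherPrediction("c2", "Protenix", 0.65),
        TeacherPrediction("c3", "OpenFold3", 0.30),
    ]
    for cid, (pred, tier) in filter_predictions(demo).items():
        print(cid, pred.teacher, tier)
```

Storing the tier next to each retained prediction would let a training loop weight or sample examples by confidence level rather than treating all distilled structures as equally reliable.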

Technologies

  • PyTorch, Transformers, Diffusion models
  • Distributed multi-GPU training/inference
  • NVIDIA cuEquivariance GPU kernels
  • Model distillation from OpenFold3, Protenix

Project ongoing at Kihara Lab, Purdue University