Rajeev Jain
Principal Software Engineer · ML Infrastructure · HPC · Scientific Computing
At Argonne National Laboratory, with a joint appointment at the University of Chicago, I design and engineer production software systems at the intersection of scientific computing, ML infrastructure, and high-performance computing. My focus is the gap between prototype and production: parallel training on new accelerator hardware, I/O that doesn’t bottleneck at scale, and Python platforms that teams can actually maintain.

Work
- UXarray
  Lead developer · open-source climate analysis
  Python library for unstructured climate grid analysis: the standard tool for DOE labs, NCAR, and universities working with MPAS, ICON, SAM, and next-generation meshes. Conservative zonal averaging via Gauss-Legendre quadrature; grid I/O for ESMF, MPAS, SCRIP, and HEALPix; MCP server for AI-agent dataset exploration across local and HPC execution.
  Docs · GitHub · MCP article
- Pangu-Weather on Aurora
  60,000+ Intel GPUs · Argonne Leadership Computing Facility
  PyTorch reimplementation of Pangu-Weather using the Spectral Fourier Neural Operator for DOE exascale Earth system modeling. First stable portable DDP baseline on Aurora: PMIX/PALS environment mapping, XPU/CUDA device branching, device-aware mixed precision with gradient scaling on CUDA and bf16 on Intel XPU.
  Article
- CANDLE / IMPROVE
  Core contributor · R&D 100 Award 2023
  HPO and benchmarking infrastructure for cancer drug response models (15+ researchers across Argonne, LLNL, and ORNL). 10,000+ training experiments across Summit, Theta, and Cori using Parsl and Swift/T. Published in Briefings in Bioinformatics, 2025.
- FLASH-X
  I/O and compression lead · R&D 100 Award 2022
  Checkpoint and restart redesign for a million-line multiphysics engine. Async HDF5 with Argobots plus SZ3/ZFP compression: 40–70% checkpoint overhead reduction and 50%+ storage savings on Summit. Cross-checkpoint restart between AMReX and Paramesh, removing a hard constraint that forced full restarts when switching solvers.
  SC24 paper
- MeshKit
  PI and software lead · DOE NEAMS · 2009–2016
  Open-source C++ toolkit for automated nuclear reactor core mesh generation. Parallel CoreGen: 712 processors, 101 million hexahedral elements, 14 GB MONJU reactor mesh in under 7 minutes, a job the serial path couldn’t run at all.
  Blog post · Source
Selected papers
- Partin, A., ..., Jain, R., et al. Benchmarking community drug response prediction models. Briefings in Bioinformatics, 2025.
- Jain, R., Tang, H., Dhruv, A., Byna, S. Enabling Data Reduction for FLASH-X Simulations. DRBSD-10 Workshop, SC24, 2024.
- Jain, R., Wozniak, J.M., Partin, A., et al. Cross-HPO: Optimizing Neural Networks for Cancer Drug Response. CAFCW24, SC24, 2024.
- Wozniak, J.M., ..., Jain, R., et al. CANDLE/Supervisor: A workflow framework for machine learning applied to cancer research. BMC Bioinformatics, 2018.
- Tautges, T.J., Jain, R. Creating Geometry and Mesh Models for Nuclear Reactor Core Geometries. Engineering with Computers, 2011.
Full list on Google Scholar · 22+ publications
Writing
- 2026 UXarray MCP Server: AI-Agent Dataset Exploration with Globus Compute How the UXarray MCP server lets AI agents explore, analyze, and visualize unstructured climate grids — locally and on HPC via Globus Compute.
- 2026 Pangu-Weather on Aurora: Porting a Weather Foundation Model to 60,000 Intel GPUs Device abstraction, DDP setup, PMIX/PALS environment mapping, and mixed-precision on Intel XPU to get a stable training baseline on Aurora.
- 2026 IMPROVE: Building Rigorous Benchmark Infrastructure for Cancer Drug Response Prediction The improvelib package, cross-study analysis framework, GitHub Actions CI/CD, and the UNO dual-branch neural network for drug response prediction.
- 2026 Probing Cancer Model Decision Boundaries: Counterfactual Analysis and Large-Scale HPO mlrMBO, DEAP, Hyperopt, and Swift/T ran 10,000+ experiments on Summit; noise injection and counterfactuals revealed which genes drive tumor classification.
- 2026 Urban Microclimate at Scale: Array of Things, EnergyPlus, and CFD for Chicago Coupling Chicago's IoT sensor network, WRF mesoscale weather, EnergyPlus building simulation, and Nek5000 wall-resolved LES into a city-scale workflow.
- 2022 From RGG and MeshKit to the MOOSE Reactor Module Parallel CoreGen generated a 101M-element MONJU reactor mesh on 712 processors in under 7 minutes — a job the serial path couldn't run at all.
Recognition
- R&D 100, 2023 CANDLE — cancer AI infrastructure across Argonne, LLNL, and ORNL
- R&D 100, 2022 FLASH-X — multiphysics simulation engine
- IMR 2010 Best Paper — reactor core mesh generation with lattice hierarchy encoding
- ATPESC 2015 Scholar — Argonne training program on extreme-scale computing
Funding
- Active DOE SEATS — Software Ecosystem for Advancing Climate Tools and Services
- Active NSF Raijin — collaborative research in climate model analysis
- 2017–2023 DOE ECP CANDLE — core contributor
- 2009–2016 DOE NEAMS — principal investigator, MeshKit
Roles
- 2009–present Argonne National Laboratory — Principal Specialist in Research Software Engineering
- 2023–present University of Chicago — Staff At-Large, cancer pharmacogenomics and Earth system science
- 2007–2009 Arizona State University — Research and teaching assistant, structural and computational mechanics
Education
- 2020 M.S. Computer Science — University of Chicago
- 2009 M.S. Structural Engineering — Arizona State University
- 2006 B.Tech. Mechanical Engineering — IIT ISM Dhanbad