Rajeev Jain

Principal Research Software Engineer · Argonne National Laboratory · University of Chicago

I build and scale the software behind scientific breakthroughs — GPU-accelerated deep learning frameworks, million-line HPC simulation engines, and climate analysis libraries used worldwide. 16 years leading development on large, multi-institutional codebases. Two R&D 100 Awards, 22+ publications, and EB-1A recipient for extraordinary ability in sciences.

Projects

Large-scale open-source tools I lead or contribute to. Each serves a research community and involves a codebase of 100K+ lines developed by multi-institutional teams.

UXarray

Python library for unstructured climate grid analysis. Climate scientists working with next-generation grids (MPAS, ICON, SAM) lacked tools for conservative analysis that preserve integral quantities across non-uniform meshes — a gap blocking petabyte-scale research.

I have led development since inception: core mathematical operators, including conservative zonal averaging via Gauss-Legendre quadrature; grid I/O for multiple formats (ESMF, MPAS, SCRIP, HEALPix); a full CI pipeline; and regular PyPI releases. Currently building an MCP server and AI agent for natural-language interaction with climate datasets. Adopted by NCAR, DOE labs, and universities worldwide.

Pangu-Weather on Aurora

PyTorch-based reimplementation of the Pangu-Weather deep learning framework for climate modeling, using the Spherical Fourier Neural Operator (SFNO). Ported and optimized for Aurora, DOE's exascale supercomputer at Argonne, with 60,000+ Intel GPUs. Demonstrates the feasibility of AI-driven weather prediction beyond NVIDIA ecosystems, advancing DOE's mission for exascale Earth science.

CANDLE / IMPROVE

Hyperparameter optimization for cancer drug response prediction at supercomputer scale. Built the HPO infrastructure and ran 10,000+ training experiments across Summit, Theta, and Cori. Developed GitHub Actions workflows for cross-study validation. The 15+ researcher multi-lab collaboration (Argonne, LLNL, ORNL) relied on my benchmarking framework. Published in Briefings in Bioinformatics (2025). Additional papers: CANDLE/Supervisor, Counterfactuals.

FLASH-X

I/O optimization for a million-line exascale multiphysics simulation engine used by hundreds of researchers for astrophysics, combustion, and fluid dynamics. Checkpoint/restart was consuming 30–50% of runtime on leadership-class supercomputers. I implemented asynchronous HDF5 I/O with Argobots and integrated SZ3/ZFP compression — achieving 40–70% reduction in checkpoint times on Summit and 50%+ storage savings. Enabled cross-checkpoint restart between AMReX and Paramesh solvers (a first for FLASH). Published at SC24 DRBSD-10.

MeshKit

Open-source C++ toolkit for automated nuclear reactor core mesh generation. As PI, led design and development of lattice hierarchy-based meshing, parallel generation capabilities, and multi-format I/O. Adopted by reactor simulation teams at Argonne. Won Best Paper Award at the International Meshing Roundtable (2010).

Technical Expertise

Languages: Python, C++, Fortran, R, Bash, SQL

ML & Data: PyTorch, TensorFlow, NumPy, Pandas, Xarray, Scikit-learn, Parsl, Swift/T

HPC & Systems: MPI, OpenMP, HDF5, NetCDF, MOAB, Docker, Singularity, GitHub Actions

Domains: Climate modeling, cancer pharmacogenomics, computational physics, mesh generation, AI/ML infrastructure, reproducible workflows

Selected Publications

22+ publications · Full list on Google Scholar

Benchmarking community drug response prediction models

Partin, A., ..., Jain, R., et al. · Briefings in Bioinformatics, 27(1), 2025

Enabling Data Reduction for Flash-X Simulations

Jain, R., Tang, H., Dhruv, A., Byna, S. · DRBSD-10 Workshop, SC24

Cross-HPO: Optimizing Neural Networks for Cancer Drug Response

Jain, R., Wozniak, J.M., Partin, A., et al. · CAFCW24 Workshop, SC24

CANDLE/Supervisor: A workflow framework for machine learning applied to cancer research

Wozniak, J.M., ..., Jain, R., et al. · BMC Bioinformatics, 19(S18), 2018

Creating Geometry and Mesh Models for Nuclear Reactor Core Geometries

Tautges, T.J., Jain, R. · Engineering with Computers, 2011

Selected Presentations

  • SC24 — Tutorial: UXarray for Analysis of Unstructured Climate Data
  • SC24 — DRBSD-10: Enabling Data Reduction for Flash-X Simulations
  • SC24 — CAFCW24: Cross-HPO for Cancer Drug Response
  • AMS 2024 — UXarray: Extending Xarray with Support for Unstructured Grids
  • SciPy 2023 — UXarray for Unstructured Climate Data
  • HDF User Group 2023 — Data Reduction for FLASH-X Simulations

Recognition

EB-1A Extraordinary Ability

U.S. permanent residency granted under the EB-1A classification for extraordinary ability in the sciences, a category reserved for individuals with sustained national or international acclaim.

R&D 100 Award, 2023

CANDLE (Cancer Distributed Learning Environment) for drug response prediction. The R&D 100 Awards are often called the "Oscars of Innovation."

R&D 100 Award, 2022

FLASH-X — Multiphysics simulation software for exascale computing.

ATPESC Scholar, 2015

Argonne Training Program on Extreme-Scale Computing — competitive program for HPC researchers.

Best Paper Award, 2010

International Meshing Roundtable — reactor core mesh generation.

Graduate Fellowship, 2007

Arizona State University — research assistantship in structural and computational mechanics.

Research Funding

  • DOE SEATS (active): Software Ecosystem for Advancing Climate Tools and Services
  • NSF Raijin (active): Collaborative Research in Climate Model Analysis
  • DOE ECP CANDLE (2017–2023): Core contributor
  • DOE NEAMS (2009–2016): Principal Investigator for MeshKit

Service & Mentorship

  • SBIR/STTR Proposal Reviewer — U.S. Department of Energy
  • Panelist — 5th Infraday Midwest Event ("Revolutionizing Public Infrastructure with AI")
  • Reviewer — Journal of Open Research Software, NumGrid
  • Committee — NumGrid 2020 Program Committee Member

Mentored students and doctoral candidates in research software engineering, HPC techniques, and open-source development practices.

Background

Argonne National Laboratory

Principal Specialist, Research Software Engineering. Lead developer for UXarray, FLASH-X, CANDLE/IMPROVE, MeshKit, and urban simulation projects. Division of Mathematics and Computer Science.

University of Chicago

Staff At-Large. Joint appointment supporting cancer pharmacogenomics and earth science research.

Arizona State University

Research & Teaching Assistant. Structural and Computational Mechanics Lab. Research on blast mitigation via FEM-based design optimization.

Education

M.S. Computer Science — University of Chicago (2020)
M.S. Structural Engineering — Arizona State University (2009)
B.Tech Mechanical Engineering — IIT ISM Dhanbad (2006)

Contact

Open to collaborations in scientific computing, AI for health, climate modeling, and open-source software.

rajeeja@gmail.com