Research & Data Science

Four peer-reviewed publications in the Monthly Notices of the Royal Astronomical Society (MNRAS), each demonstrating end-to-end data science, advanced statistical modeling, and the ability to communicate complex findings to diverse technical audiences.

Research Context

My doctoral and postdoctoral research (2014–2022) focused on the large-scale structure of the universe, using cosmological simulations and observational data to study dark matter halo dynamics. This work sits at the intersection of high-performance computing (HPC), big data (petabyte-scale simulations), and advanced statistical modeling.

[Figure: Phase-space density estimation across three halo mass bins, showing radial velocity distributions]
[Figure: MCMC parameter estimation with Bayesian uncertainty bands for mass-binned haloes]
[Figure: Gaussian Process Regression across 2D parameter space, shown as a 6x6 matrix of parameter interactions]
[Figure: Signal-to-noise contour mapping on convergence fields, with cluster detection]

Technical Methods & Tools

  • Programming: Python (7+ years), including NumPy, SciPy, and custom analysis pipelines
  • Statistical methods: MCMC sampling, Gaussian Process Regression, Bayesian parameter estimation with full covariance matrices, signal-to-noise optimization
  • Machine learning: Model selection, density estimation, parameterized profile fitting (Navarro-Frenk-White (NFW) vs. Diemer-Kravtsov (DK) models), feature engineering from high-dimensional phase-space data
  • Data visualization: Matplotlib and Seaborn for publication-quality figures, custom multi-panel layouts, contour mapping, and phase-space density visualizations
  • Technical writing: LaTeX for all publications, structured scientific writing for international peer review
  • Data scale: 8+ billion particles from N-body cosmological simulations, cross-matched with multi-survey observational catalogs
  • Computing: HPC cluster environments for simulation processing and ray-tracing pipelines
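
To make the profile fitting and MCMC sampling listed above concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the code used in the publications: it fits the standard NFW density profile to synthetic data with a hand-written Metropolis sampler. The function names, prior bounds, noise level, and step size are all assumptions chosen for this example.

```python
import numpy as np


def nfw_density(r, rho_s, r_s):
    """Standard NFW profile: rho(r) = rho_s / ((r/r_s) * (1 + r/r_s)**2)."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)


def log_posterior(theta, r, log_rho_obs, sigma):
    """Gaussian likelihood in log10(rho) with flat priors on the log-parameters.

    The prior bounds below are hypothetical and chosen only for this example.
    """
    log_rho_s, log_r_s = theta
    if not (-5.0 < log_rho_s < 5.0 and -3.0 < log_r_s < 3.0):
        return -np.inf
    model = np.log10(nfw_density(r, 10.0 ** log_rho_s, 10.0 ** log_r_s))
    return -0.5 * np.sum(((log_rho_obs - model) / sigma) ** 2)


def metropolis(log_post, theta0, n_steps, step, rng, args=()):
    """Random-walk Metropolis sampler with an isotropic Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta, *args)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal, *args)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain


# Synthetic "observations": an NFW profile with Gaussian scatter in log10(rho).
rng = np.random.default_rng(42)
r = np.logspace(-1, 1, 30)            # radii in arbitrary units
true_theta = np.array([0.5, 0.0])     # log10(rho_s), log10(r_s)
sigma = 0.05                          # assumed scatter in dex
log_rho_true = np.log10(nfw_density(r, 10.0 ** true_theta[0], 10.0 ** true_theta[1]))
log_rho_obs = log_rho_true + sigma * rng.standard_normal(r.size)

chain = metropolis(log_posterior, [0.3, 0.1], 20000, 0.01, rng,
                   args=(r, log_rho_obs, sigma))
samples = chain[5000:]                # discard burn-in

# Posterior median and 16th/84th percentiles as a simple uncertainty band.
median = np.median(samples, axis=0)
lo, hi = np.percentile(samples, [16, 84], axis=0)
print("median:", median, "16%:", lo, "84%:", hi)
```

Sampling in log-parameters keeps both quantities positive and makes a fixed proposal width reasonable. A production analysis would more likely use an established ensemble sampler and a full covariance matrix in the likelihood rather than the independent errors assumed here.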

Cross-Domain Skills

This research required end-to-end problem solving: designing data pipelines from raw simulation output to analysis-ready catalogs, performing statistical inference at scale, selecting and validating models, and processing petabyte-scale data on HPC systems. It also required translating complex mathematical concepts into structured narratives, creating effective data visualizations, and communicating methodology clearly for international peer review. These skills transfer directly to industry roles in data science, AI-powered tool development, and technical communication.

Research Papers

Skills Demonstrated

Python, LaTeX, Matplotlib, MCMC Sampling, Bayesian Inference, Gaussian Process Regression, N-body Simulations, Weak Gravitational Lensing, Data Visualization, Scientific Writing, HPC Computing, Pipeline Design, Statistical Modeling, Peer Review