Experience
Bioinformatic Data Scientist
- Consult with six concurrent research teams, scoping analytical approaches to fit their experimental designs and delivering fully reproducible analyses and pipelines tailored to each team's research questions.
- Apply a broad range of methods, including GLMs, survival analysis, regularized regression, random forests, and dimensionality reduction, to large, complex datasets on an on-premises HPC cluster (Slurm).
- Translate complex analytical findings into clear, actionable conclusions for collaborators with varied quantitative backgrounds, delivered through written reports and presentations tailored to non-technical audiences.
- Develop interactive dashboards (R Shiny, Quarto) that let collaborators explore and interrogate analytical results directly, making findings accessible to non-technical stakeholders.
- Serve as a Nextflow Ambassador and nf-core Outreach Team member, contributing to a global open-source community spanning hundreds of institutions and research groups, and lead internal training sessions to drive adoption of scalable workflow tooling.
Graduate Research Assistant
- Partnered with wet-lab scientists to define statistical frameworks for novel experimental designs, bridging domain biology and quantitative methodology.
- Developed open-source R packages, including micRoclean and optima, to address gaps identified through cross-group collaboration.
Education
M.S. Health Data Science
Relevant coursework
B.S. Behavioral Neuroscience
Relevant coursework
Skills & Technology
Statistics & Mathematics
- Regression: Linear regression, logistic regression, generalized linear models
- Survival Analysis: Kaplan-Meier estimation, Cox proportional hazards, competing risks
- Statistical Learning: Supervised and unsupervised machine learning, regularization (LASSO, ridge), random forests, dimensionality reduction
- Categorical Analysis: Chi-square tests, contingency tables, log-linear models, ordinal regression
- Clinical & Observational Studies: Clinical trial design, propensity score methods, causal inference frameworks
- Genomic Statistics: Multiple testing correction (FDR, Bonferroni), differential expression analysis, genomic data methods
- Foundations: Calculus I & II, probability theory
Programming & Infrastructure
- Languages: R, Python, Bash, SQL
- Workflow Management and Scheduling: Nextflow, Slurm
- Version Control: Git, GitHub
- Containers and Environments: Docker, Conda, Apptainer/Singularity
Transcriptomics
- Single-cell RNA-seq: Preprocessing and QC (Cell Ranger), dimensionality reduction and differential expression (Seurat, scanpy), pathway enrichment (GO, Reactome, Qiagen IPA)
- Bulk RNA-seq: QC (FastQC, MultiQC), alignment and quantification (Bowtie2, RSEM), dimensionality reduction (PCA, UMAP), differential expression and pathway analysis (GO, Reactome, Qiagen IPA)
- Spatial Transcriptomics: Visium HD (10x Genomics) - Spatial gene expression analysis, tissue segmentation, spot-level and bin-level quantification, integration with single-cell reference atlases
Genomics
- Single-cell DNA-seq: Tapestri (Mission Bio) - Preprocessing and QC (Mission Bio Tapestri Insights), subclone identification via dimensionality reduction, integration with bulk DNA-seq for germline variant detection
Epigenomics
- Reduced representation bisulfite sequencing: QC (FastQC), read trimming, genome preparation and alignment (Bismark), differential methylation analysis (methylKit)
Metagenomics
- 16S rRNA-seq: QC (FastQC), adapter trimming (cutadapt), denoising and ASV generation (DADA2), taxonomic profiling (QIIME2), differential abundance testing (ANCOM-BC)
- Metagenomic shotgun sequencing: QC, read trimming (Trimmomatic), taxonomic profiling (MetaPhlAn), diversity analysis and visualization (phyloseq)
Presentations
Introduction to Nextflow and nf-core
micRoclean: an R package for decontaminating low biomass 16S-rRNA microbiome data
optima: an open-source R package for the Tapestri platform for integrative single cell multiomics data analysis
Professional Training
Multi Omics NETwork Analysis Workshop (MONET)
Nextflow Training Week
- Certificates: Hello Nextflow, Nextflow for Genomics, Hello nf-core, Nextflow for RNA-seq
Single Cell RNA-seq Workshop
Epic Cosmos Data Model Certification
- Certified for the Cosmos deidentified dataset
IBM Data Science Professional Certificate
SQL for Data Science with R
Memberships & Affiliations
Nextflow Ambassador
Member, American Statistical Association
- Section on Statistics in Genomics and Genetics
Board Member, Graduate Student Biostatistics Association
Publications
Published Nrf2 Promotes Regulatory T Cell Differentiation by Reprogramming Glutamine Metabolism and Alleviates Ulcerative Colitis
Science Immunology
Published Distinguishing the significance of blood microbes in epithelial ovarian cancer
Gut Microbes Reports, 2026
Preprint Integrative modeling of read depth and B-allele frequency improves single-cell copy number calling from targeted DNA sequencing panels
bioRxiv, March 2026
Published Extracellular Vesicle miRNAs as Biomarkers of Asthma Severity
Allergy, January 2026
Published Nrf2 drives activation-driven expansion of CD4+ T cells by modulating glucose and glutamine metabolism
Cell Reports, September 2025
Published Therapeutic potential of NRF2 activating drug RTA-408 in suppressing T cell effector responses and inflammatory bowel disease
The Journal of Immunology, August 2025
Published micRoclean: an R package for decontaminating low-biomass 16S-rRNA microbiome data
Published optima: an Open-source R Package for the Tapestri platform for Integrative single cell Multi-omics data Analysis
Bioinformatics, October 2023