I develop and translate models to improve the reliability and interpretability of insights from limited, high-dimensional, noisy and imperfect data — with a focus on biological systems.
I am a PhD candidate in Bioinformatics and Genomics at Pennsylvania State University, where I am advised by two statisticians: Prof. Justin Silverman, MD, PhD and Prof. Nicole Lazar, PhD.
I am interested in identifying and solving broad problems at scale. I am motivated by the potential of technology to accelerate scientific discovery and improve human health. My work focuses on a major gap in the rigor of scientific methods for analyzing high-throughput sequencing data. I aim to help close that gap by studying the information content of data, evaluating the limitations of existing methods, and developing robust, practical analytical frameworks that address them.
Without rigor, flawed methodology can produce findings that do not replicate, waste resources, erode trust in science, and slow the pace of discovery.
I believe scientific modeling requires a deep understanding of both the underlying translational context and the statistical and computational principles involved. I combine statistical depth, translational understanding, and computational skill, grounded in a science and medical background, to build methods that are not only theoretically sound but also practical for real-world problems.
[ Your preface — to be written ]
Click any entry in the table of contents
to jump directly to that chapter.
No one tells you that the transition takes longer on the inside than it does on paper. You can update your CV long before you update your sense of who you are.
This chapter is for anyone arriving at a quantitative field from somewhere else: from physiology, biology, social science, or any domain where the tools of inquiry are not grounded in mathematical rigor. I write from the vantage of a premed-to-bioinformatics-to-quantitative-scientist transition, but the shape of the problem is the same across fields: you are not starting from zero, but you do not yet know what you have.
There is a liminal phase that most people underestimate. You can be genuinely capable at quantitative work and still introduce yourself, for years, with the old credential as a hedge — I'm a biologist, but I do some programming. The hedge is protective. It manages expectations and offers an exit route if the room turns skeptical. It is also a symptom.
The shift — from someone who codes to a statistician, an ML researcher, a computational scientist, whatever the destination label turns out to be — does not happen at a specific moment. It happens in aggregate, through the accumulation of problems you solved with quantitative tools and couldn't have solved any other way. At some point you stop reaching for the old vocabulary to explain yourself. That is the transition completing itself.
The credential hedge is understandable, and for a while it's honest. But at some point it stops describing your uncertainty and starts creating it. Notice when that happens.
The fear of mathematics is almost always disproportionate to what a transition actually requires. The first instinct is to treat it like a prerequisite course: master calculus, then linear algebra, then probability theory, then statistics, then you may begin. This is the wrong order and the wrong model entirely.
What most transitions into quantitative biology or ML require at the outset is narrower than you think: enough calculus to understand what a gradient is doing, enough linear algebra to reason about data as matrices and understand why decompositions matter, and enough probability to understand what a distribution is and why conditioning matters. You do not need proofs. You need intuition about what the operations represent.
The deeper mathematics arrives naturally, and only when a specific problem demands it. I did not properly understand eigendecomposition until I needed to think carefully about PCA on gene expression data. I did not understand the KL divergence until variational inference required it. The problem came first; the understanding followed. This is not a shortcut. It is how mathematical knowledge actually accumulates in practice.
Learn enough to begin. The gaps will announce themselves when they matter. A gap that has not yet blocked you is not your most urgent problem.
Many people transitioning from non-quantitative fields treat programming as the primary skill to acquire. This is understandable, since it is the most visible difference between where they are and where they want to be. But syntax is learnable in weeks. Thinking computationally (knowing how to decompose a problem, when to approximate, what to trust in your output) takes years and does not come from syntax.
There is no curriculum designed for the transition you are making, because the transition is different for everyone and the field moves faster than curricula can track. The roadmap does not exist. This is disorienting. It is also, eventually, freeing.
Self-teaching under these conditions is necessarily non-linear. You will pick up linear algebra from a video series when PCA stops making sense, probability from a textbook when a paper's methods section becomes opaque, and statistical modeling from the code of people whose results you are trying to reproduce. None of these will happen in a clean sequence. All of them will happen in response to a specific problem that mattered to you right now — and that is precisely why they will stick.
The failure mode to watch for is the preparation trap: convincing yourself that you must complete some prior body of knowledge before you are permitted to begin the actual work. There is always more prior knowledge. The people who make the transition successfully do not wait until they are ready. They begin, encounter what they don't know, and go learn exactly that thing.
Build the plane while flying it. The alternative — waiting on the ground until you understand aerodynamics — does not produce pilots. It produces people who read a lot about flight.
People trained in quantitative fields learn to optimize functions they do not always understand. People trained in biology and medicine learn what the function is supposed to represent. This asymmetry sounds abstract. Its consequences are concrete.
Domain knowledge is not decoration applied after the real work is done. It is a filter — often the only filter available — against results that are technically correct and scientifically meaningless. The ability to look at a model output and recognize that it is biologically implausible, even when the loss converged and the code ran clean, is a form of intelligence that cannot be derived from data alone.
This shows up in subtler ways too. A physiologist who learns to build statistical models brings with them an understanding of measurement error: the knowledge that a blood pressure reading is not blood pressure, that a self-report is not a behavior, that a sequencing read is not a transcript. This understanding of the distance between the measurement and the thing measured is hard-earned in experimental science and largely invisible to people who learned statistics on clean benchmark datasets.
Coming from premed specifically: the clinical framing — who does this affect, what decision does this change, what would happen if you were wrong — is not something most quantitative training instills deliberately. Carry it forward. It becomes one of the rarer things you bring to a field that often optimizes metrics without asking what the metric is for.
There will be a period, probably longer than feels fair, when you feel like a visitor in the field. You sit in seminars and follow maybe half of what is said. You read papers and skip the methods. You produce results that you cannot fully defend yet. This is not impostor syndrome in the pejorative sense — it is an accurate read of your current state. The answer to it is not reassurance. It is time and repetition.
You stop being a guest when you begin to have opinions. When you read a methods section and have a reaction — that model choice seems odd given the data structure, that baseline comparison is misleading — rather than just receiving information. Opinions require context. Context requires immersion. There is no shortcut, but there is a direction: do the work, read widely, and be honest about what you don't understand yet.
The transition is complete not when you know everything, but when you know enough to know what you're missing — and can find it.
[ Next chapter — to be written ]
[ Next chapter — to be written ]
This is a living document.
Comments and corrections are welcome.
Letters to a Pre-Scientist
A national pen-pal program pairing professional scientists with middle school students, many from under-resourced communities. Through personal correspondence I help demystify what scientists actually do, and show students that science is a real career path for people who look like them.
prescientist.org
CCBB Workshop, 2024 & 2026
Co-organizer of the CCBB Workshop, bringing together researchers developing methods for sequence analysis. I ideated and co-led a panel discussion with professors and students on the gap between method developers and users — focusing on community needs, tool adoption, and barriers to use.
ccbb.psu.edu/wip2024
NYRR Volunteer Leadership
Selected for a competitive leadership development program with NYRR, the organization behind the NYC Marathon. Training in nonprofit leadership, community engagement, and event management, including organizing and supervising volunteers across races, the NYC Marathon among them.
nyrr.org/volunteer-leadership-program
Catholic Diocese of Ghana
Volunteered as a medical aide, learning firsthand about the barriers to care in a developing country. Assisted in basic healthcare services and health education. This experience deepened my understanding of global health challenges and the importance of accessible healthcare.
President of GenoMIX, Penn State
Led Penn State's graduate organization for genomics — organizing events, managing communications, mentoring peers. Developed 10 student resources including a graduate school guide, area assimilation handbook, career development guide, and mentorship program framework.
GenoMIX at Penn State
iSTEAM Microbiome Marvels
As part of a T32 training grant, I helped develop and deliver a curriculum on microbiome science for high school biology teachers. The curriculum included interactive lessons, hands-on activities, and real-world applications to help teachers introduce big ideas about the small world.





