From 50 Million Cells to AI-Driven Discovery: scPRINT and the Future of Gene Network Modeling Powered by AI
Jérémie Kalfon’s new research is shaping the future of single-cell AI, one gene at a time.
The field of single-cell genomics is undergoing a quiet revolution. Datasets are exploding in size. Technologies are pushing the limits of resolution. But human capacity to interpret massive datasets, especially to understand how genes interact in individual cell types, is limited.
That’s what makes the recent publication of scPRINT in Nature Communications so riveting.
Similar to WhiteLab Genomics’ AI platform, ALFRED, scPRINT enables researchers to extract insights from massive single-cell datasets, revealing patterns that conventional analysis would miss.
We’re proud to say that scPRINT was led by Jérémie Kalfon, former Computational Biology Lead at WhiteLab Genomics, who transitioned from WhiteLab to pursue his joint PhD in Applied Mathematics and Artificial Intelligence at Institut Pasteur, co-supervised by Laura Cantini of Institut Pasteur and Gabriel Peyré of Ecole normale supérieure. This publication marks a milestone for him and for the field of genomic medicine.
🧬 What is scPRINT?
scPRINT is a foundation AI model built on a bidirectional transformer architecture and trained on over 50 million single cells from diverse tissues, diseases, and species. It is designed to do something ambitious: reliably predict gene networks, denoise complex data, and annotate new datasets, all without retraining.
In short, it’s a general-purpose AI for biology. Think GPT, but for cellular gene expression.
Instead of working with text, scPRINT learns from the “language” of cells. It identifies patterns in how genes are activated across tissues, across species, and under different biological conditions.
⚙️ Technical Highlights
The model uses a bidirectional transformer architecture, pre-trained on self-supervised tasks, such as recovering masked gene expression values (denoising) and aligning cell embeddings across tissues and datasets.
That means it’s not fine-tuned to just one dataset or tissue. Instead, it builds a robust, reusable internal understanding of gene behavior in cells.
The result?
- Gene network inference that rivals specialist tools
- Cell type classification with zero-shot generalizability
- Batch correction and denoising on-the-fly
The authors trained the medium-sized model in 48 hours on a single GPU. This level of accessibility puts deep learning within reach of many academic labs and emerging biotech players.
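To make the masked-value recovery task mentioned above more concrete, here is a minimal sketch in plain NumPy. It is not scPRINT's actual training code: the function names `mask_expression` and `masked_mse`, the 30% masking rate, the toy count matrix, and the per-gene mean used as a stand-in "model" are all assumptions chosen for illustration.

```python
import numpy as np

def mask_expression(counts, mask_fraction=0.3, seed=0):
    """Hide a random subset of expression values (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(counts.shape) < mask_fraction  # True where a value is hidden
    masked = np.where(mask, 0.0, counts)             # hidden entries are zeroed out
    return masked, mask

def masked_mse(predicted, target, mask):
    """Mean squared error computed only on the hidden entries."""
    if not mask.any():
        return 0.0
    diff = (predicted - target)[mask]
    return float(np.mean(diff ** 2))

# Toy cells-by-genes count matrix (4 cells, 6 genes); values are synthetic.
rng = np.random.default_rng(42)
counts = rng.poisson(lam=2.0, size=(4, 6)).astype(float)

masked, mask = mask_expression(counts)

# Stand-in "model": predict each gene's mean over the values left visible.
# A real model would be a neural network trained to minimize this loss.
visible = ~mask
gene_sum = (counts * visible).sum(axis=0)
gene_n = np.maximum(visible.sum(axis=0), 1)
baseline = np.broadcast_to(gene_sum / gene_n, counts.shape)

loss = masked_mse(baseline, counts, mask)
print(f"masked-entry MSE of the per-gene-mean baseline: {loss:.3f}")
```

The key design point is that the loss is scored only on the hidden entries, so a model can only do well by learning how genes co-vary rather than by copying its input.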
🔍 Real-World Impact
As a demonstration, scPRINT was applied to benign prostatic hyperplasia (BPH), a common condition involving the non-cancerous enlargement of the prostate.
With no disease-specific fine-tuning, the model flagged gene networks tied to:
- Chronic inflammation
- Cellular senescence
- Extracellular matrix remodeling
It even identified PAGE4, a hub gene, which may be central to the interplay between senescence (aging), inflammation, and extracellular matrix changes.
That’s the promise of scPRINT: enabling AI-powered hypothesis generation at the systems biology level.
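For readers unfamiliar with the term, a "hub gene" is one with unusually many connections in a gene network. The sketch below shows one simple way to rank genes by connectivity, assuming the network is available as a matrix of edge weights. The function `top_hub_genes`, the 0.5 threshold, the placeholder gene names, and the toy weights are all hypothetical; this is not how scPRINT or the paper defines or scores hubs.

```python
import numpy as np

def top_hub_genes(adjacency, gene_names, threshold=0.5, top_k=1):
    """Rank genes by how many strong edges they have (hypothetical helper)."""
    adj = np.asarray(adjacency, dtype=float)
    strong = np.abs(adj) >= threshold   # keep only edges above the cutoff
    np.fill_diagonal(strong, False)     # ignore self-connections
    strong = strong | strong.T          # treat the network as undirected
    degree = strong.sum(axis=1)         # number of strong neighbors per gene
    order = np.argsort(-degree, kind="stable")[:top_k]
    return [(gene_names[i], int(degree[i])) for i in order]

# Placeholder gene names and made-up edge weights, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
weights = np.array([
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.1, 0.0],
    [0.8, 0.1, 0.0, 0.2],
    [0.7, 0.0, 0.2, 0.0],
])

hubs = top_hub_genes(weights, genes)
print(hubs)
```

In this toy network GENE_A is strongly connected to all three other genes, so it ranks first. Degree is only one of several centrality measures that could be used for this kind of ranking.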
💡 Why This Matters
You likely already know that building gene networks typically means:
- Long timelines
- Expensive wet-lab experiments
- Dataset-specific pipelines
scPRINT offers a faster, more scalable alternative that doesn’t need any fine-tuning. Its zero-shot capabilities could dramatically cut down the time and cost needed to explore:
- New disease models
- Drug targets
- Biomarker signatures
- Tissue-specific biology
This aligns perfectly with what we believe at WhiteLab Genomics: the future of therapeutics is driven by integrated data, smart algorithms, and scalable platforms, made possible through engineering and algorithmic optimization on massive, high-quality datasets.
🎓 Jérémie’s Impact at WhiteLab
Jérémie was instrumental in advancing the computational biology module of our AI platform, ALFRED, during his time at WhiteLab Genomics. While we miss his day-to-day insights, we’re excited to see him contributing to the broader scientific community at this level.
His paper reflects scientific rigor and the kind of interdisciplinary thinking that we strive for.
🚀 What’s Next?
As the field evolves, models like scPRINT will be central to generating and testing new hypotheses. We expect these tools will play an increasingly important role in developing more precise, efficient, and scalable genomic medicines.
If you're curious about AI platforms for single-cell data, genomic medicine, or early discovery, we invite you to connect with WhiteLab to learn more about how AI-driven insights are shaping next-generation therapeutics.
📚 Read the full paper in Nature Communications
🧬 Check out the scPRINT GitHub
📦 Explore the scPRINT deposit on Zenodo
Congratulations to Jérémie and the co-authors for this outstanding work!