In February 2026, Eli Lilly activated what is now the most powerful privately owned supercomputer dedicated to pharmaceutical research. Called LillyPod, the system packs 1,016 NVIDIA Blackwell GPUs in a DGX SuperPOD configuration, delivers a sustained 9,000 petaflops of computational throughput, and sits at the center of a 700-terabyte genomics data lake that connects Lilly's century of biological research to the frontier of AI-driven drug discovery. The system cost an estimated $500 million to build and is backed by a $1 billion co-innovation agreement with NVIDIA to develop custom molecular dynamics simulation pipelines over the next five years.
LillyPod is not an experiment. It is an operational system processing molecular simulations around the clock, screening compounds against protein targets at a rate that would have been physically impossible two years ago. In its first month of operation, LillyPod screened 2.3 billion molecular candidates against a novel oncology target — a task that would have taken Lilly's previous computational infrastructure approximately 14 months. LillyPod completed it in 11 days.
The implications extend far beyond Eli Lilly. LillyPod represents the maturation of a paradigm shift that has been building across the pharmaceutical industry for a decade: the transition from "wet lab first, compute second" to a model where computational screening is the primary discovery method and physical experiments are reserved for validation of computationally identified candidates.
Inside LillyPod: Architecture and Capabilities
Hardware Configuration
LillyPod's architecture is built on NVIDIA's DGX SuperPOD reference design, scaled to a configuration that places it among the top 15 most powerful supercomputers in the world:
- 1,016 NVIDIA Blackwell B200 GPUs arranged in 127 DGX B200 nodes, each containing 8 GPUs with 192GB HBM3e memory per GPU
- Total GPU memory: approximately 195 terabytes of high-bandwidth memory, enabling massive molecular models to remain resident in GPU memory without costly data transfers
- Interconnect: NVIDIA NVLink and NVSwitch within nodes, NDR 400 Gb/s InfiniBand between nodes, providing 3.6 terabits per second of bisection bandwidth
- Sustained performance: 9,000 petaflops of mixed-precision compute (FP8/FP16), with peak theoretical performance exceeding 12,000 petaflops
- Storage: 700TB genomics data lake on a parallel file system with 48 TB/s aggregate throughput, enabling real-time access to Lilly's complete historical dataset during simulations
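The headline figures above hang together, as a quick back-of-the-envelope check shows. This sketch simply recomputes the totals from the node count and per-GPU memory listed in the configuration:

```python
# Back-of-the-envelope check of the LillyPod configuration figures above.
nodes = 127
gpus_per_node = 8
hbm_per_gpu_gb = 192  # HBM3e per Blackwell B200 GPU

total_gpus = nodes * gpus_per_node
total_hbm_tb = total_gpus * hbm_per_gpu_gb / 1000  # decimal terabytes

print(total_gpus)    # 1016 GPUs
print(total_hbm_tb)  # ~195 TB of high-bandwidth memory
```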
The Genomics Data Lake
The hardware is formidable, but the data is what makes LillyPod unique. Lilly has spent over 18 months curating a unified data lake that integrates:
- Complete genomic sequences for over 4.2 million patients from clinical trials spanning four decades, with consent-based data sharing agreements covering research use
- Molecular activity data from 1.8 billion experimental measurements across Lilly's historical screening programs
- Published literature embeddings — vector representations of 3.7 million peer-reviewed publications relevant to Lilly's therapeutic areas, enabling AI models to ground their predictions in established science
- Protein structure databases incorporating AlphaFold predictions, experimentally determined structures from the Protein Data Bank, and Lilly's proprietary cryo-EM datasets
- Real-world evidence from electronic health records, claims databases, and registry data covering patient outcomes across Lilly's marketed drugs
This data lake enables LillyPod's AI models to move beyond narrow structure-activity relationship predictions and instead reason about the full biological context of a drug candidate — from molecular binding dynamics through metabolic pathways to patient outcomes.
Custom Simulation Pipeline
The NVIDIA-Lilly co-innovation agreement funds the development of BioNeMo-Lilly, a custom molecular dynamics simulation pipeline built on NVIDIA's BioNeMo framework. The pipeline integrates three computational stages that traditionally operated as disconnected workflows:
Stage 1 — Virtual Screening: AI models screen billions of molecular candidates against a protein target, predicting binding affinity with accuracy that now approaches experimental-grade precision (R-squared > 0.82 against crystallographic binding data). This stage leverages generative chemistry models that do not merely score existing compounds but propose novel molecular structures optimized for the target.
Stage 2 — ADMET Prediction: Candidates that pass virtual screening are evaluated for absorption, distribution, metabolism, excretion, and toxicity properties using multi-task neural networks trained on Lilly's historical ADMET data. This stage eliminates approximately 85% of candidates that would have survived traditional screening but would fail in preclinical studies due to poor pharmacokinetic properties.
Stage 3 — Molecular Dynamics Simulation: The top candidates undergo full molecular dynamics simulations — GPU-accelerated physics-based models that simulate the actual motion of every atom in the drug-target complex over microsecond timescales. These simulations reveal binding stability, off-target interactions, and conformational dynamics that cannot be predicted by AI models alone. LillyPod can run thousands of these simulations concurrently, a capability that was previously available only to national laboratories.
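At the heart of every molecular dynamics engine is a numerical integrator that advances atomic positions and velocities in tiny time steps. The sketch below, which is illustrative rather than any part of Lilly's actual pipeline, applies the standard velocity Verlet scheme to a single pair of particles under a Lennard-Jones potential; production GPU codes run the same loop, in parallel, over every atom in the drug-target complex, with far richer force fields and units:

```python
# Minimal velocity Verlet integration of two particles interacting
# through a Lennard-Jones potential. Parameters are in reduced units
# and chosen purely for illustration.
EPS, SIG, DT, MASS = 1.0, 1.0, 0.001, 1.0

def lj_pot(r):
    """Lennard-Jones potential energy at separation r."""
    sr6 = (SIG / r) ** 6
    return 4 * EPS * (sr6 * sr6 - sr6)

def lj_force(r):
    """Force magnitude at separation r (positive = repulsive)."""
    sr6 = (SIG / r) ** 6
    return 24 * EPS * (2 * sr6 * sr6 - sr6) / r

def simulate(r0, v0, steps):
    """Advance the inter-particle separation with velocity Verlet."""
    r, v = r0, v0
    f = lj_force(r)
    for _ in range(steps):
        r += v * DT + 0.5 * (f / MASS) * DT * DT  # position half of the step
        f_new = lj_force(r)
        v += 0.5 * (f + f_new) / MASS * DT        # velocity half of the step
        f = f_new
    return r, v

# Start slightly compressed relative to the LJ minimum at 2**(1/6) * SIG;
# the pair oscillates stably around the equilibrium separation.
r_final, v_final = simulate(r0=1.1, v0=0.0, steps=5000)
print(f"final separation: {r_final:.3f}")
```

Velocity Verlet is favored in molecular dynamics because it is time-reversible and conserves energy extremely well over long trajectories, which is exactly the property that makes microsecond-scale simulations trustworthy.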
The Wet-to-Dry Lab Paradigm Shift
How Drug Discovery Used to Work
The traditional drug discovery process followed a linear, experiment-heavy pipeline:
- Target identification (2-3 years): Biological research to identify a protein or pathway implicated in disease
- Hit identification (1-2 years): Physical screening of compound libraries — sometimes millions of physical compounds — to find molecules that interact with the target
- Lead optimization (2-3 years): Iterative synthesis and testing of compound variants to improve potency, selectivity, and drug-like properties
- Preclinical development (1-2 years): Animal studies, formulation development, and safety pharmacology
- Clinical trials (4-7 years): Phase I (safety), Phase II (efficacy), Phase III (confirmatory), each requiring hundreds to thousands of patients
Total timeline: 12-15 years. Total cost: $2.6 billion on average, including the cost of failed programs (which represent approximately 90% of all drug development attempts).
How AI Changes the Equation
The AI-driven approach compresses the first three stages dramatically by replacing physical experiments with computational screening and optimization:
- Target identification is accelerated by AI models that analyze genomic data, literature, and clinical records to identify novel targets, reducing the timeline to 6-12 months
- Hit identification moves entirely to computational screening, where billions of virtual compounds can be evaluated in days rather than years — reducing the timeline to weeks
- Lead optimization is transformed by generative chemistry models that propose optimized molecular structures based on multi-objective optimization (potency, selectivity, ADMET properties, synthetic accessibility) — reducing the timeline to 3-6 months
- Preclinical development benefits from AI-predicted toxicity profiles that reduce the failure rate and accelerate the design of animal studies, saving 6-12 months
The clinical trial stages remain largely unchanged in duration (regulatory requirements mandate minimum observation periods), but AI optimizes patient selection, trial design, and endpoint analysis, reducing costs and increasing the probability of success.
Net effect: The total drug development timeline could compress from 12-15 years to 5-7 years, and the average cost from $2.6 billion to under $1 billion — primarily by eliminating the failures that consume the majority of traditional R&D spending.
The Broader Pharma AI Landscape
Recursion Pharmaceuticals
Recursion has built what it calls the "Recursion OS" — an automated drug discovery platform that generates and analyzes millions of biological experiment images per week using high-throughput microscopy. As of March 2026, Recursion has four AI-discovered drug candidates in clinical trials:
- REC-4881 (Phase II): A MEK inhibitor for familial adenomatous polyposis, identified through phenotypic screening of Recursion's cellular image database
- REC-994 (Phase II): A cerebral cavernous malformation treatment, repurposed from an existing compound library using AI-guided target deconvolution
- REC-3964 (Phase I): A novel antiviral candidate discovered through AI analysis of host-pathogen interaction data
- REC-617 (Phase I): An immunology compound identified through multi-omic pathway analysis
Recursion's approach differs from Lilly's in an important respect: while LillyPod emphasizes physics-based molecular simulation, Recursion relies primarily on phenotypic screening — using AI to analyze how compounds change the physical appearance and behavior of cells, without requiring upfront knowledge of the molecular target. Both approaches have demonstrated clinical-stage results.
Insilico Medicine
Insilico Medicine achieved a landmark in AI-driven drug discovery when its lead candidate, ISM001-055 (now INS018-055), became the first AI-discovered drug to complete a Phase II clinical trial in 2025. The compound, targeting idiopathic pulmonary fibrosis, was discovered, designed, and optimized entirely by Insilico's AI platform in under 18 months — compared to the 4-5 years typically required for the same stages.
Insilico's pipeline now includes 11 AI-discovered candidates across oncology, fibrosis, immunology, and central nervous system diseases, with total development costs approximately 70% lower than industry averages for comparable programs.
Isomorphic Labs (DeepMind Spinoff)
Isomorphic Labs, founded by DeepMind CEO Demis Hassabis in 2021, has leveraged AlphaFold's protein structure prediction capabilities to build a drug discovery platform focused on structure-based drug design. The company has:
- Signed partnerships worth a combined $3.3 billion with Eli Lilly and Novartis
- Developed AlphaFold 3, which predicts not just protein structures but the structures of protein-drug complexes — the precise interaction geometry that determines whether a drug candidate will bind effectively to its target
- Demonstrated that AlphaFold-predicted binding poses match experimental X-ray crystallography results with sub-angstrom accuracy for over 70% of tested complexes
Isomorphic's approach is complementary to LillyPod's: AlphaFold provides the structural predictions that feed into the molecular dynamics simulations that LillyPod runs at scale.
Technical Deep Dive: How AI Accelerates Each Stage
Virtual Screening at Scale
Traditional high-throughput screening physically tests compounds against targets — a process limited by the size of a company's physical compound library (typically 1-5 million compounds). Virtual screening removes this constraint entirely.
Modern AI-powered virtual screening uses graph neural networks and 3D convolutional neural networks to predict binding affinity between a candidate molecule and a target protein. The molecule is represented as a graph (atoms as nodes, bonds as edges) or as a 3D electron density cloud, and the model predicts the free energy of binding — the thermodynamic quantity that determines whether the molecule will stick to the target.
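The graph representation described above can be made concrete with a toy example. The sketch below builds a tiny molecular graph (atoms as nodes with one-hot element features, bonds as edges) and runs a few rounds of message passing followed by a sum-pool readout. The weights are random, so the resulting score is meaningless as chemistry; a real graph neural network learns these parameters from experimental binding data:

```python
import random

# Toy message-passing readout over a molecular graph. Random weights
# stand in for a trained model: only the data flow is illustrative.
random.seed(0)

def message_passing_score(atom_features, bonds, rounds=3):
    """Return a scalar readout for a molecular graph.

    atom_features: list of per-atom feature vectors
    bonds: list of (i, j) index pairs (undirected edges)
    """
    dim = len(atom_features[0])
    w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
    h = [list(f) for f in atom_features]
    for _ in range(rounds):
        new_h = [list(v) for v in h]
        for i, j in bonds:
            for d in range(dim):
                # Each atom accumulates a linear transform of its neighbor.
                msg_ij = sum(w[d][k] * h[j][k] for k in range(dim))
                msg_ji = sum(w[d][k] * h[i][k] for k in range(dim))
                new_h[i][d] += 0.1 * msg_ij
                new_h[j][d] += 0.1 * msg_ji
        h = new_h
    # Readout: sum-pool all atom states into a single score.
    return sum(sum(v) for v in h)

# Ethanol-like toy graph: three heavy atoms (C-C-O), one-hot features.
atoms = [[1, 0], [1, 0], [0, 1]]  # [is_carbon, is_oxygen]
bonds = [(0, 1), (1, 2)]
print(round(message_passing_score(atoms, bonds), 3))
```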
LillyPod's screening pipeline evaluates candidates at a rate of approximately 200 million compounds per day, searching a virtual chemical space of 10^33 possible drug-like molecules — a number vastly larger than the estimated count of stars in the observable universe. The AI does not enumerate this space; instead, it uses generative models to navigate it intelligently, proposing novel structures that are predicted to have high binding affinity while satisfying drug-like property constraints.
ADMET Prediction
ADMET prediction — forecasting how a drug candidate will be absorbed, distributed, metabolized, excreted, and whether it will cause toxicity — has historically been one of the highest-failure-rate stages of drug development. Approximately 60% of drug candidates that show promising activity against a target fail in preclinical or early clinical stages due to poor ADMET properties.
AI-based ADMET prediction uses multi-task learning — training a single neural network to simultaneously predict dozens of ADMET endpoints (solubility, permeability, CYP450 metabolism, hERG channel inhibition, hepatotoxicity, and more). This approach works because ADMET properties are biologically correlated: a molecule's solubility provides information about its permeability, and its metabolic stability relates to its toxicity profile. Multi-task models exploit these correlations to achieve prediction accuracy that exceeds single-task models by 15-25%.
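The shared-parameter architecture behind multi-task learning can be sketched in a few lines. Below, one shared "trunk" transforms a molecular descriptor vector, and a lightweight head per endpoint reads its prediction off that shared representation. The endpoint names, layer sizes, and random weights are all placeholders; the point is the structure, in which correlated ADMET endpoints share most of their parameters:

```python
import random

# Schematic forward pass of a multi-task ADMET network with a shared
# trunk and per-endpoint heads. Weights are random placeholders.
random.seed(1)

ENDPOINTS = ["solubility", "permeability", "herg_inhibition"]  # illustrative

def make_layer(n_in, n_out):
    return [[random.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, x):
    # ReLU-activated linear layer.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in layer]

def predict_admet(descriptors, trunk, heads):
    shared = forward(trunk, descriptors)    # one representation for all tasks
    return {name: forward(head, shared)[0]  # one scalar per endpoint
            for name, head in zip(ENDPOINTS, heads)}

trunk = make_layer(4, 8)
heads = [make_layer(8, 1) for _ in ENDPOINTS]
preds = predict_admet([0.2, 1.3, 0.7, 0.1], trunk, heads)
for name, value in preds.items():
    print(f"{name}: {value:.3f}")
```

Because every head reads from the same trunk, gradient updates from one endpoint's training data improve the representation used by all the others, which is the mechanism behind the 15-25% accuracy gain over single-task models cited above.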
LillyPod's ADMET models are trained on Lilly's proprietary dataset of 1.8 billion experimental measurements — an order of magnitude larger than any publicly available ADMET dataset — giving them a significant accuracy advantage over models trained on public data alone.
Protein Structure Prediction
The 2020 AlphaFold breakthrough — predicting protein structures with experimental accuracy — has been refined and extended through successive iterations:
- AlphaFold 2 (2020): Predicted single-protein structures with experimental accuracy
- AlphaFold 3 (2024): Predicts protein-ligand complexes, protein-protein interactions, and protein-nucleic acid complexes
- BioNeMo-Lilly structural models (2026): Custom models trained on Lilly's proprietary cryo-EM datasets, achieving accuracy improvements of 12-18% on Lilly's specific therapeutic targets compared to general-purpose AlphaFold predictions
Accurate protein structure prediction is foundational to structure-based drug design because the shape of the target's binding site determines which molecules can bind effectively. With LillyPod, Lilly can predict structures, design candidate molecules, simulate their binding dynamics, and evaluate their ADMET properties in a single integrated computational pipeline — a workflow that previously required separate teams, tools, and timelines.
Clinical Trial Optimization
AI does not yet replace clinical trials, but it significantly optimizes their design and execution:
- Patient selection: AI models analyze electronic health records and genomic data to identify patients most likely to respond to the drug, reducing the number of patients needed to demonstrate efficacy by 30-40%
- Adaptive trial design: AI-guided Bayesian optimization adjusts dosing, endpoints, and patient allocation during the trial based on interim results, reducing trial duration by 20-30%
- Digital twin modeling: AI creates computational models of individual patients that predict treatment response, enabling virtual testing of dosing regimens before physical administration
- Endpoint prediction: AI models predict which clinical endpoints are most likely to demonstrate a statistically significant treatment effect, reducing the risk of costly trial failures
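The adaptive allocation idea in the list above can be illustrated with Thompson sampling, one common Bayesian approach (not necessarily the one any particular sponsor uses). Each simulated patient is assigned to the arm whose sampled posterior response rate is higher, so allocation drifts toward the better-performing arm as evidence accumulates. The response rates below are invented for illustration:

```python
import random

# Thompson sampling for a two-arm adaptive trial: Beta posteriors over
# each arm's response rate drive patient allocation. Rates are made up.
random.seed(42)
TRUE_RATE = {"control": 0.30, "treatment": 0.50}

posterior = {arm: [1, 1] for arm in TRUE_RATE}  # Beta(1, 1) priors
assigned = {arm: 0 for arm in TRUE_RATE}

for _ in range(500):  # 500 simulated patients
    # Sample a plausible response rate from each arm's posterior...
    draws = {arm: random.betavariate(a, b) for arm, (a, b) in posterior.items()}
    # ...and assign the patient to the arm with the higher draw.
    arm = max(draws, key=draws.get)
    assigned[arm] += 1
    responded = random.random() < TRUE_RATE[arm]
    posterior[arm][0 if responded else 1] += 1

print(assigned)  # allocation concentrates on the better arm
```

Real adaptive designs add stopping rules, dose arms, and regulatory pre-specification on top of this core loop, but the mechanism of shifting allocation toward stronger interim evidence is the same.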
The Economics of AI-Driven Drug Discovery
The financial case for AI-driven drug discovery rests on two levers: reducing the cost of success and reducing the cost of failure.
Reducing the cost of success: AI compresses the discovery and preclinical stages from 5-8 years to 1-2 years, eliminating years of laboratory costs, personnel costs, and facility costs. For a typical drug program, this represents savings of $300-500 million per successful drug.
Reducing the cost of failure: The traditional drug development process has a 90% failure rate — meaning that for every successful drug, nine programs fail, each consuming hundreds of millions of dollars. AI's ability to predict failures earlier (through better ADMET prediction, toxicity modeling, and efficacy forecasting) means that failing programs are terminated in the computational stage rather than in expensive clinical trials. If AI can increase the overall success rate from 10% to 20% (as several analyses suggest is achievable), the cost savings from avoided failures exceed $1 billion per successful drug.
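The cost-of-failure arithmetic above is easy to make explicit. The only input not taken from the article is an assumed average of $250 million sunk into each failed program, a placeholder consistent with "hundreds of millions of dollars" per failure:

```python
# Worked version of the cost-of-failure arithmetic: moving the success
# rate from 10% to 20% avoids five expected failures per approved drug.
COST_PER_FAILURE_M = 250  # assumed average sunk cost per failure, $M

def failures_per_approval(success_rate):
    # Expected number of failed programs per one approved drug.
    return (1 - success_rate) / success_rate

baseline = failures_per_approval(0.10)
improved = failures_per_approval(0.20)
savings_m = (baseline - improved) * COST_PER_FAILURE_M

print(round(baseline), round(improved))  # 9 4
print(round(savings_m))                  # 1250 -> over $1B per approved drug
```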
Combined impact: The pharmaceutical industry spent $244 billion on R&D in 2025, producing approximately 55 new approved drugs. If AI-driven approaches can cut the per-drug cost from $2.6 billion to under $1 billion while increasing the success rate, the industry could either produce the same number of drugs for $100 billion less or produce 2-3x more drugs for the same investment — either outcome representing a transformative improvement in global health.
Enterprise Implications Beyond Pharma
The infrastructure patterns pioneered by LillyPod are not unique to pharmaceutical research. Any industry with complex simulation workloads — where the cost and time of physical experimentation are the primary bottleneck — can benefit from similar AI-driven computational approaches:
- Materials science: AI-guided simulation of novel materials for batteries, semiconductors, and structural applications, replacing expensive and time-consuming physical synthesis and testing
- Chemical engineering: Computational optimization of chemical processes, catalysts, and reaction conditions, reducing the need for pilot plant experiments
- Aerospace: AI-accelerated computational fluid dynamics, structural analysis, and materials qualification, compressing design cycles from years to months
- Energy: Simulation of reservoir dynamics, grid optimization, and renewable energy system design at scales that enable better capital allocation decisions
- Financial services: Monte Carlo simulations for risk modeling, portfolio optimization, and stress testing at scales that capture tail risks more accurately
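For the financial-services bullet above, the core technique is compact enough to sketch. This toy simulates many one-day portfolio returns and reads the 99% value-at-risk off the empirical loss distribution; the normal-return assumption and parameters are illustrative, and real risk engines run far richer models at vastly larger scale:

```python
import random

# Minimal Monte Carlo value-at-risk estimate under an assumed normal
# daily return. Parameters are placeholders for illustration.
random.seed(7)
N_SCENARIOS = 100_000
MEAN_RETURN, VOLATILITY = 0.0005, 0.02  # assumed daily mean and st. dev.

losses = sorted(-random.gauss(MEAN_RETURN, VOLATILITY)
                for _ in range(N_SCENARIOS))
var_99 = losses[int(0.99 * N_SCENARIOS)]  # 99th-percentile loss

print(f"99% one-day VaR: {var_99:.2%} of portfolio value")
```

The point of GPU scale in this domain is the same as in drug screening: with orders of magnitude more scenarios, the empirical distribution resolves the tail finely enough to price rare events that a small sample would miss entirely.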
The common thread across these industries is that computational power has become a strategic asset, and the organizations that invest in AI-optimized compute infrastructure gain a structural advantage over competitors who rely on physical experimentation.
For organizations that need dedicated, high-performance AI infrastructure without building and operating their own data center, Swfte's Dedicated Cloud provides isolated compute environments optimized for large-scale AI workloads — the same class of GPU-accelerated infrastructure that powers systems like LillyPod, available as a managed service with enterprise-grade security, compliance, and operational support.
What LillyPod Means for the Future
LillyPod is a milestone, not an endpoint. The convergence of GPU-accelerated compute, AI models trained on decades of biological data, and physics-based simulation is creating a new foundation for drug discovery that will reshape the pharmaceutical industry over the next decade.
The key trends to watch:
- Cost compression: As GPU performance continues to improve and AI models become more efficient, the computational cost of drug discovery will drop by an estimated 10x over the next five years, making AI-driven discovery accessible to mid-sized pharmaceutical companies and academic research institutions
- Biological complexity: Current AI models excel at predicting binding affinity and ADMET properties for small-molecule drugs. The next frontier is extending these capabilities to biologics (antibodies, cell therapies, gene therapies), which represent a larger and faster-growing segment of the pharmaceutical market
- Personalized medicine: As genomic datasets grow and AI models improve, the ability to design drugs optimized for specific patient populations — or even individual patients — will move from theoretical possibility to clinical practice
- Regulatory adaptation: Regulatory agencies (FDA, EMA) are developing frameworks for evaluating AI-discovered drugs, including new guidance on computational evidence as a supplement to traditional clinical data
The pharmaceutical industry's embrace of AI supercomputing is the clearest signal yet that the drug discovery process is undergoing a fundamental transformation. Organizations across industries should take note: the principles demonstrated by LillyPod — massive computational investment, proprietary data curation, AI-integrated simulation pipelines — represent a playbook for competitive advantage in any domain where discovery and optimization are the primary value drivers.