The age signal in autism: reading genetics and development without erasing context
The value and the gap
Zhang, X., Grove, J., Gu, Y. et al. Polygenic and developmental profiles of autism differ by age at diagnosis. Nature (2025). https://doi.org/10.1038/s41586-025-09542-6. https://www.nature.com/articles/s41586-025-09542-6
Nature has just published “Polygenic and developmental profiles of autism differ by age at diagnosis,” from a global team of researchers including Xinhe Zhang, Varun Warrier, Anders D. Børglum, and Hilary C. Martin.
The study asks whether the timing of an autism diagnosis carries information about developmental trajectories and genetics, or whether it is merely a function of who gets seen and when.
The authors assembled multiple kinds of evidence to test two models. The unitary model says autism has one basic polygenic architecture, so later diagnosis reflects subtler features, access, or context. The developmental model says there are partly distinct polygenic influences that track with earlier versus later diagnosis and that these map onto different developmental trajectories. They take a layered approach.
First, they examine longitudinal behavior in birth cohorts, using repeated Strengths and Difficulties Questionnaire (SDQ) measures to see whether latent trajectories relate to age at diagnosis.
Second, they estimate how much common genetic variation explains variation in diagnosis age in large genetic cohorts.
Third, they model the genetic covariance across several autism GWAS datasets that differ in median diagnosis age, to ask whether the signal decomposes into more than one latent factor.
Finally, they look at how any such factors correlate with ADHD, other mental health traits, and early developmental milestones.
Study participants. The analyses draw on two groups of cohorts.
The developmental trajectory work uses birth cohorts that follow children over time: the UK Millennium Cohort Study and two waves of the Longitudinal Study of Australian Children. In these cohorts, autistic participants had an age at diagnosis recorded between 5 and 17 years. Sample sizes per cohort ranged from 89 to 188 for the primary analyses, and sensitivity analyses that used imputation and included ADHD yielded a larger N in the UK cohort. Socioemotional difficulties were indexed with the SDQ total score and subscales, which are stable across sex and age and moderately related to autism measures. Age at diagnosis is approximated by the age at which caregivers first reported a diagnosis in these cohort datasets.
The genetic work uses very large autism cohorts with recorded age at diagnosis and genotype data. These include iPSYCH from Denmark, with 18,965 autistic individuals, and SPARK from the United States, with 28,165 autistic individuals split into discovery and replication subsets. The cohorts differ in diagnostic systems and in median age at diagnosis: iPSYCH’s median is about 10 years, while SPARK’s is about 4 years. The paper also brings in additional autism GWAS datasets, some stratified by age at diagnosis and some not. They range from family-based US samples with early-childhood median diagnosis ages to FinnGen, a population sample with a much later median age at diagnosis (about 22.7 years).
Study design. In the birth cohorts, the team fit growth mixture models to SDQ trajectories without prespecifying groups and repeatedly found a 2-trajectory solution.
Trajectory 1: difficulties are elevated in early childhood and remain high or attenuate only modestly.
Trajectory 2: difficulties start lower and rise through late childhood and adolescence.
Membership in the two latent trajectories is differentially associated with the timing of autism diagnosis. Children on the early-difficulty trajectory are more likely to be diagnosed in childhood in two of the three cohorts where age at diagnosis was recorded earlier. Importantly, the sex ratio is similar across the two trajectory groups within each cohort, which argues against a trivial sex-composition explanation.
Results. The genetic analyses treat age at autism diagnosis as a quantitative trait. In iPSYCH, SPARK discovery, SPARK replication, and their meta-analysis, the SNP-based heritability of age at diagnosis is about 11%. That is on par with or larger than the variance explained by many measured sociodemographic and clinical variables, each of which explains less than about 15%. The heritability also does not meaningfully attenuate when those variables are added as covariates. That pattern fits the developmental model better than the unitary model.
The authors then ask whether autism’s polygenic signal differs across datasets that vary in age at diagnosis. They estimate genetic correlations among thirteen autism GWASs, some stratified by age, and observe a gradient. Datasets with similar median diagnosis ages are more strongly genetically correlated with each other, and the correlation drops as median ages diverge. A structural equation model built on six minimally overlapping GWASs finds that a 2-factor solution best fits the genetic covariance. Cohorts dominated by early-childhood diagnosis load on one factor, and cohorts with many adolescent and adult diagnoses load on the other. The two factors are only modestly genetically correlated, at around 0.38. This is not principal component analysis but confirmatory modeling of genetic covariance. It is conceptually similar in that it reduces dimensionality, but it differs in its assumptions and in what the loadings mean.
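To see why a correlated two-factor structure produces a gradient, here is a minimal Python sketch. It computes the genetic correlations that such a model implies between datasets. It assumes standardized loadings and unit genetic variance per dataset. Under those assumptions, the implied correlation between two distinct datasets i and j is λ_i' Φ λ_j, where Φ holds the factor correlation. The three dataset names and every loading are hypothetical, chosen only for illustration; only the factor correlation of about 0.38 comes from the paper.

```python
# Minimal sketch: model-implied genetic correlations under a correlated
# two-factor model. All loadings below are HYPOTHETICAL, for illustration only;
# the factor correlation of 0.38 is the value reported in the paper.

FACTOR_CORR = 0.38  # correlation between the "early" and "later" factors

# Phi: 2x2 factor correlation matrix (factors have unit variance)
PHI = [[1.0, FACTOR_CORR],
       [FACTOR_CORR, 1.0]]

# Hypothetical standardized loadings: (early factor, later factor)
LOADINGS = {
    "early_cohort": (0.9, 0.0),  # a cohort dominated by early diagnoses
    "mixed_cohort": (0.6, 0.4),  # a cohort with a spread of diagnosis ages
    "late_cohort": (0.0, 0.9),   # a cohort dominated by later diagnoses
}


def implied_rg(lam_i, lam_j, phi=PHI):
    """Return lam_i' * Phi * lam_j, the model-implied genetic correlation
    between two distinct datasets when each has unit genetic variance."""
    # phi_lam_j = Phi @ lam_j, written out for a 2x2 matrix
    phi_lam_j = [phi[0][0] * lam_j[0] + phi[0][1] * lam_j[1],
                 phi[1][0] * lam_j[0] + phi[1][1] * lam_j[1]]
    return lam_i[0] * phi_lam_j[0] + lam_i[1] * phi_lam_j[1]


def common_variance(lam, phi=PHI):
    """Share of a dataset's genetic variance explained by the two factors;
    the remainder (1 minus this) is dataset-specific residual variance."""
    return implied_rg(lam, lam, phi)


if __name__ == "__main__":
    names = list(LOADINGS)
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            r = implied_rg(LOADINGS[names[a]], LOADINGS[names[b]])
            print(f"{names[a]} vs {names[b]}: implied r_g = {r:.3f}")
```

With these made-up loadings, the early and mixed cohorts come out at about 0.677. The mixed and late cohorts come out at about 0.565, and the cohorts at opposite ends at about 0.308. That is the shape of the gradient described above. In practice, loadings are estimated from an observed genetic covariance matrix, for example with the Genomic SEM R package, rather than fixed by hand as in this sketch.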
What does it mean that there are two factors and not one? It means that, across studies, the pattern of common variant effects that distinguishes autistic cases from controls can be decomposed into two partially separable polygenic configurations. One configuration is associated with earlier diagnosis and, in longitudinal cohorts, with difficulties that are evident sooner. The other is associated with later diagnosis and with a rise in socioemotional and behavioral difficulties during adolescence. It also shows stronger positive genetic correlations with ADHD and several mental health phenotypes. This does not mean there are only two “kinds” of autism. It means that a considerable slice of the case-control genetic architecture covaries with age at diagnosis in a way that is parsimoniously captured by two correlated latent factors.
What is the variability, and what would “more than 40% variability” mean? The paper does not center on a PCA scree plot with variance-explained percentages for diagnosis age. Where the authors quantify explanatory power, they report that SDQ latent trajectory membership accounts for 12% to 30% of the variance in diagnosis age across cohorts. That figure rises to 57% in an imputed sensitivity analysis of the UK cohort, while sociodemographic factors explain about 3% to 6%. For heritability, common SNPs account for about 11%. All of these are proportions of variance in age at diagnosis. A PCA-style statement such as “two components explain more than 40% of the variance” does not appear in this paper. Instead, the 2-factor result comes from structural equation modeling of genetic correlations. It is reported as model fit indices and factor loadings rather than as a simple variance-explained bar.
What about rare variants? In SPARK trios, the authors test whether de novo or inherited protein-truncating or missense variants in highly constrained genes track with age at diagnosis, and they find no association. This null result could reflect limited power. It could also reflect clinical processes such as diagnostic overshadowing by intellectual disability or developmental delay in some carriers of de novo variants. In other words, in this dataset common variation tracks with age at diagnosis, while the class of rare variants tested does not.
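For readers who want the arithmetic behind “trajectory membership accounts for 12% to 30% of variance in diagnosis age,” here is a minimal Python sketch. It assumes the simplest setup: a continuous outcome regressed on group membership alone. In that case R² equals the between-group sum of squares divided by the total sum of squares, a quantity also called eta squared. The ages and group labels are made up. The paper’s own models may include covariates or be specified differently.

```python
# Minimal sketch: variance in a continuous outcome (here, age at diagnosis)
# explained by a categorical grouping (here, latent trajectory membership).
# For a regression on group membership alone, R^2 = SS_between / SS_total.
# All ages below are HYPOTHETICAL toy values, not data from the paper.
from statistics import mean


def variance_explained(groups):
    """groups: dict mapping group label -> list of outcome values.
    Returns SS_between / SS_total (eta squared)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Total variation of every value around the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                     for values in groups.values())
    return ss_between / ss_total


toy_ages = {
    "early_difficulty_trajectory": [5, 9, 7],
    "rising_difficulty_trajectory": [8, 12, 10],
}
print(f"R^2 = {variance_explained(toy_ages):.3f}")  # R^2 = 0.458
```

In this toy example the two groups overlap in age but differ in mean, so membership explains about 46% of the variance. Values of 12% to 30%, like those reported for the cohorts, correspond to groups whose age distributions overlap much more.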
Age stratification and medians. The GWAS datasets include several strata defined by diagnosis age. SPARK before 6 years has a median age at diagnosis near 3 years, and SPARK after 10 years has a median near 16 years. iPSYCH before 9 has a median near 5.7 years, while iPSYCH after 10 has a median near 14.6 years. FinnGen, a population sample, has a median near 22.7 years. These medians are crucial context. The early factor is defined by cohorts like SPARK before 6 and PGC cases recruited as trios with early-onset documentation. The later factor is defined by cohorts such as iPSYCH after 10, SPARK after 10, and FinnGen.
Age of the oldest participants at diagnosis. The paper reports medians and dispersion measures for several cohorts and displays broad adolescent and adult ranges. However, it does not, in the sections available here, give a single maximum age at diagnosis across all cohorts. FinnGen has the latest median and therefore almost certainly includes diagnoses in middle-aged and older adults. Because the factors are derived from GWASs, the most faithful way to express age ranges is through the strata that load on each factor. The early factor draws on cohorts with medians of about 3 to 5.7 years, while the later factor draws on cohorts with medians of about 14.6 to 22.7 years. These medians come with median absolute deviations of roughly one to seven years, depending on the cohort. So there is spread around each median, but exact minima and maxima per factor are not enumerated.
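Because the strata are summarized by medians and median absolute deviations (MAD), a minimal Python sketch of both statistics may help. It uses the plain, unscaled MAD, which is the median of the absolute distances from the median. Some software multiplies this by about 1.4826 so it is comparable to a standard deviation, and the sketch does not claim to match the paper’s convention. The ages are hypothetical.

```python
# Minimal sketch: median and (unscaled) median absolute deviation (MAD)
# for ages at diagnosis. The ages below are HYPOTHETICAL.
from statistics import median


def median_and_mad(ages):
    """Return (median, MAD), where MAD is the median of |age - median|."""
    m = median(ages)
    mad = median(abs(a - m) for a in ages)
    return m, mad


toy_ages = [2, 3, 3, 4, 10]  # includes one late outlier
m, mad = median_and_mad(toy_ages)
print(f"median = {m}, MAD = {mad}")  # median = 3, MAD = 1
```

The late value of 10 barely moves either statistic. That robustness is why medians and MADs suit skewed age distributions. It is also why they reveal little about the oldest individual in a stratum.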
Critical evaluation. The developmental model gains support from three converging lines of evidence. First, longitudinal SDQ trajectories split into two groups that align with earlier versus later diagnosis. Second, the common variant heritability of diagnosis age persists after adjustment for measured clinical and demographic covariates, which is difficult to reconcile with a purely unitary severity or access story. Third, the cross-GWAS genetic structure is better captured by two correlated latent factors than by one. That is elegant, but we should keep the limits in view. The SDQ is a broad behavioral instrument rather than a measure of core autism features. So the trajectory groups reflect general difficulties rather than specific social communication differences or restricted and repetitive behaviors, and some nuance could be missed.
The genetic modeling is predominantly in European-ancestry cohorts, so portability to other ancestries remains unknown. There are strong signals of correlation between later diagnosis and ADHD and other mental health traits. Causal interpretation is tricky, though, because diagnostic overshadowing and health care pathways can produce a later diagnosis without genetics being the sole driver. The authors acknowledge this. They also note that cultural and structural factors, including gender bias, ethnicity, access to care, and camouflaging, likely shape who is recognized and when. This aligns with the observation that women and people of color who mask can slip through the cracks and bear a heavy mental health burden. The study’s later factor is associated with higher rates of mental health problems. That fits the epidemiology and offers a partial genetic account without denying lived context.
What does it mean, in lived terms, that there are two factors? Again, it does not mean there are two kinds of autism. The common variant patterns that distinguish autistic cases from controls are not homogeneous. A parsimonious way to summarize this heterogeneity is an early-leaning configuration and a later-leaning configuration that share signal but are not the same. The early configuration is tied to lower social communication abilities in early childhood and only moderate genetic overlap with ADHD and adult psychiatric traits. The later configuration is tied to a rise in socioemotional and behavioral difficulties during adolescence and stronger genetic overlap with ADHD and several adult mental health conditions.
The most important layer of interpretation is equity. The authors are careful to say that age at diagnosis is not only a function of genetics or even of development. It is also shaped by who is expected to look autistic, by who is granted attention, and by who can maintain camouflage and at what personal cost. Women and people of color are more likely to be missed, misread, or delayed because of stereotype mismatch, structural barriers, and the demands of masking. The 2-factor result does not deny this reality. It sits alongside it. A fair reading is that timing reflects a composite of polygenic background, developmental course, and social sorting. The data tell us that later recognition often has a distinct developmental arc and a different polygenic profile on average. The stories of why recognition arrives at 22 or 50 are still individual and still embedded in the world we live in.
On inclusivity and clinical implications. The authors explicitly caution that their findings sit alongside unmeasured forces, including bias and camouflaging, and that age at diagnosis is shaped by place and time. That is a scientific way of saying what we late-diagnosed autistic people already know in our bones. The model here offers one axis that helps explain heterogeneity and could eventually guide tailored supports across developmental windows. It does not close the book on why many adults, especially those who mask or face inequities, only receive recognition in middle age.
My final word: this study nails the point that timing is partly genetic and developmentally patterned. By design, it also leaves largely outside the frame the social machinery that delays recognition for many women and people of color into their 30s, 40s, 50s, and even 60s.

