Cohort and Data
FinnGen uses biobank samples and health register data
FinnGen has collected and correlated genome and longitudinal health data from more than 500 000 Finns, almost 10% of the Finnish population. To produce genotype data, FinnGen has used samples from existing biobank collections and received new samples from biobanks. Data from health registers has been used to define clinical endpoints. FinnGen data therefore allows the study of associations between phenotypes and genotypes. In addition to imputed genotype data, NGS data is available for some individuals. The cohort is intentionally enriched with disease cases, because samples have been collected from hospital biobanks. A subset of samples, collected through the Finnish Red Cross Blood Service Biobank, represents healthy individuals.
The final FinnGen cohort consists of over 500 000 individuals
The combined number of legacy samples and newly collected samples is 520 000. The median age of the participants at the time of donation was 53 years; 43% are men and 57% are women.

FinnGen Data
FinnGen is a biobank study
This means that the study has not recruited participants for this specific research project; instead, the samples FinnGen uses have been collected by Finnish biobanks.
Clinical endpoints
Over the course of FinnGen, significant effort has gone into creating meaningful clinical endpoints based on digital health record data from Finnish health registries.
Genetic data
Genome variant data for most of the samples has been produced using a customised genotyping chip with about 700 000 markers, combined with imputation. NGS data is available for some individuals.
Health register data
Most of the phenotype data in FinnGen comes from national health registers, which cover the entire lifespan of the study subjects. Data from more than 10 registers is included.
FinnGen is expanding to other -omics and clinical data
Through the expansion areas of FinnGen 2 and during FinnGen 3 (2023-2027), the data resource will be expanded to include other omics data (proteomics, metabolomics and single-cell ATAC sequencing), clinical data, and laboratory values from the national KANTA register.