Multivariate phenotyping of chronic kidney disease: a discriminant analysis study
DOI:
https://doi.org/10.12775/PPS.2026.29.68542
Keywords
chronic kidney disease, discriminant function analysis, cluster analysis, phenotyping, biomarkers, uric acid, hyperuricemia, creatinine, azotemia, anemia, leukocytogram entropy, Shannon information theory, Popovych Strain Index, personalized medicine, precision nephrology, canonical discriminant analysis, Mahalanobis distance, Fisher classification functions, renal replacement therapy, erythropoiesis-stimulating agents, urate-lowering therapy, allopurinol, febuxostat, cardiovascular risk, inflammation, oxidative stress, uremic toxins, immune dysfunction, systems biology, translational research, Ukrainian population
Abstract
Background: Chronic kidney disease (CKD) exhibits substantial phenotypic heterogeneity that is inadequately captured by current classification systems based solely on estimated glomerular filtration rate and albuminuria. More sophisticated multivariate approaches are needed to characterize the complex interplay of metabolic, hematological, and inflammatory derangements that drive disease progression and clinical outcomes.
Methods: We conducted a cross-sectional study of 62 patients with CKD stages 2-5 (not on dialysis) recruited from Ternopil University Hospital, Ukraine. Patients underwent comprehensive biochemical profiling (serum creatinine, urea, uric acid, cholesterol, glucose, albumin) and hematological characterization (complete blood count with five-part differential, erythrocyte sedimentation rate). Leukocytogram entropy, calculated within Shannon's information theory framework, and Popovych's Strain Index served as novel immunological biomarkers. All continuous variables were standardized to Z-scores using population reference values and subjected to k-means cluster analysis to identify natural patient groupings. One-way analysis of variance (ANOVA) then quantified the discriminative capacity of individual biomarkers through eta-squared (η²) effect sizes. Finally, stepwise discriminant function analysis was used to derive canonical discriminant roots, determine the optimal variable subset, calculate Mahalanobis distances between cluster centroids, and develop Fisher's classification functions, which were evaluated with leave-one-out cross-validation (LOOCV) for prospective patient assignment.
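The two preprocessing steps named in the Methods, Z-score standardization and leukocytogram entropy, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the normalization of entropy by log2 of the number of cell classes is an assumption (the abstract does not state whether raw or normalized entropy was used), and the example differential is invented rather than patient data.

```python
import math


def z_score(value, ref_mean, ref_sd):
    """Standardize a raw value against a reference mean and SD."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (value - ref_mean) / ref_sd


def leukocytogram_entropy(differential, normalize=True):
    """Shannon entropy of a white-cell differential (percentages or counts).

    With normalize=True (an assumption, see lead-in) the entropy in bits is
    divided by log2(number of cell classes), giving a value between 0 and 1.
    """
    total = sum(differential.values())
    if total <= 0:
        raise ValueError("differential must contain positive values")
    # Zero-frequency classes contribute nothing to the sum (p*log p -> 0).
    probs = [v / total for v in differential.values() if v > 0]
    h = -sum(p * math.log2(p) for p in probs)
    if normalize and len(differential) > 1:
        h /= math.log2(len(differential))
    return h


# Hypothetical five-part differential in percent (illustrative only).
example = {
    "neutrophils": 60,
    "lymphocytes": 30,
    "monocytes": 6,
    "eosinophils": 3,
    "basophils": 1,
}
print(round(leukocytogram_entropy(example), 3))
```

A perfectly even differential gives the maximum normalized entropy of 1, while a differential dominated by a single cell class approaches 0, which matches the abstract's description of a shift toward neutrophil-dominated distributions as a loss of entropy.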
Results: K-means clustering identified four distinct phenotypic clusters with optimal separation confirmed by Calinski-Harabasz index (127.3) and mean silhouette coefficient (0.68): Cluster A (n=26, 41.9%) representing mild early-stage CKD with modest azotemia and preserved hematological parameters; Cluster B (n=26, 41.9%) characterized by moderate CKD with prominent hyperuricemia (uric acid Z-score +2.80±0.22) and inflammatory activation; Cluster C (n=7, 11.3%) exhibiting severe hyperuricemia-inflammation phenotype with extreme uric acid elevation (Z-score +5.91±0.46, ~673 μmol/L) and markedly elevated erythrocyte sedimentation rate (Z-score +15.4±4.1); and Cluster D (n=14, 22.6%) manifesting end-stage renal disease with profound azotemia (creatinine Z-score +32.2±2.2, ~564 μmol/L), severe anemia (hemoglobin Z-score -7.45±0.57, ~80 g/L), and longest disease duration (4.93±0.07 years). Stepwise discriminant analysis selected six variables that optimally discriminated between clusters with exceptional statistical power: serum creatinine (partial Wilks' Λ=0.507, F-to-remove=20.7, p<10⁻⁶, η²=0.785), uric acid (partial Λ=0.401, F-to-remove=31.9, p<10⁻⁶, η²=0.681), urea (partial Λ=0.833, F-to-remove=4.26, p=0.008, η²=0.588), leukocytogram entropy (partial Λ=0.889, F-to-remove=2.66, p=0.056, η²=0.258), platelet count (partial Λ=0.928, F-to-remove=1.66, p=0.184), and CKD duration (partial Λ=0.939, F-to-remove=1.37, p=0.259), yielding overall model Wilks' Λ=0.043 (F₁₈,₁₂₆=20.5, p<10⁻⁶). 
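The η² effect sizes quoted above come from one-way ANOVA, where η² is the between-group sum of squares divided by the total sum of squares. The sketch below shows that calculation on toy numbers; the function name and the data are hypothetical and are not taken from the study.

```python
def eta_squared(groups):
    """Return eta-squared (SS_between / SS_total) for a one-way layout.

    `groups` is a list of lists, one inner list of observations per cluster.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    if ss_total == 0:
        raise ValueError("all observations are identical")
    # Each group's squared mean deviation is weighted by its size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Toy Z-scores for three well-separated groups (illustrative only).
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(eta_squared(toy))
```

An η² near 1 means that almost all of a biomarker's variance lies between clusters rather than within them, which is the sense in which creatinine (η²=0.785) and uric acid (η²=0.681) are described as strong discriminators.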
Three canonical discriminant roots explained 100% of between-group variance with decreasing contributions: Root 1 (eigenvalue λ₁=5.294, canonical correlation r*=0.917, 69.5% of discriminative power) representing the "azotemia-anemia axis" with strong positive loadings for creatinine (+0.766), urea (+0.514), and CKD duration (+0.490) and negative loadings for hemoglobin (-0.685) and erythrocytes (-0.598); Root 2 (λ₂=2.173, r*=0.828, 28.5%) capturing the "hyperuricemia-inflammation axis" with dominant loadings for uric acid (+0.832), ESR (+0.715), and Popovych's Strain Index (+0.542); and Root 3 (λ₃=0.155, r*=0.366, 2.0%) reflecting "leukocyte dysregulation" primarily through entropy (+0.765). All three roots achieved statistical significance (Root 1: χ²=176.2, df=18, p<10⁻⁶; Root 2: χ²=72.9, df=10, p<10⁻⁶; Root 3: χ²=7.8, df=4, p=0.047), confirming genuine multidimensional phenotypic heterogeneity. Mahalanobis squared distances between all cluster pairs were highly significant even after Bonferroni correction (α′=0.0083), with maximum separation between Clusters A and D (D²=52, F=44.2, p<10⁻⁶) representing the full spectrum from early to end-stage disease, and minimum separation between adjacent clusters (A-B: D²=10, F=8.5, p<0.001; B-C: D²=11, F=6.9, p<0.001). Fisher's linear classification functions achieved 97.3% overall accuracy (60/62 correct classifications) in LOOCV, with only two misclassifications, both between adjacent clusters (one patient from Cluster B misclassified as C, one from C as B). This yielded Cohen's kappa κ=0.957, indicating almost perfect agreement, with cluster-specific sensitivities of 100% (A), 96.2% (B), 85.7% (C), and 100% (D).
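The canonical correlations, shares of discriminative power, overall Wilks' Λ, and Bonferroni threshold reported above are linked to the eigenvalues by standard identities: r* = sqrt(λ/(1+λ)), share = λ/Σλ, Λ = Π 1/(1+λ), and α′ = α divided by the number of cluster pairs. The sketch below applies those identities to the eigenvalues given in the abstract; the function names are hypothetical, and the abstract does not state which software produced the published figures.

```python
import math

# Eigenvalues of the three canonical roots as reported in the abstract.
EIGENVALUES = [5.294, 2.173, 0.155]


def canonical_correlation(eigenvalue):
    """Canonical correlation r* = sqrt(lambda / (1 + lambda))."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def discriminative_shares(eigenvalues):
    """Fraction of total discriminative power carried by each root."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


def wilks_lambda(eigenvalues):
    """Overall Wilks' lambda as the product of 1 / (1 + lambda_i)."""
    result = 1.0
    for e in eigenvalues:
        result /= 1.0 + e
    return result


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha for all pairwise contrasts among n_groups."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


for i, eig in enumerate(EIGENVALUES, start=1):
    share = discriminative_shares(EIGENVALUES)[i - 1]
    print(f"Root {i}: r* = {canonical_correlation(eig):.3f}, share = {share:.1%}")
print(f"Wilks' lambda = {wilks_lambda(EIGENVALUES):.3f}")
print(f"Bonferroni alpha (4 clusters) = {bonferroni_alpha(0.05, 4):.4f}")
```

Run on the reported eigenvalues, these identities agree with the published r*, percentage shares, overall Λ, and α′ to within rounding.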
Conclusions: Multivariate discriminant analysis of routine biochemical and hematological parameters identifies four naturally occurring phenotypic clusters in CKD, characterized by distinct patterns of azotemia, hyperuricemia, anemia, and immune dysregulation. Serum creatinine and uric acid emerge as the most powerful discriminators, with cluster membership accounting for 78.5% and 68.1% of their variance (η²), respectively. Leukocytogram entropy, calculated from standard white blood cell differential counts using Shannon's information theory, provides a novel biomarker of uremic immune dysfunction at no additional laboratory cost and contributes discriminative information not captured by traditional markers. The three-dimensional canonical discriminant space reveals fundamental pathophysiological axes underlying CKD heterogeneity: a dominant azotemia-anemia axis (69.5% of discriminative power) reflecting progressive nephron loss with consequent accumulation of nitrogenous waste products and erythropoietin deficiency; a secondary hyperuricemia-inflammation axis (28.5%) capturing a distinct metabolic-inflammatory syndrome potentially amenable to urate-lowering and anti-inflammatory interventions; and a tertiary leukocyte dysregulation axis (2.0%) representing subtle shifts from balanced to neutrophil-dominated leukocyte distributions in advanced uremia.
Cluster C, comprising 11.3% of patients, is characterized by extreme hyperuricemia (mean 673 μmol/L, 5.91 standard deviations above reference), severe inflammation (ESR ~50 mm/hr, 15.4 SD above reference), thrombocytosis, and accelerated progression to end-stage disease. It represents a high-risk phenotype that may derive particular benefit from aggressive urate-lowering therapy with allopurinol (300-600 mg/day) or febuxostat (80-120 mg/day) combined with anti-inflammatory interventions. This hypothesis warrants testing in phenotype-stratified randomized controlled trials, given the negative results of recent urate-lowering trials (CKD-FIX, PERL) in unselected CKD populations. Cluster D patients with end-stage disease (mean creatinine 564 μmol/L, 32.2 SD above reference) and profound anemia (mean hemoglobin 80 g/L, 7.45 SD below reference) require urgent preparation for renal replacement therapy, including arteriovenous fistula creation, dialysis education, and aggressive erythropoiesis-stimulating agent therapy (target hemoglobin 100-120 g/L) with intravenous iron supplementation. The paradoxical finding of only moderate uric acid elevation in Cluster D (mean 380 μmol/L, Z-score +1.01) despite extreme azotemia, compared with Cluster C (673 μmol/L, Z-score +5.91), suggests that dietary purine restriction, uremic anorexia, therapeutic intervention, or altered purine metabolism in advanced uremia may modulate uric acid levels independently of glomerular filtration. Alternatively, patients with extreme hyperuricemia may not survive to end-stage disease because of accelerated cardiovascular mortality. Both hypotheses require investigation in prospective longitudinal studies with serial phenotyping and outcome ascertainment.
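Because the text gives uric acid for Clusters C and D both in μmol/L and as Z-scores, the reference mean and SD behind the standardization can be back-calculated from those two pairs. The sketch below does that arithmetic. It assumes both clusters were standardized against the same reference values and that the rounded figures in the abstract are accurate enough to use; the function name is hypothetical, and the result is an inference from the text, not a value the abstract states.

```python
def implied_reference(raw_a, z_a, raw_b, z_b):
    """Solve raw = mean + z * sd for (mean, sd) from two (raw, z) pairs."""
    if z_a == z_b:
        raise ValueError("the two Z-scores must differ")
    sd = (raw_a - raw_b) / (z_a - z_b)
    mean = raw_a - z_a * sd
    return mean, sd


# Uric acid: Cluster C (673 umol/L, Z = +5.91), Cluster D (380 umol/L, Z = +1.01).
ref_mean, ref_sd = implied_reference(673.0, 5.91, 380.0, 1.01)
print(f"implied reference mean ~ {ref_mean:.0f} umol/L, SD ~ {ref_sd:.0f} umol/L")
```

The same two-point calculation can be used as a consistency check wherever a raw value and its Z-score are reported together.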
The classification system developed here is based on six readily available clinical variables (creatinine, uric acid, urea, leukocytogram entropy, platelet count, disease duration) and achieved 97.3% accuracy in cross-validation. It provides a practical framework for personalized risk stratification and therapeutic targeting in CKD management that is readily implementable in resource-limited settings such as Ukraine, where the total cost of the biomarker panel (~700 UAH or $18 USD) is negligible compared to annual dialysis costs (~350,000 UAH or $9,200 USD). Delaying dialysis initiation in even 10% of high-risk patients through intensive phenotype-directed therapy could generate substantial cost savings and quality-of-life improvements. Limitations include: modest sample size, particularly for Cluster C (n=7); cross-sectional design precluding causal inference and longitudinal outcome assessment; single-center recruitment from a tertiary referral hospital, potentially introducing selection bias toward more severe cases; lack of external validation in independent cohorts; absence of novel biomarkers (neutrophil gelatinase-associated lipocalin, kidney injury molecule-1, fibroblast growth factor-23, cystatin C) that might refine phenotypic discrimination; and lack of genomic data that could reveal genetically determined subphenotypes or pharmacogenomic predictors of treatment response.
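Fisher's linear classification functions assign a patient by computing one linear score per cluster (a constant plus a weighted sum of the patient's variables) and choosing the cluster with the highest score. The sketch below shows only that mechanism. The abstract does not report the fitted coefficients, so every constant, weight, group label, and variable name here is a made-up placeholder that must not be used clinically.

```python
def classify(profile, functions):
    """Assign a profile to the group whose classification score is highest.

    `functions` maps a group label to (constant, {variable: coefficient}).
    Returns the winning label and the full score table.
    """
    scores = {}
    for label, (constant, coefficients) in functions.items():
        scores[label] = constant + sum(
            weight * profile[name] for name, weight in coefficients.items()
        )
    best = max(scores, key=scores.get)
    return best, scores


# PLACEHOLDER coefficients for two hypothetical groups and two variables.
placeholder_functions = {
    "group_1": (-1.0, {"creatinine_z": 0.5, "uric_acid_z": 0.2}),
    "group_2": (-6.0, {"creatinine_z": 1.5, "uric_acid_z": 0.4}),
}

mild = {"creatinine_z": 1.0, "uric_acid_z": 1.0}
severe = {"creatinine_z": 8.0, "uric_acid_z": 2.0}
print(classify(mild, placeholder_functions)[0])
print(classify(severe, placeholder_functions)[0])
```

In a real deployment the table would hold four functions over the six selected variables, with coefficients taken from the fitted discriminant model.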
Future research priorities include: prospective validation in multicenter cohorts (n>500) with 3-5 year follow-up to assess prognostic value for progression to end-stage renal disease, cardiovascular events, and mortality; a phenotype-stratified randomized controlled trial of intensive urate-lowering therapy in Cluster C patients to test whether this high-risk hyperuricemic phenotype derives differential benefit; integration of multi-omics platforms (genomics, transcriptomics, proteomics, metabolomics, microbiomics) to elucidate molecular mechanisms underlying phenotypic clusters and identify novel therapeutic targets; application of machine learning algorithms (random forests, gradient boosting, deep neural networks) to compare performance against classical discriminant analysis and develop hybrid models that are both interpretable and accurate; longitudinal phenotyping with serial measurements every 6-12 months to characterize phenotype stability versus transitions and their clinical correlates; development of web-based clinical decision support tools that automatically calculate Z-scores, entropy, and classification functions from laboratory data and provide phenotype-specific treatment recommendations; and international collaborative studies applying this methodology to diverse populations (Western Europe, North America, Asia) to determine whether the four phenotypes identified here represent universal CKD endotypes or population-specific patterns influenced by genetic background, dietary habits, environmental exposures, or healthcare system factors.
In conclusion, this work establishes multivariate discriminant analysis of routine clinical biomarkers as a powerful approach for dissecting CKD heterogeneity, identifies serum uric acid as an underappreciated discriminator of disease phenotype with potential therapeutic implications, and introduces leukocytogram entropy as a novel information-theoretic biomarker of uremic immune dysfunction. The resulting classification system, which achieved near-perfect accuracy in internal cross-validation, can facilitate personalized medicine approaches in nephrology by enabling rational patient stratification for clinical trials, targeted therapeutic interventions, and optimized resource allocation. CKD is a highly prevalent, morbid, and costly condition affecting over 850 million individuals worldwide and 12-15% of Ukrainian adults; the ultimate goal is to slow disease progression, prevent cardiovascular complications, delay or avoid dialysis, and improve both length and quality of life for patients living with chronic kidney disease.
License
Copyright (c) 2026 Anatoliy Gozhenko, Olga Kvasnytska, Oleksandr Susla, Igor Popovych, Walery Zukow

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.