Methodology Research

  1. A Sonabend, AM Pellegrini, S Chan, HE Brown, JN Rosenquist, PJ Vuijk, AE Doyle, RH Perlis, T Cai. Integrating questionnaire measures for transdiagnostic psychiatric phenotyping using word2vec. PLOS ONE 2020;15(4):e0230663.
  2. C Hong, Y Wang, T Cai. A divide-and-conquer method for sparse risk prediction and evaluation. Biostatistics 2020.
  3. L Parast, T Cai, L Tian. Evaluating multiple surrogate markers with censored data. Biometrics 2020.
  4. S Chan, X Wang, I Jazić, S Peskoe, Y Zheng, T Cai. Developing and evaluating risk prediction models with panel current status data. Biometrics 2020.
  5. Y Ahuja, D Zhou, Z He, J Sun, VM Castro, V Gainer, SN Murphy, C Hong, T Cai. sureLDA: A multidisease automated phenotyping method for the electronic health record. Journal of the American Medical Informatics Association 2020;27(8):1235–1243.
  6. BP Hejblum, GM Weber, KP Liao, NP Palmer, S Churchill, NA Shadick, P Szolovits, SN Murphy, IS Kohane, T Cai. Probabilistic record linkage of de-identified research datasets with discrepancies using diagnosis codes. Scientific Data 2019;6(1).
  7. C Hong, KP Liao, T Cai. Semi-supervised validation of multiple surrogate outcomes with application to electronic medical records phenotyping. Biometrics 2019;75(1):78–89.
  8. D Cheng, A Chakrabortty, AN Ananthakrishnan, T Cai. Estimating average treatment effects with a double-index propensity score. Biometrics 2019.
  9. J Gronsbell, J Minnier, S Yu, K Liao, T Cai. Automated feature selection of predictors in electronic medical records data. Biometrics 2019;75(1):268–277.
  10. KP Liao, J Sun, TA Cai, N Link, C Hong, J Huang, JE Huffman, J Gronsbell, Y Zhang, Y Ho, V Castro, V Gainer, SN Murphy, CJ O’Donnell, JM Gaziano, K Cho, P Szolovits, IS Kohane, S Yu, T Cai. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS. Journal of the American Medical Informatics Association 2019;26(11):1255–1262.
  11. L Parast, L Tian, T Cai. Assessing the value of a censored surrogate outcome. Lifetime Data Analysis 2019;26(2):245–265.
  12. L Parast, T Cai, L Tian. Using a surrogate marker for early testing of a treatment effect. Biometrics 2019;75(4):1253–1263.
  13. L Zhang, Y Zhang, T Cai, Y Ahuja, Z He, Y Ho, A Beam, K Cho, R Carroll, J Denny, I Kohane, K Liao, T Cai. Automated grouping of medical codes via multiview banded spectral clustering. Journal of Biomedical Informatics 2019;100:103322.
  14. SF Chan, BP Hejblum, A Chakrabortty, T Cai. Semi-supervised estimation of covariance with application to phenome-wide association studies with electronic medical records data. Statistical Methods in Medical Research 2019;29(2):455–465.
  15. T Cai, TT Cai, K Liao, W Liu. Large-Scale Simultaneous Testing of Cross-Covariance Matrices with Applications to PheWAS. Statistica Sinica 2019;29:983–1005.
  16. W Ning, S Chan, A Beam, M Yu, A Geva, K Liao, M Mullen, KD Mandl, I Kohane, T Cai, S Yu. Feature extraction for phenotyping from semantic and knowledge resources. Journal of Biomedical Informatics 2019;91:103122.
  17. X Wang, L Parast, L Tian, T Cai. Model-free approach to quantifying the proportion of treatment effect explained by a surrogate marker. Biometrika 2019;107(1):107–122.
  18. Y Wang, C Hong, N Palmer, Q Di, J Schwartz, I Kohane, T Cai. A fast divide-and-conquer sparse Cox regression. Biostatistics 2019.
  19. Y Zhang*, T Cai*, S Yu*, K Cho, C Hong, J Sun, J Huang, Y Ho, AN Ananthakrishnan, Z Xia, SY Shaw, V Gainer, V Castro, N Link, J Honerlaw, S Huang, D Gagnon, EW Karlson, RM Plenge, P Szolovits, G Savova, S Churchill, C O’Donnell, SN Murphy, JM Gaziano, I Kohane, T Cai*, KP Liao*. High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nature Protocols 2019;14(12):3426–3444. (*: contributed equally to the work)
  20. A Chakrabortty, T Cai*. Efficient and adaptive linear regression in semi-supervised settings. The Annals of Statistics 2018;46(4):1541–1572. (*: a PhD student thesis paper)
  21. JA Sinnott, T Cai*. Pathway aggregation for survival prediction via multiple kernel learning. Statistics in Medicine 2018;37(16):2501–2515. (*: a PhD student thesis paper)
  22. M Maziarz, T Cai, L Qi, AS Lok, Y Zheng. Evaluating longitudinal markers under two-phase study designs. Biostatistics 2018;20(3):485–498.
  23. MO Goodman, L Chibnik, T Cai. Variance components genetic association test for zero-inflated count outcomes. Genetic Epidemiology 2018;43(1):82–101.
  24. Y Xia, T Cai, TT Cai. Two-Sample Tests for High-Dimensional Linear Regression With an Application to Detecting Interactions. Statistica Sinica 2018;28:63–92.
  25. D Agniel, T Cai*. Analysis of Multiple Diverse Phenotypes via Semiparametric Canonical Correlation Analysis. Biometrics 2017;73(4):1254–1265. (*: a PhD student thesis paper)
  26. D Liu, T Cai, A Lok, Y Zheng. Nonparametric Maximum Likelihood Estimators of Time-Dependent Accuracy Measures for Survival Outcome Under Two-Stage Sampling Designs. Journal of the American Statistical Association 2017;113(522):882–892.
  27. JL Gronsbell, T Cai*. Semi-supervised approaches to efficient evaluation of model prediction performance. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2017;80(3):579–594. (*: a PhD student thesis paper)
  28. L Parast, T Cai, L Tian. Evaluating surrogate marker information using censored data. Statistics in Medicine 2017;36(11):1767–1782.
  29. QM Zhou, W Dai, Y Zheng, T Cai. Robust dynamic risk prediction with longitudinal studies. Statistical Theory and Related Fields 2017;1(2):159–170.
  30. S Yu, Y Ma, J Gronsbell, T Cai, AN Ananthakrishnan, VS Gainer, SE Churchill, P Szolovits, SN Murphy, IS Kohane, KP Liao, T Cai. Enabling phenotypic big data with PheNorm. Journal of the American Medical Informatics Association 2017;25(1):54–60.
  31. W Dai, M Yang, C Wang, T Cai*. Sequence robust association test for familial data. Biometrics 2017;73(3):876–884. (*: a PhD student thesis paper)
  32. Y Xia, T Cai, TT Cai. Multiple Testing of Submatrices of a Precision Matrix With Applications to Identification of Between Pathway Interactions. Journal of the American Statistical Association 2017;113(521):328–339.
  33. Y Zheng, M Brown, A Lok, T Cai. Improving efficiency in biomarker incremental value evaluation under two-phase designs. The Annals of Applied Statistics 2017;11(2):638–654.
  34. Y Zheng, T Cai*. Augmented estimation for t-year survival with censored regression models. Biometrics 2017;73(4):1169–1178. (*: a PhD student thesis paper)
  35. D Agniel, KP Liao, T Cai*. Estimation and testing for multiple regulation of multivariate mixed outcomes. Biometrics 2016;72(4):1194–1205. (*: a PhD student thesis paper)
  36. FH Yong, L Tian, S Yu, T Cai, LJ Wei. Optimal stratification in outcome prediction using baseline information. Biometrika 2016;103(4):817–828.
  37. M Maziarz, P Heagerty, T Cai, Y Zheng. On longitudinal prediction with time-to-event outcome: Comparison of modeling options. Biometrics 2016;73(1):83–93.
  38. M Neykov, JS Liu, T Cai*. L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs. Journal of Machine Learning Research 2016;17(87):1–37. (*: a PhD student thesis paper)
  39. M Neykov, JS Liu, T Cai*. On the Characterization of a Class of Fisher-Consistent Loss Functions and its Application to Boosting. Journal of Machine Learning Research 2016;17(70):1–32. (*: a PhD student thesis paper)
  40. Q He, T Cai, Y Liu, N Zhao, QE Harmon, LM Almli, EB Binder, SM Engel, KJ Ressler, KN Conneely, X Lin, MC Wu. Prioritizing individual genetic variants after kernel machine testing using variable selection. Genetic Epidemiology 2016;40(8):722–731.
  41. R Payne, M Yang, Y Zheng, MK Jensen, T Cai*. Robust risk prediction with biomarkers under two-phase stratified cohort design. Biometrics 2016;72(4):1037–1045. (*: a PhD student thesis paper)
  42. S Yu, A Chakrabortty, KP Liao, T Cai, AN Ananthakrishnan, VS Gainer, SE Churchill, P Szolovits, SN Murphy, IS Kohane, T Cai. Surrogate-assisted feature extraction for high-throughput phenotyping. Journal of the American Medical Informatics Association 2016;ocw135.
  43. T Cai, TT Cai, A Zhang. Structured Matrix Completion with Applications to Genomic Data Integration. Journal of the American Statistical Association 2016;111(514):621–633.
  44. X Wang, Z Zhang, N Morris, T Cai, S Lee, C Wang, TW Yu, CA Walsh, X Lin. Rare variant association test in family-based sequencing studies. Briefings in Bioinformatics 2016;bbw083.
  45. Y Huang, T Cai, E Kim. Integrative genomic testing of cancer survival using semiparametric linear transformation models. Statistics in Medicine 2016;35(16):2831–2844.
  46. Y Shen, T Cai*. Identifying predictive markers for personalized treatment selection. Biometrics 2016;72(4):1017–1025. (*: a PhD student thesis paper, winner of the John Van Ryzin Award)
  47. J Minnier, M Yuan, JS Liu, T Cai*. Risk Classification With an Adaptive Naive Bayes Kernel Machine Model. Journal of the American Statistical Association 2015;110(509):393–404. (*: a PhD student thesis paper)
  48. QM Zhou, Y Zheng, LB Chibnik, EW Karlson, T Cai. Assessing incremental value of biomarkers with multi-phase nested case-control studies. Biometrics 2015;71(4):1139–1149.
  49. S Yu, KP Liao, SY Shaw, VS Gainer, SE Churchill, P Szolovits, SN Murphy, IS Kohane, T Cai. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. Journal of the American Medical Informatics Association 2015;22(5):993–1000.
  50. Y Shen, T Cai, Y Chen, Y Yang, J Chen. Retrospective likelihood-based methods for analyzing case-cohort genetic association studies. Biometrics 2015;71(4):960–968.
  51. Y Xia, T Cai, TT Cai. Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 2015;102(2):247–266.
  52. JA Sinnott, W Dai, KP Liao, SY Shaw, AN Ananthakrishnan, VS Gainer, EW Karlson, S Churchill, P Szolovits, S Murphy, I Kohane, R Plenge, T Cai. Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records. Human Genetics 2014;133(11):1369–1382.
  53. L Parast, L Tian, T Cai*. Landmark Estimation of Survival and Treatment Effect in a Randomized Clinical Trial. Journal of the American Statistical Association 2014;109(505):384–394. (*: a PhD student thesis paper)
  54. RA Matsouaka, J Li, T Cai*. Evaluating marker-guided treatment selection strategies. Biometrics 2014;70(3):489–499. (*: a PhD student thesis paper)
  55. Y Shen, KP Liao, T Cai*. Sparse kernel machine regression for ordinal outcomes. Biometrics 2014;71(1):63–70. (*: a PhD student thesis paper)
  56. JA Sinnott, T Cai*. Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling. Biometrics 2013;69(4):861–873. (*: a PhD student thesis paper)
  57. L Parast, T Cai*. Landmark risk prediction of residual life for breast cancer survival. Statistics in Medicine 2013;32(20):3459–3471. (*: a PhD student thesis paper)
  58. L Zhao, L Tian, T Cai, B Claggett, LJ Wei. Effectively Selecting a Target Population for a Future Comparative Study. Journal of the American Statistical Association 2013;108(502):527–539.
  59. QM Zhou, Y Zheng, T Cai. Assessment of biomarkers for risk prediction with nested case-control studies. Clinical Trials: Journal of the Society for Clinical Trials 2013;10(5):677–679.
  60. T Cai, L Tian, D Lloyd-Jones, LJ Wei. Evaluating subject-level incremental values of new markers for risk classification rule. Lifetime Data Analysis 2013;19(4):547–567.
  61. T Cai, Y Zheng. Resampling Procedures for Making Inference Under Nested Case-Control Studies. Journal of the American Statistical Association 2013;108(504):1532–1544.
  62. Y Zheng, T Cai, MS Pepe. Adopting nested case-control quota sampling designs for the evaluation of risk markers. Lifetime Data Analysis 2013;19(4):568–588.
  63. D Liu, T Cai, Y Zheng. Evaluating the Predictive Value of Biomarkers with Stratified Case-Cohort Design. Biometrics 2012;68(4):1219–1227.
  64. H Uno, L Tian, T Cai, IS Kohane, LJ Wei. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Statistics in Medicine 2012;32(14):2430–2442.
  65. L Leon, T Cai*. Model checking techniques for assessing functional form specifications in censored linear regression models. Statistica Sinica 2012;22(2). (*: a PhD student thesis paper)
  66. L Parast, S Cheng, T Cai*. Landmark Prediction of Long-Term Survival Incorporating Short-Term Event Time Information. Journal of the American Statistical Association 2012;107(500):1492–1501. (*: a PhD student thesis paper)
  67. L Tian, T Cai, L Zhao, LJ Wei. On the covariate-adjusted estimation for an overall treatment difference with data from a randomized comparative clinical trial. Biostatistics 2012;13(2):256–273.
  68. QM Zhou, Y Zheng, T Cai. Subgroup specific incremental value of new markers for risk prediction. Lifetime Data Analysis 2012;19(2):142–169.
  69. T Cai, X Lin, RJ Carroll. Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test. Biostatistics 2012;13(4):776–790.
  70. Y Zheng, L Parast, T Cai, M Brown. Evaluating incremental values from new predictors with net reclassification improvement in survival analysis. Lifetime Data Analysis 2012;19(3):350–370.
  71. H Uno, T Cai, L Tian, LJ Wei. Graphical Procedures for Evaluating Overall and Subject-Specific Incremental Values from New Predictors with Censored Event Time Data. Biometrics 2011;67(4):1389–1396.
  72. H Uno, T Cai, MJ Pencina, RB D’Agostino, LJ Wei. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 2011;30(10):1105–1117.
  73. J Minnier, L Tian, T Cai*. A Perturbation Method for Inference on Regularized Regression Estimates. Journal of the American Statistical Association 2011;106(496):1371–1382. (*: a PhD student thesis paper)
  74. L Parast, S Cheng, T Cai*. Incorporating short-term outcome information to predict long-term survival with discrete markers. Biometrical Journal 2011;53(2):294–307. (*: a PhD student thesis paper)
  75. MC Wu, S Lee, T Cai, Y Li, M Boehnke, X Lin. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. The American Journal of Human Genetics 2011;89(1):82–93.
  76. T Cai, G Tonini, X Lin. Kernel Machine Approach to Testing the Significance of Multiple Genetic Markers for Risk Prediction. Biometrics 2011;67(3):975–986.
  77. T Cai, L Tian, DM Lloyd-Jones. Comparing costs associated with risk stratification rules for t-year survival. Biostatistics 2011;12(4):597–609.
  78. T Cai, Y Zheng. Evaluating prognostic accuracy of biomarkers in nested case-control studies. Biostatistics 2011;13(1):89–100.
  79. T Cai, Y Zheng. Nonparametric Evaluation of Biomarker Accuracy Under Nested Case-Control Studies. Journal of the American Statistical Association 2011;106(494):569–580.
  80. X Lin, T Cai, MC Wu, Q Zhou, G Liu, DC Christiani, X Lin. Kernel machine SNP-set analysis for censored survival outcomes in genome-wide association studies. Genetic Epidemiology 2011;35(7):620–631.
  81. Y Zheng, T Cai, Y Jin, Z Feng. Evaluating Prognostic Accuracy of Biomarkers under Competing Risk. Biometrics 2011;68(2):388–396.
  82. L Tian, R Wang, T Cai, L Wei. The Highest Confidence Density Region and Its Usage for Joint Inferences about Constrained Parameters. Biometrics 2010;67(2):604–610.
  83. R Wang, L Tian, T Cai, LJ Wei. Nonparametric inference procedure for percentiles of the random effects distribution in meta-analysis. The Annals of Applied Statistics 2010;4(1):520–532.
  84. S McDaniel, J Minnier, RA Betensky, G Mohapatra, Y Shen, JF Gusella, DN Louis, T Cai*. Assessing Population Level Genetic Instability via Moving Average. Statistics in Biosciences 2010;2(2):120–136. (*: a PhD student thesis paper)
  85. T Cai, L Parast, L Ryan. Meta-analysis for rare events. Statistics in Medicine 2010;29(20):2078–2089.
  86. T Cai, L Tian, H Uno, SD Solomon, LJ Wei. Calibrating parametric subject-specific risk estimation. Biometrika 2010;97(2):389–404.
  87. T Cai, L Tian, PH Wong, LJ Wei. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics 2010;12(2):270–282.
  88. T Cai, TA Gerds, Y Zheng, J Chen. Robust Prediction of t-Year Survival with Data from Multiple Studies. Biometrics 2010;67(2):436–444.
  89. LF León, T Cai, LJ Wei. Robust Inferences for Covariate Effects on Survival Time with Censored Linear Regression Models. Statistics in Biosciences 2009;1:50–64.
  90. Y Zheng, T Cai, JL Stanford, Z Feng. Semiparametric Models of Time-Dependent Predictive Values of Prognostic Biomarkers. Biometrics 2009;66(1):50–60.
  91. B Jiang, X Zhang, T Cai. Estimating the Confidence Interval for Prediction Errors of Support Vector Machine Classifiers. Journal of Machine Learning Research 2008;9:521–540.
  92. L Tian, T Cai, LJ Wei. Identifying Subjects Who Benefit from Additional Information for Better Prediction of the Outcome Variables. Biometrics 2008;65(3):894–902.
  93. L Tian, T Cai, MA Pfeffer, N Piankov, PY Cremieux, LJ Wei. Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction. Biostatistics 2008;10(2):275–281.
  94. T Cai, J Huang, L Tian. Regularized Estimation for the Accelerated Failure Time Model. Biometrics 2008;65(2):394–404.
  95. T Cai, L Tian, SD Solomon, LJ Wei. Predicting future responses based on possibly mis-specified working models. Biometrika 2008;95(1):75–92.
  96. T Cai, LE Dodd. Regression analysis for the partial area under the ROC curve. Statistica Sinica 2008;18(3):817–836.
  97. TA Gerds, T Cai, M Schumacher. The Performance of Risk Prediction Models. Biometrical Journal 2008;50(4):457–479.
  98. Y Zheng, T Cai, MS Pepe, WC Levy. Time-Dependent Predictive Values of Prognostic Biomarkers With Failure Time Outcome. Journal of the American Statistical Association 2008;103(481):362–368.
  99. H Uno, T Cai, L Tian, LJ Wei. Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models. Journal of the American Statistical Association 2007;102(478):527–537.
  100. JH Ware, T Cai. Comments on ‘Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M. J. Pencina et al., Statistics in Medicine (DOI: 10.1002/sim.2929). Statistics in Medicine 2007;27(2):185–187.
  101. L Tian, T Cai, E Goetghebeur, LJ Wei. Model evaluation based on the sampling distribution of estimated absolute prediction error. Biometrika 2007;94(2):297–311.
  102. T Cai, S Cheng. Robust combination of multiple diagnostic tests for classifying censored event times. Biostatistics 2007;9(2):216–233.
  103. T Cai, Y Zheng. Model Checking for ROC Regression Analysis. Biometrics 2007;63(1):152–163.
  104. L Tian, T Cai. On the accelerated failure time model for current status and interval censored data. Biometrika 2006;93(2):329–342.
  105. T Cai, PB Gilbert, SG Self. Joint Inferences on Vaccine Efficacy Against Infection and Disease with Application to the First HIV Vaccine Efficacy Trial. Journal of Biopharmaceutical Statistics 2006;16(4):517–538.
  106. MS Pepe, T Cai, G Longton. Combining Predictors for Classification Using the Area under the Receiver Operating Characteristic Curve. Biometrics 2005;62(1):221–229.
  107. T Cai. The sensitivity and specificity of markers for event times. Biostatistics 2005;7(2):182–197.
  108. T Cai, L Tian, LJ Wei. Semiparametric Box-Cox power transformation models for censored survival observations. Biometrika 2005;92(3):619–632.
  109. Y Zheng, T Cai, Z Feng. Application of the Time-Dependent ROC Curves for Prognostic Accuracy with Multiple Biomarkers. Biometrics 2005;62(1):279–287.
  110. MS Pepe, T Cai. The Analysis of Placement Values for Evaluating Discriminatory Measures. Biometrics 2004;60(2):528–535.
  111. T Cai. Semi-parametric ROC regression analysis with placement values. Biostatistics 2004;5(1):45–60.
  112. T Cai. Semiparametric regression analysis for doubly censored data. Biometrika 2004;91(2):277–290.
  113. T Cai, CS Moskowitz. Semi-parametric estimation of the binormal ROC curve for a continuous diagnostic test. Biostatistics 2004;5(4):573–586.
  114. T Cai, RA Betensky. Hazard Regression for Interval-Censored Data with Penalized Spline. Biometrics 2003;59(3):570–579.
  115. T Cai, MS Pepe. Semiparametric Receiver Operating Characteristic Analysis to Evaluate Biomarkers for Disease. Journal of the American Statistical Association 2002;97(460):1099–1107.
  116. T Cai, RJ Hyndman, MP Wand. Mixed Model-Based Hazard Estimation. Journal of Computational and Graphical Statistics 2002;11(4):784–798.
  117. T Cai, SC Cheng, LJ Wei. Semiparametric Mixed-Effects Models for Clustered Failure Time Data. Journal of the American Statistical Association 2002;97(458):514–522.
  118. T Cai, LJ Wei. Regression Analysis for Multivariate Failure Time Observations. Asymptotics in Statistics and Probability 2000;33–46.
  119. T Cai, LJ Wei, M Wilcox. Semiparametric regression analysis for clustered failure time data. Biometrika 2000;87(4):867–878.
  120. T Cai, J Shen. Boundedness Is Redundant in a Theorem of Daubechies. Applied and Computational Harmonic Analysis 1999;6(3):400–404.

© Tianxi Cai, Harvard University