Indian Journal of Pharmacology
Year : 2015  |  Volume : 47  |  Issue : 5  |  Page : 571-572

Errors in use of multivariable regression analysis

Department of Biostatistics and Medical Informatics, University College of Medical Sciences, New Delhi, India

Date of Web Publication: 15-Sep-2015

Correspondence Address:
Dr. Rajeev Kumar
Department of Biostatistics and Medical Informatics, University College of Medical Sciences, New Delhi

Source of Support: Nil., Conflict of Interest: There are no conflicts of interest.

DOI: 10.4103/0253-7613.165187


How to cite this article:
Kumar R. Errors in use of multivariable regression analysis. Indian J Pharmacol 2015;47:571-2

How to cite this URL:
Kumar R. Errors in use of multivariable regression analysis. Indian J Pharmacol [serial online] 2015 [cited 2023 Dec 9];47:571-2. Available from: https://www.ijp-online.com/text.asp?2015/47/5/571/165187


Hiremath and Kamdod published a retrospective study in which they applied multivariable (linear and logistic) regression analysis to examine the association of change in MAP level, serum creatinine level, and survival benefit with various risk factors.[1] I have some remarks regarding the application of multivariable regression methods in their study.

Issues Related to Multivariable Logistic Regression

Thirty-one subjects were included in the study. As per the rule of thumb derived from a simulation study, logistic regression requires at least 10 events per variable (EPV), counted on the less frequent outcome.[2] The authors based their calculation on 10 subjects per variable, which is not correct for logistic regression. For instance, 11 of the 31 subjects had survived at the end of the study, so the less frequent outcome event is survival at the end of the study. The survival benefit outcome therefore had only 5.5 EPV, as the authors included two risk factors in the model. In addition, when a risk factor is rare, meaning that few subjects are positive for it, even 10 EPV may be inadequate.[3]
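The EPV arithmetic above can be reproduced with a short sketch. The function name `events_per_variable` is a hypothetical helper introduced only for illustration; the counts (31 subjects, 11 survivors, two risk factors) are the ones stated in this letter.

```python
def events_per_variable(n_subjects, n_events, n_predictors):
    """Return events per variable (EPV), counted on the less frequent outcome."""
    n_non_events = n_subjects - n_events
    # The limiting sample size is the smaller of the two outcome groups.
    limiting_count = min(n_events, n_non_events)
    return limiting_count / n_predictors


# Counts taken from the letter: 11 survivors among 31 subjects, 2 risk factors.
epv = events_per_variable(n_subjects=31, n_events=11, n_predictors=2)
print(epv)  # 5.5, well below the rule-of-thumb minimum of 10
```

Counting on the smaller outcome group is the design choice that matters here: dividing all 31 subjects by two variables would give 15.5 and wrongly suggest that the sample was adequate.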

An insufficient sample size in regression models yields unstable risk estimates and wider confidence intervals (CIs), and can reflect an inaccurate association. This is what the authors found in their study for the intravenous fluids (IVFs) variable: a high odds ratio (OR), a wide CI, and a P value close to 0.05. Furthermore, the authors reported the log-odds and its 95% CI as 2.42 (1.06, 121.13); the interval given is the 95% CI of the OR, not of the log-odds. The correct log-odds and its 95% CI are 2.42 (0.06, 4.797). The OR is equal to exp(log-odds). In logistic regression, the log-odds and the OR have different meanings: the former gives an additive effect, while the latter gives a multiplicative effect. A very wide CI of the OR usually indicates over-fitting of the model, and the results of such a model are not trustworthy. The Hosmer-Lemeshow (H-L) goodness-of-fit test is also sensitive to small sample size because it is based on the Chi-square test, with subjects usually divided into deciles (10 groups). The Chi-square test requires expected frequencies of 5 or more in at least 80% of cells. Thus, this test is not reliable in a small sample.
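The relationship between the two scales can be checked numerically. The following is a minimal sketch using only the figures quoted in this letter; the variable names are illustrative assumptions.

```python
import math

# Figures quoted in the letter.
log_odds = 2.42               # point estimate on the log-odds scale
reported_ci = (1.06, 121.13)  # interval as reported, which is on the OR scale

# Taking natural logs moves the interval onto the log-odds scale.
log_odds_ci = tuple(math.log(limit) for limit in reported_ci)

# Exponentiating moves the point estimate onto the OR scale.
odds_ratio = math.exp(log_odds)

print(round(log_odds_ci[0], 3), round(log_odds_ci[1], 3))  # 0.058 4.797
print(round(odds_ratio, 2))                                # 11.25
```

An estimate and its interval belong on the same scale: either 2.42 with (0.06, 4.797) as log-odds, or about 11.25 with (1.06, 121.13) as an OR.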

Issues Related to Multiple Linear Regression Analysis

For multiple linear regression, rules of thumb state that at least 20 subjects per eligible variable should be included in the model. The authors, however, applied multiple linear regression to only seven subjects, which is too few to yield reliable model results. The correct term in multiple regression is "regression coefficient," not "regression correlation" as reported by the authors in the results. The authors stated that seven subjects were included and reported the F-statistic as F(2,6) = 6.27, which is also wrong: the degrees of freedom should be F(p, n − p − 1), where p is the number of predictors and n is the sample size, so it should be F(2,4). The basic assumptions of multiple linear regression, namely normality of residuals and homogeneity of variance, were not reported by the authors.
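The degrees-of-freedom check is simple arithmetic, sketched below. The helper name `f_degrees_of_freedom` is hypothetical, and the value p = 2 is an assumption taken from the numerator of the reported F(2,6); the model is assumed to include an intercept.

```python
def f_degrees_of_freedom(n_subjects, n_predictors):
    """Return (numerator, denominator) df for the overall F-test
    of a linear model with an intercept: (p, n - p - 1)."""
    denominator = n_subjects - n_predictors - 1
    if denominator <= 0:
        raise ValueError("too few subjects for this number of predictors")
    return n_predictors, denominator


# Seven subjects as stated in the letter; two predictors assumed from F(2,6).
print(f_degrees_of_freedom(n_subjects=7, n_predictors=2))  # (2, 4)

# Subjects per variable, against the rule of thumb of at least 20.
print(7 / 2)  # 3.5
```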

The authors reported a negative Pearson correlation between the volume of IVF used and the change in serum creatinine (r = −0.816), but in multiple regression the corresponding regression coefficient was positive and very small (0.000), although significant. The zero regression coefficient appeared because the volume of IVF was measured in milliliters, and a change of 1 ml in IVF volume had a very small influence on the change in serum creatinine level. In such situations, it would be better to convert milliliters into liters so that the regression coefficient can be presented in a meaningful way.
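The effect of the measurement unit on a regression coefficient can be illustrated with a small sketch. The numbers below are invented purely for illustration and are not data from the study; `ols_slope` is a hypothetical helper that fits a simple one-predictor least-squares line, which is a simplification of the multiple regression discussed above.

```python
def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical values, NOT taken from the study.
volume_ml = [500, 1000, 1500, 2000, 2500, 3000, 3500]
creatinine_change = [0.1, -0.1, -0.3, -0.2, -0.6, -0.7, -0.9]

slope_per_ml = ols_slope(volume_ml, creatinine_change)

# Same data with the predictor expressed in liters.
volume_l = [v / 1000 for v in volume_ml]
slope_per_l = ols_slope(volume_l, creatinine_change)

# At three decimals the per-ml slope rounds to zero; the per-liter slope does not.
print(f"{slope_per_ml:.3f}")
print(f"{slope_per_l:.3f}")
```

Rescaling the predictor by 1000 multiplies the coefficient by 1000 and leaves the fit, the test statistic, and the P value unchanged, which is why a coefficient can be statistically significant and still print as 0.000.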

Furthermore, Pearson correlation requires both variables to be on a metric scale. The authors reported a Pearson correlation (r = 0.042) between sex and change in MAP, which also seems to be inappropriate. In addition, the male-to-female ratio was 30:1, which means that there was only one female in this study.

Multivariable regression methods are an important tool in the medical literature for finding the association between an outcome and risk factors, or for predicting the outcome from a set of predictors. The reliability of a regression model depends on the fulfillment of the assumptions associated with the model. When the quality of logistic regression was evaluated on the basis of 10 well-established criteria, the quality in Indian medical journals was found to lag far behind that of articles published from Europe and the USA.[4] Authors should consult a competent statistician or read good literature to understand the associated assumptions and criteria before analyzing and reporting the results of these models.[5] The associated assumptions should not only be checked but also reported in the text. I appreciate that the authors tried to explain the collinearity criterion of the multivariable regression model.

Financial Support and Sponsorship

Nil.

Conflicts of Interest

There are no conflicts of interest.

References

1. Hiremath SB, Kamdod MA. Effect of various drugs used in conservative therapy of hepatorenal syndrome: A retrospective drug utilization study. Indian J Pharmacol 2014;46:538-42.
2. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373-9.
3. Katz MH. Multivariable analysis: A primer for readers of medical research. Ann Intern Med 2003;138:644-50.
4. Kumar R, Indrayan A, Chhabra P. Reporting quality of multivariable logistic regression in selected Indian medical journals. J Postgrad Med 2012;58:123-6.
5. Kumar R, Chhabra P. Cautions required during planning, analysis and reporting of multivariable logistic regression. Curr Res Pract 2014;4:31-9.

Published by Wolters Kluwer - Medknow