The words “independent” and “dependent” could puzzle the beginner because it is sometimes not clear what is dependent on what. This confusion is a triumph of common sense over misleading terminology, because often each variable is dependent on some third variable, which may or may not be mentioned. It is reasonable, for instance, to think of the height of children as dependent on age rather than the converse, but consider a positive correlation between mean tar yield and nicotine yield of certain brands of cigarette. The nicotine liberated is unlikely to have its origin in the tar: both vary in parallel with some other factor or factors in the composition of the cigarettes. The yield of the one does not seem to be “dependent” on the other in the sense that, on average, the height of a child depends on his age. In such cases it often does not matter which scale is put on which axis of the scatter diagram. However, if the intention is to make inferences about one variable from the other, the observations from which the inferences are to be made are usually put on the baseline.

As a further example, a plot of monthly deaths from heart disease against monthly sales of ice cream would show a negative association. However, it is hardly likely that eating ice cream protects from heart disease! It is simply that the mortality rate from heart disease is inversely related – and ice cream consumption positively related – to a third factor, namely environmental temperature.

Calculation of the correlation coefficient

Regression lines give us useful information about the data they are collected from. They show how one variable changes on average with another, and they can be used to find out what one variable is likely to be when we know the other – provided that we ask this question within the limits of the scatter diagram. To project the line at either end – to extrapolate – is always risky because the relationship between x and y may change or some kind of cut off point may exist. For instance, a regression line might be drawn relating the chronological age of some children to their bone age, and it might be a straight line between, say, the ages of 5 and 10 years, but to project it up to the age of 30 would clearly lead to error. Computer packages will often produce the intercept from a regression equation, with no warning that it may be totally meaningless.

Consider a regression of blood pressure against age in middle aged men. The regression coefficient is often positive, indicating that blood pressure increases with age.

Correlation and linear regression are sometimes distinguished in statistics books by saying that the former is symmetric and the latter is asymmetric in the following sense: in the case of correlation, no distinction is made between dependent and independent variables, whereas it makes a difference which variables are treated as dependent and independent variables in a regression equation. To be sure, it is possible to treat $Y$ as a predictor of $X$ instead of treating $X$ as a predictor of $Y$.

    t(table(y,x))  # these data are the same as
    #   0 1
    # 0 1 3   # using my conventions, exp(b0) would be 3/1 = 3
    # 1 2 4   # using my conventions, exp(b0) would be 2/1 = 2
    fit.YonX = glm(y~x, family=binomial(link="logit"))
    fit.XonY = glm(x~y, family=binomial(link="logit"))

Strictly speaking, running a t-test 'in the other direction' wouldn't quite be a logistic regression. A stronger analogy would be Fisher's linear discriminant analysis. That's because in logistic regression there is no assumption about the distribution of $X$, but LDA does assume $X$ is normally distributed and the t-test likewise assumes the residuals are.
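As a rough check on the two logistic fits named in the R snippet (`fit.YonX` and `fit.XonY`), the sketch below rebuilds the 2×2 table as raw 0/1 vectors and fits the model in both directions. It assumes the table's rows are `x` and its columns are `y`, so that the four counts are 1, 3, 2 and 4. The vectors and the names `exp_b0_YonX`, `exp_b0_XonY`, `or_YonX`, `or_XonY` and `or_table` are illustrative and not taken from the original post.

```r
# Assumed layout of the 2x2 table (rows: x, columns: y):
#   x = 0 -> one observation with y = 0, three with y = 1
#   x = 1 -> two observations with y = 0, four with y = 1
x <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
y <- c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1)

# Fit the logistic regression in both directions.
fit.YonX <- glm(y ~ x, family = binomial(link = "logit"))
fit.XonY <- glm(x ~ y, family = binomial(link = "logit"))

# Exponentiated intercepts: the baseline odds of each model's own response.
exp_b0_YonX <- unname(exp(coef(fit.YonX)[1]))  # odds of y = 1 when x = 0
exp_b0_XonY <- unname(exp(coef(fit.XonY)[1]))  # odds of x = 1 when y = 0

# Exponentiated slopes: the odds ratio, as estimated from each direction.
or_YonX <- unname(exp(coef(fit.YonX)[2]))
or_XonY <- unname(exp(coef(fit.XonY)[2]))

# Odds ratio computed directly from the counts, for comparison.
or_table <- (1 * 4) / (3 * 2)

print(c(exp_b0_YonX = exp_b0_YonX, exp_b0_XonY = exp_b0_XonY))
print(c(or_YonX = or_YonX, or_XonY = or_XonY, or_table = or_table))
```

If the assumed layout is right, the two intercepts correspond to odds of 3 and 2, matching the comments beside the table, while both slopes give the same odds ratio of 2/3. In other words, flipping a logistic regression on two binary variables changes the intercept but not the estimated association.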