lundi 5 avril 2021

How the likelihood is computed with a (log) link function

There is something I remained unable to understand in the way the likelihood is computed in a glm. Let's take a simple example (in R):

x=rnorm(30,100)
Y=c(rep("a",10),rep("b",10),rep("c",10))

The log-likelihood of the null model can be computed that way:

logLik(glm(x~1,family=gaussian))
'log Lik.' -38.3441 (df=2)

And this indeed corresponds to what I can compute "by hand":

mu=mean(x)
sigma=sqrt(var(x)*(length(x)-1)/length(x))
-log(sigma*sqrt(2*pi))*length(x)-1/(2*sigma*sigma)*sum((x-mu)*(x-mu))
[1] -38.3441

So far, so good.

The log-likelihood for the model with the factor is:

logLik(glm(x~Y,family=gaussian))
'log Lik.' -36.36673 (df=4)

Hence, the likelihood ratio test leads to a chi2-value of (with 2 df):

2*(-36.36673+38.3441)
[1] 3.95474

My first question is why this Chi2 value is not what is produced with the anova() function?:

anova(glm(x~Y,family=gaussian), glm(x~1,family=gaussian),test="Chisq")
Analysis of Deviance Table
Model 1: x ~ Y
Model 2: x ~ 1
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        27     19.841 
2        29     22.637 -2  -2.7958   0.1492

I remained unable to understand this. The other problem that I have with this is the following one. If I use a log link function instead, still with a Gaussian model, I get this:

logLik(glm(x~1,family=gaussian(link="log")))
'log Lik.' -38.3441 (df=2)
logLik(glm(x~Y,family=gaussian(link="log")))
'log Lik.' -36.36673 (df=4)

In other words, I get exactly the same log-likelihoods with a log link function or with an identity link function. How can this be possible? The ability of the model to describe the data is not the same if the expected value (from the model) is log-transformed. I guess I am missing something important here. Any help on this will be welcomed. Thanks & cheers, Eric

Aucun commentaire:

Enregistrer un commentaire