Compare GGMs with a posterior predictive check (Gelman et al. 1996).
This method was introduced in Williams et al. (2020). Currently,
there is a global (the entire GGM) and a nodewise test. The default
is to compare GGMs with respect to the posterior predictive distribution of
Kullback-Leibler divergence and the sum of squared errors. It is also possible
to compare the GGMs with a user-defined test-statistic.
Usage
ggm_compare_ppc(
  ...,
  test = "global",
  iter = 5000,
  FUN = NULL,
  custom_obs = NULL,
  loss = TRUE,
  progress = TRUE
)
Arguments
- ...
At least two matrices (or data frames) of dimensions n (observations) by p (variables).
- test
Which test should be performed (defaults to "global")? The options include "global" and "nodewise".
- iter
Number of replicated datasets used to construct the predictive distribution (defaults to 5000).
- FUN
An optional function for comparing GGMs that returns a single number. See Details.
- custom_obs
Number corresponding to the observed score for comparing the GGMs. This is required if a function is provided in FUN. See Details.
- loss
Logical. If a function is provided, is the measure a "loss function" (i.e., is a large score a bad thing)? This determines how the p-value is computed. See Details.
- progress
Logical. Should a progress bar be included (defaults to TRUE)?
Value
The returned object of class ggm_compare_ppc contains a lot of information that
is used for printing and plotting the results. For users of BGGM, the following
are the useful objects:

test = "global"
- ppp_jsd: posterior predictive p-values (JSD).
- ppp_sse: posterior predictive p-values (SSE).
- predictive_jsd: list containing the posterior predictive distributions (JSD).
- predictive_sse: list containing the posterior predictive distributions (SSE).
- obs_jsd: list containing the observed error (JSD).
- obs_sse: list containing the observed error (SSE).

test = "nodewise"
- ppp_jsd: posterior predictive p-values (JSD).
- predictive_jsd: list containing the posterior predictive distributions (JSD).
- obs_jsd: list containing the observed error (JSD).

FUN = f()
- ppp_custom: posterior predictive p-values (custom).
- predictive_custom: posterior predictive distributions (custom).
- obs_custom: observed error (custom).
Details
The FUN argument allows for a user-defined test-statistic (the measure used to compare the GGMs).
The function must include only two arguments, each of which corresponds to a dataset. For example,
f <- function(Yg1, Yg2), where each Y is a dataset of dimensions n by p. The
groups are then compared within the function, returning a single number. An example is provided below.
Further, when using a custom function, care must be taken when specifying the argument loss.
We recommend visualizing the results with plot to ensure the p-value was computed
in the right direction.
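The role of loss can be sketched as follows. This is a hedged illustration of the idea only, not BGGM's internal code, and ppp here is a hypothetical helper:

```r
# Illustrative sketch (hypothetical helper, not a BGGM function):
# `loss` flips the direction of the one-sided posterior predictive p-value.
ppp <- function(predictive, observed, loss = TRUE) {
  if (loss) {
    mean(predictive >= observed)  # large scores are "bad" (a loss)
  } else {
    mean(predictive <= observed)  # large scores are "good"
  }
}
```

If the custom measure is a similarity (larger is better) but loss = TRUE is left at its default, the p-value is computed in the wrong direction, which is why plotting the predictive distribution against the observed score is recommended.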
Note
Interpretation:
The primary test-statistic is symmetric KL-divergence that is termed Jensen-Shannon divergence (JSD). This is in essence a likelihood ratio that provides the "distance" between two multivariate normal distributions. The basic idea is to (1) compute the posterior predictive distribution, assuming group equality (the null model). This provides the error that we would expect to see under the null model; (2) compute JSD for the observed groups; and (3) compare the observed JSD to the posterior predictive distribution, from which a posterior predictive p-value is computed.
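As an illustration of step (2), the observed symmetric KL divergence between two groups can be sketched as follows, assuming multivariate normality. This is a simplified sketch for intuition, not the package's internal implementation:

```r
# KL divergence between two multivariate normals, KL(N1 || N2)
kl_mvn <- function(mu1, S1, mu2, S2) {
  p <- length(mu1)
  S2_inv <- solve(S2)
  drop(0.5 * (sum(diag(S2_inv %*% S1)) +
              t(mu2 - mu1) %*% S2_inv %*% (mu2 - mu1) -
              p + log(det(S2) / det(S1))))
}

# symmetric KL divergence between the two fitted group distributions
skl_mvn <- function(Yg1, Yg2) {
  x <- na.omit(Yg1)
  y <- na.omit(Yg2)
  kl_mvn(colMeans(x), cov(x), colMeans(y), cov(y)) +
    kl_mvn(colMeans(y), cov(y), colMeans(x), cov(x))
}
```

Step (3) then asks where this observed value falls in the predictive distribution generated under group equality.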
For the global check, the sum of squared error is also provided.
This is computed from the partial correlation matrices and is analogous
to the strength test in van Borkulo et al. (2017). The nodewise
test compares the posterior predictive distribution for each node. This is based on the correspondence
between the inverse covariance matrix and multiple regression (Kwan 2014; Stephens 1998).
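That correspondence can be checked directly: if Theta is the inverse of the sample covariance matrix, the coefficients from regressing node j on the remaining nodes are -Theta[j, -j] / Theta[j, j]. A quick sanity check with simulated data, independent of BGGM:

```r
set.seed(1)
Y <- matrix(rnorm(500 * 4), 500, 4)
Theta <- solve(cov(Y))

# regression coefficients for node 1, recovered from the inverse covariance
b_theta <- -Theta[1, -1] / Theta[1, 1]

# the same coefficients from ordinary least squares
b_lm <- coef(lm(Y[, 1] ~ Y[, -1]))[-1]

all.equal(unname(b_theta), unname(b_lm))  # TRUE
```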
If the null model is not rejected, note that this does not
provide evidence for equality! Further, if the null model is rejected, this means that
the assumption of group equality is not tenable, that is, the groups are different.
Alternative Methods:
There are several methods in BGGM for comparing groups. See
ggm_compare_estimate
(posterior differences for the
partial correlations), ggm_compare_explore
(exploratory hypothesis testing),
and ggm_compare_confirm
(confirmatory hypothesis testing).
References
Gelman A, Meng X, Stern H (1996).
“Posterior predictive assessment of model fitness via realized discrepancies.”
Statistica Sinica, 733–760.
Kwan CC (2014).
“A regression-based interpretation of the inverse of the sample covariance matrix.”
Spreadsheets in Education, 7(1), 4613.
Stephens G (1998).
“On the Inverse of the Covariance Matrix in Portfolio Analysis.”
The Journal of Finance, 53(5), 1821–1827.
van Borkulo CD, Boschloo L, Kossakowski J, Tio P, Schoevers RA, Borsboom D, Waldorp LJ (2017).
“Comparing network structures on three aspects: A permutation test.”
Manuscript submitted for publication.
Williams DR, Rast P, Pericchi LR, Mulder J (2020).
“Comparing Gaussian graphical models with the posterior predictive distribution and Bayesian model selection.”
Psychological Methods.
doi:10.1037/met0000254
.
Examples
# \donttest{
# note: iter = 250 for demonstrative purposes
# data
Y <- bfi
#############################
######### global ############
#############################
# males
Ym <- subset(Y, gender == 1,
select = - c(gender, education))
# females
Yf <- subset(Y, gender == 2,
select = - c(gender, education))
global_test <- ggm_compare_ppc(Ym, Yf,
iter = 250)
#> BGGM: Predictive Check (Contrast 1)
#> BGGM: Finished
global_test
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 250
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ym, Yf, iter = 250)
#> ---
#> Symmetric KL divergence (JSD):
#>
#> contrast JSD.obs p_value
#> Yg1 vs Yg2 0.442 0
#> ---
#>
#> Sum of Squared Error:
#>
#> contrast SSE.obs p.value
#> Yg1 vs Yg2 0.759 0
#> ---
#> note:
#> JSD is Jensen-Shannon divergence
#############################
###### custom function ######
#############################
# example 1
# maximum difference van Borkulo et al. (2017)
f <- function(Yg1, Yg2){
  # remove NA
  x <- na.omit(Yg1)
  y <- na.omit(Yg2)
  # nodes
  p <- ncol(Yg1)
  # identity matrix
  I_p <- diag(p)
  # partial correlations
  pcor_1 <- -(cov2cor(solve(cor(x))) - I_p)
  pcor_2 <- -(cov2cor(solve(cor(y))) - I_p)
  # max difference
  max(abs((pcor_1[upper.tri(I_p)] - pcor_2[upper.tri(I_p)])))
}
# observed difference
obs <- f(Ym, Yf)
global_max <- ggm_compare_ppc(Ym, Yf,
iter = 250,
FUN = f,
custom_obs = obs,
progress = FALSE)
global_max
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 250
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ym, Yf, iter = 250, FUN = f, custom_obs = obs,
#> progress = FALSE)
#> ---
#> Custom:
#>
#> contrast custom.obs p.value
#> Yg1 vs Yg2 0.16 0.02
#> ---
# example 2
# Hamming distance (squared error for adjacency)
f <- function(Yg1, Yg2){
  # remove NA
  x <- na.omit(Yg1)
  y <- na.omit(Yg2)
  # nodes
  p <- ncol(x)
  # identity matrix
  I_p <- diag(p)
  # fit and select each group's GGM
  fit1 <- estimate(x, analytic = TRUE)
  fit2 <- estimate(y, analytic = TRUE)
  sel1 <- select(fit1)
  sel2 <- select(fit2)
  # squared difference of adjacency matrices
  sum((sel1$adj[upper.tri(I_p)] - sel2$adj[upper.tri(I_p)])^2)
}
# observed difference
obs <- f(Ym, Yf)
global_hd <- ggm_compare_ppc(Ym, Yf,
iter = 250,
FUN = f,
custom_obs = obs,
progress = FALSE)
global_hd
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 250
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ym, Yf, iter = 250, FUN = f, custom_obs = obs,
#> progress = FALSE)
#> ---
#> Custom:
#>
#> contrast custom.obs p.value
#> Yg1 vs Yg2 75 0.6
#> ---
#############################
######## nodewise ##########
#############################
nodewise <- ggm_compare_ppc(Ym, Yf, iter = 250,
test = "nodewise")
#> BGGM: Predictive Check (Contrast 1)
#> BGGM: Finished
nodewise
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Nodewise Predictive Check
#> Posterior Samples: 250
#> Group 1: 805
#> Group 2: 1631
#> Nodes: 25
#> Relations: 300
#> ---
#> Call:
#> ggm_compare_ppc(Ym, Yf, test = "nodewise", iter = 250)
#> ---
#> Symmetric KL divergence (JSD):
#>
#> Yg1 vs Yg2
#> Node JSD.obs p_value
#> 1 0.001 0.648
#> 2 0.000 0.952
#> 3 0.006 0.088
#> 4 0.005 0.252
#> 5 0.006 0.064
#> 6 0.002 0.472
#> 7 0.000 0.776
#> 8 0.000 0.856
#> 9 0.000 0.936
#> 10 0.003 0.216
#> 11 0.020 0.016
#> 12 0.009 0.020
#> 13 0.003 0.284
#> 14 0.004 0.112
#> 15 0.011 0.016
#> 16 0.001 0.464
#> 17 0.000 0.616
#> 18 0.000 0.636
#> 19 0.000 0.764
#> 20 0.000 0.708
#> 21 0.000 0.892
#> 22 0.003 0.484
#> 23 0.002 0.376
#> 24 0.013 0.176
#> 25 0.018 0.044
#> ---
#>
#> note:
#> JSD is Jensen-Shannon divergence
# }