Dealing with Negative (Red) Edges in Psychological Networks: Frequentist Edition


The topic of this blog post is negative edges in psychological networks. Negative edges are quite the mystery in substantive applications. This is because there is a long tradition in psychology of expecting only positive relations. This is termed a “positive manifold” (e.g., Horn and Cattell 1966). This is also an important aspect of network theory, for example, “[symptoms] not function as protective factors in the development of other symptoms”(p. 6, Borsboom et al. 2011). Hence, all of the edges in a given network are often expected to be positive.

That said, I never quite understood the problem of seeing negative (red) edges in a network plot. My initial apathy was because, even if all true non-zero relations are positive, we would actually expect to see some red edges in a network plot. But from now working with substantive researchers, it is clear that I unappreciated how concerning red edges are: is the method working incorrectly ? has some mistake been made ? are the effects flipping direction ?

Although these are important concerns, I would typically be more concerned if there were not any red edges:

This is paradoxical and related to natural sampling variability. This variability is required to compute valid1 confidence intervals and \(p-\)values. The means methods (e.g., (g)lasso) that reduce variance in the sampling distribution cannot provide actual confidence intervals.

In other words, to make inference about an effect we need the sampling variability. In turn, this requires (1) embracing red edges; or (2) incorporating the expectation of a positive manifold into the network.

To clarify these points, I decided to communicate the idea of natural/expected sampling variability and how this can be harnessed to improve statistical inference in partial correlation networks.

This blog is organized as follows:

  • I first show that red edges are expected, even when all relations are actually positive (or zero), and that it is possible to compute how many red edges there will be and their minimum edge weight.

  • Next, a non-regularized solution is provided that allows for incorporating theory into network analysis.

  • A simulation is then provided that demonstrates a key advantage of using non-regularized methods: formal error control.

A frequentist approach

It is possible to estimate the network in a really straightforward fashion. Indeed, once the partial correlations are obtained, knowing n and p is all that is required to determine the non-zero effects. I do not present equations (too lazy! but see our papers) and instead opt for R-code in certain places:2

# these packages are needed to reproduce 

# data
Y <- BGGM::ptsd

# number of nodes
p <- ncol(Y)

# adjacency matrix
adj <- matrix(0, p, p)

# number of nodes conditioned on
c <- p - 2

# number of variables
n <- nrow(Y)

# alpha
alpha = 0.05

In the above, we have the data (PTSD symptoms, Armour et al. 2017), n, p, c (the number of variables conditioned on), and a defined \(\alpha\) (type I error) level. Note that is the same exact procedure for determining the statistical significance of bivariate correlations. The only difference is the addition of c, as no variables are conditioned on for bivariate as opposed to partial correlations. Next we simply compute the partial correlations, compute the test-statistics, and then determine which \(p-\)values are less than alpha (\(\alpha\)):

# covariance matrix
covariance_matrix <- cov(Y)

# inverse of the covariance matrix
precision_matrix <- solve(covariance_matrix)

# partial correlation matrix
pcor_matrix <- -(cov2cor(precision_matrix) - diag(p))

# Fisher z transformation
z <- GGMnonreg::fisher_r2z(pcor_matrix[upper.tri(pcor_matrix)])

# test statistic
z_stat <- abs(z) / (1 / sqrt(n - c - 3))

# compute two-sided p-value
pvalues <- pnorm(z_stat, lower.tail = FALSE) * 2

# significant effects (the edges)
adj[upper.tri(adj)] <- ifelse(pvalues < alpha, 1, 0)

# weighted adjacency matrix
pcor_adj <- BGGM:::symmteric_mat(adj * pcor_matrix)

This is essentially information that is taught in introductory methods: we simply computed the significance of a correlation, but with the addition of c in the denominator of (1 / sqrt(n - c - 3)) (the standard error). For controlling false positives or discoveries, this is the gold standard for situations common to psychology (\(p < n\))!

It is then customary to plot the results:


This immediately shows the issue of negative (red) edges-they jump right off the graph, considering they are quite unexpected. How can this be, given the symptoms belong to a scale that was constructed to have positive associations (the bivariate correlations are all positive) ?

Is there something wrong with the gold standard approach?

The answer is no! Three red edges are expected.

Expected red edges

Buckle up as we journey into the trenches of frequentist logic. To figure out if this number of red edges is cause for concern, we only need to make an assumption about the overall sparsity. Hence, with

  • The sample size n
  • The number of nodes p
  • A defined false positive rate \(\alpha\)
  • Assumed sparsity \(\pi\),

we can actually figure out (exactly) how many red edges there will be. This is because, by assuming a level of sparsity, we know (on average) how many false positive there will be (and half of these will be negative).

To see this, assume that \(p = 20\) which results in \(\frac{1}{2}p(p-1) = 190\) effects in total. With \(\pi = 0.50\), this means that there are \(190 \cdot 0.5 = 95\) true zeros in the network (50% sparsity). And because there is a defined false positive rate, \(\alpha\), we then know how many false positives there are in a network, that is \(95 \cdot 0.05 = 4.75\). Obviously, there cannot be 4.75 false positives in a given network, which gets back to a key idea of frequentism. Namely, if we repeatedly sampled a population, then, on average, there will indeed be 4.75 false positives. Because there is no reason to assume that false positives are in a certain direction, \(\frac{1}{2}4.75 = 2.375\) is the number of expected negative edges. This corresponds closely to the plot above where there are 3 red edges and indicates that the method is not deficient.

I made the following plot to show how many false positives are expected for different network sizes and assumed sparsity levels:

# number of variables
p <- c(10, 25, 50)
# sparsity
sparsity <-  seq(0.1, 0.9, 0.1)
# number of edges
partials <- 0.5*p*(p-1)
# alpha level
alpha <- 0.05

# simple multiplication
res <- data.frame(FP = c(sparsity * partials[1]  * alpha, 
                         sparsity * partials[2]  * alpha,
                         sparsity * partials[3]  * alpha))

The take home from this plot (of the object res) is that the number of expected red edges is a function of the number of true zeroes. Hence, when the network is large, we expect many red edges: with 50% sparsity and 50 nodes, we would expect to see 15 red edges in the plot (even though the true network has only positive relations). This becomes even larger as sparsity increases (more true zeros). In my experience, psychological networks are not all that sparse but we would still expect to see red edges. Note also there would be more (less) for a larger (smaller) \(\alpha\) level.

Before proceeding to show that we can also determine the size of the negative edges, it is important to consider the variability surrounding the long run average. Here I used simulation.

First, it is informative to note that the results are essentially the same as the Figure titled Calculator (compare the lines between plots). The ribbon captures the minimum and maximum number of false positives. For example, with 50% sparsity and 50 nodes, there could be 5 or 25 red edges in a given graph.

Expected red edge size

Another concern that has been raised about non-regularized method is the size of the red edges, as they can be quite large. This is because, when using NHST, the effects must be “significant” to be included in the graph. This also has a simple solution for computing the minimum size of a red edge.

To see this, consider that using \(\alpha = 0.05\) translates into the partial correlation needing to be 1.96 standard errors (\(SE\)) away from zero to be “significant”. This means that an effect must be larger than \(|\hat{z}_{ij}| > 1.96 \cdot SE\). Because the standard error is simply \(1 / \sqrt{n - c - 3}\), we know the minimum edge weight of the red edges.

Here is an example with \(p = 20\). I have also run a simulation, from which I collected all of the false positives that were detected. Here is the simple solution for the lower-bound of a negative edge.

# sample size
n <- seq(50, 1000, 50)

# controlled
c <- 20 - 2

# analytic
res1 <- data.frame(r =  BGGM::fisher_z2r((1 / sqrt(n - c - 3)) * 1.96), 
                   n  = n)

The take home from this plot is that red edges will be quite large with small sample sizes. Indeed, the absolute minimum for a red edge with \(n = 50\) is larger than 0.30! This is because the standard error is large, which, in turn, requires the false positive to be quite large to be “significant”. This quickly dissipates. Also, the simulated results (the points) match the solution based on a “calculator” (the red line corresponding to the object res1).

Dealing with red edges

In this section, I show how to completely solve the red edge conundrum by incorporating the theoretical exception of a positive manifold into the network. This is done with a one-sided hypothesis test for positive edges (note positive_manifold = TRUE):

# incorporate theory
fit <-  GGMnonreg::GGM_fisher_z(Y, 
                                alpha = 0.05, 
                                positive_manifold = TRUE)
# add column names
colnames(fit$pcor_selected) <- colnames(Y)
# plot

This is the same data that was used previously (there were three red edges). Now there are no red edges in the network: This is a direct result of incorporating the theoretical expectation of positive relations into the analysis. Hence, the pressing concern of negative edges has been completely solved using frequentist methods.

Seeing red edges was not indicative of a problem with the method. The underlying “issue” was not incorporating theory into the analysis.

What about glasso ?

Computing \(p-\)values and confidence intervals requires sampling variance, whereas (g)lasso introduces bias and reduces variance. Hence, even the bootstrap will not work for constructing confidence intervals. This is well-known in the statistical literature (pp. 7 - 8, Bühlmann, Kalisch, and Meier 2014):

The (limiting) distribution of such a sparse estimator is non-Gaussian with point mass at zero, and this is the reason why standard bootstrap or sub-sampling techniques do not provide valid confidence regions or \(p-\)values.

Thus “confidence” intervals computed from (g)lasso are not confidence intervals (i.e., they do not cover the true value, say, for a 95 % interval, 95 % of the time).

On the other hand, the variance reduction and bias towards zero is what makes the graphs seem more “interpretable” when using glasso. This is because the false positives have a narrow sampling distribution around zero. This translates into red edges being quite small and not all that apparent in a plot, for example:

fit <- qgraph::EBICglasso(cor(Y), n = 221)

Here the red edges are less (far) less pronounced than the non-regularized method above. However, the graph still raises the concern of red edges because they are there. A naive perspective would be to think (g)lasso is a magical method and by overcoming its supernatural power this suggests there really is a negative effect. This reasoning is misguided (to say the very least). But I digress.

The important thing is that the very reason glasso has less pronounced negative edges (reducing variance, introducing bias, and inducing sparsity) is the very reason bootstrapping does not work correctly-a methodological quagmire indeed.

Error control

In this section, I demonstrate that is possible to control the error rate with non-regularized methods. No such thing is possible with (g)lasso. To understand this, it is important to know that commonly used measures to assess performance in simulation can be directly controlled by defining an \(\alpha\) level. For example,

  • specificity is the same as \(1 - \alpha\).
  • precision is the same as $1 - $the false discovery rate (FDR).

This indicates that we can estimate a network that controls either. In this example, I focus on specificity (an additional blog will show how to control precision). I compare to the popular glasso method (using the default settings in the R package qgraph, Epskamp et al. 2012).

Specificity control

Specificity can be controlled by setting it to \(1 - \alpha\). Hence, for \(\alpha = 0.05\) this corresponds to specificity of 0.95, etc.

I simulated networks that included 20 nodes, 50% sparsity, and the partial correlations were mostly between -0.4 and 0.4 (see Tables 1 and 2 in Wysocki and Rhemtulla 2019). Further, I looked at four \(\alpha\) levels, 0.5, 0.25, 0.10, and 0.01, that correspond to specificity of 0.5, 0.75, 0.90, and 0.99. The scores were averaged across 250 simulated datasets.

I have highlighted glasso with the black line. A common misconception in the network literature is that (g)lasso reduces the chances of making a false positive and that it converges on the true model. Both are incorrect. Statisticians have even stated that lasso is not ideal for model selection (p. 278, Tibshirani 2011):

The lasso is doing variable screening and, hence, I suggest that we interpret the second ‘s’ in lasso as ‘screening’ rather than ‘selection’…Once we have the screening property, the task is to remove the false positive selections…

This was stated by Dr. Peter Bühlmann in the discussion of the paper, “Regression shrinkage and selection via the lasso: A retrospective” (link to paper). This sentiment can be seen in the plot, as specificity actually decreases with larger \(n\). Said another way, the false positive rate actually increases with more data!

Now focus on the NHST methods. Here we see perfectly horizontal lines at the desired \(1 - \alpha\) level. This is because when using NHST we have formal error control. Hence, we can calibrate the network to the desired level of specificity. Note that sensitivity is essentially “power.” The light blue line makes clear that power is about the same as glasso but specificity is much higher. Furthermore, if the goal is in fact to limit false positives, this can be achieved by setting \(\alpha = 0.01\) (i.e., specificity of 99 %).

Side note:

I find the term “specificity” somewhat confusing. When talking about detecting effects (as opposed to diagnosing patients) specificity is the same as \(1 - \alpha\). When specificity is 75% this might sound rather impressive. However, this corresponds to a false positive rate of 25%! Clearly much higher than the conventional level of 5%.


I hope this post provided some insights into thinking about sampling variability. Because networks do have (many) more effects than other models, considering sampling variability is that much more important:

We should not run from sampling variability (e.g., by using methods that reduce it). Rather, we should embrace variability with arms wide open.

I am not the first to suggest that we should think more about sampling variability in networks. Recently, this same logic has been applied to network replication (Jones, Williams, and McNally 2019; Williams 2020). Hence, thinking about sampling variability is useful for replicability and wrestling with negative edges.

As I demonstrated, there is a simple approach for directly incorporating the theoretical expectation of a positive manifold into network analysis. This avoids perhaps searching for a method that “works correctly”, and, in doing so, it is possible to control either specificity or precision.

Some thoughts

We have written many papers arguing against the wide spread use of (g)lasso. Yet, regularization is still more popular than ever, even though we have demonstrated that (1) it does not have a low false positive rate (see the plots in this blog as well, Williams et al. 2019; Williams and Rast 2019); and (2) it is not needed to mitigate overfitting (Williams and Rodriguez 2020).

I now think (without evidence) that (g)lasso is not used because of (1) or (2) but perhaps to make the red edges less apparent. However, this is not unnecessary if the goal is to have a model that reflects a positive manifold. Theory should simply be included in the analysis.

What about ordinal data?

I focused on continuous data because there are simple equations to work with.3 The exact idea would also apply to ordinal data with polychoric partial correlations. However, the standard error is sometimes much larger for polychoric partial correlations. As a result, the (false positive) red edges will be larger as well. But this is easily addressed with the approach presented in this blog (see the GGMnonreg R package).

Bayesian edition

In a follow-up blog, I will describe a Bayesian approach for dealing with negative edges.

Take home

  • (g)lasso is not needed to ensure a plot looks palatable.

  • NHST allows for directly incorporating scientific theory into network analysis. This completely solves the negative edge conundrum.

  • Red edges are expected\(-\)due to natural sampling variability\(-\)when theory is absent from the model. A case of “theoretical amnesia” (link):

…at least try to remember what theory is and what it’s good for [emphasis added], so that we don’t fall into theoretical amnesia (Borsboom, 2013).

  • NHST allows for formal error control (e.g., with actual confidence intervals), which is not possible with (g)lasso.4


Armour, Cherie, Eiko I. Fried, Marie K. Deserno, Jack Tsai, and Robert H. Pietrzak. 2017. “A network analysis of DSM-5 posttraumatic stress disorder symptoms and correlates in U.S. military veterans.” Journal of Anxiety Disorders 45 (May 2013): 49–59.

Borsboom, Denny, Angélique O. J. Cramer, Verena D. Schmittmann, Sacha Epskamp, and Lourens J. Waldorp. 2011. “The Small World of Psychopathology.” Edited by Rochelle E. Tractenberg. PLoS ONE 6 (11): e27407.

Bühlmann, Peter, Markus Kalisch, and Lukas Meier. 2014. “High-Dimensional Statistics with a View Toward Applications in Biology.” Annual Review of Statistics and Its Application 1 (1): 255–78.

Epskamp, Sacha, Angélique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann, and Denny Borsboom. 2012. “qgraph: Network Visualizations of Relationships in Psychometric Data.” Journal of Statistical Software 48 (4).

Horn, John L, and Raymond B Cattell. 1966. “Refinement and test of the theory of fluid and crystallized general intelligences.” Journal of Educational Psychology 57 (5): 253.

Jones, Payton J., Donald R. Williams, and Richard J. McNally. 2019. “Sampling Variability is not Nonreplication: A Bayesian Reanalysis of Forbes, Wright, Markon, & Krueger.” PsyArXiv.

Tibshirani, Robert. 2011. “Regression shrinkage and selection via the lasso: A retrospective.” Journal of the Royal Statistical Society. Series B: Statistical Methodology 73 (3): 273–82.

Williams, Donald R. 2020. “Learning to Live with Sampling Variability: Expected Replicability in Partial Correlation Networks.” PsyArXiv.

Williams, Donald R., and Philippe Rast. 2019. “Back to the basics: Rethinking partial correlation network methodology.” British Journal of Mathematical and Statistical Psychology.

Williams, Donald R., Mijke Rhemtulla, Anna C. Wysocki, and Philippe Rast. 2019. “On Nonregularized Estimation of Psychological Networks.” Multivariate Behavioral Research 54 (5): 1–23.

Williams, Donald R., and Josue E. Rodriguez. 2020. “Why Overfitting is Not (Usually) a Problem in Partial Correlation Networks.” PsyArXiv.

Wysocki, Anna C., and Mijke Rhemtulla. 2019. “On Penalty Parameter Selection for Estimating Network Models.” Multivariate Behavioral Research, 1–15.

Zhang, Rong, Zhao Ren, and Wei Chen. 2018. “SILGGM: An extensive R package for efficient statistical inference in large-scale gene networks.” Edited by Manja Marz. PLoS Computational Biology 14 (August): e1006369.

  1. Valid refers to the definition of a confidence interval and \(p\)-value. See Wikipedia: ( (

  2. Most code has been omitted because it was making the blog hard to follow. The .Rmd file is provided at my github.

  3. There are also rather simple equations for Spearman and Kendall rank correlations. The standard errors are bit larger, which translates into larger red edges in the graph. This can also be remedied by including theory into the model.

  4. More recent lasso-based method can construct confidence intervals (but not with the bootstrap). This is accomplished by removing the bias and sparsity, which aims to have the proper variance to have the correct properties. These methods are not needed in psychology (e.g., Zhang, Ren, and Chen 2018), because we already have confidence intervals with the correct properties.

Donald R. Williams
Graduate Student in Quantitative Psychology
comments powered by Disqus