Tags: r, statnet, ergm

R breaks when using ergm


I have a big network, and whenever I run the ergm model, R crashes. (In the meantime it shows high memory use.)

Does anyone have an idea what I could do?

library(network)
library(ergm)
gc()

model.1a <- ergm(net[[1]] ~ edges() + 
    nodecov("dist2coast") + nodecov("dist2rail60") + 
    nodecov("dist2paved") + edgecov(dist_matrix),
  control = control.ergm(
    seed        = 1,
    MCMLE.maxit = 10,
    parallel    = 4,
    CD.maxit    = 10
  )
) 

net[[1]]
 Network attributes:
  vertices = 7819 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 273 
    missing edges= 0 
    non-missing edges= 273 

 Vertex attribute names: 
    agglosID agglosName builtUp capital class1 class2 class3 dist2capital dist2coast dist2emst dist2first dist2impr dist2paved dist2placebo16 dist2placebo22 dist2rail60 dist2rail60mil dist2rail60min dist2river dist2second first geometry ISO3 L1 Latitude Longitude mean2010 Metropole nodeID.1 notown Pop1950 Pop1960 Pop1970 Pop1980 Pop1990 Pop2000 Pop2010 Pop2015 prec_mean second sparseveg undetermined vertex.names Voronoi water 

No edge attributes

Edit: OK, I tried quite a bit. The following model works:

model.3a <- ergm(net[[1]] ~ edges+nodecov("dist2coast")+nodecov("dist2paved")+nodematch("G1SHORTNAM")+
                   nodematch("Colonization"), verbose = TRUE,
                 control = control.ergm(seed = 1, parallel= 6)) 

The following model fails (it is the same model with one covariate added). I tried several different covariates, and it is always the same. I tested those covariates individually as well as in combinations. I have the feeling that I run into the problem whenever the model has more than 4 covariates (I tested it on a computer with 16 GB RAM). Is it a RAM problem?

model.4a <- ergm(net[[1]] ~ edges+nodecov("dist2coast")+nodecov("dist2paved")+nodematch("G1SHORTNAM")+
                   nodematch("Colonization")+ nodecov("alt_mean"), verbose = TRUE,
                 control = control.ergm(seed = 1, parallel= 6))     

Output for Model 3a:

Evaluating network in model.
Initializing unconstrained Metropolis-Hastings proposal: ‘ergm:MH_TNT’.
Initializing model...
Model initialized.
Using initial method 'MPLE'.
Fitting initial model.
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
MPLE covariate matrix has 11982405 rows.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 

Running model.4a shows "R encountered a fatal error. The session was terminated" right after the line "MPLE covariate matrix has 11982405 rows." So should I just run it with more RAM?
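
A rough back-of-the-envelope for that matrix (a sketch only: the column count and dense double-precision storage are assumptions, and ergm may store it more compactly):

rows <- 11982405          # reported by the verbose output above
cols <- 6                 # assumed: one column per model term plus the response
rows * cols * 8 / 2^30    # ~0.54 GB for one dense copy of doubles

Model-fitting code routinely makes several working copies of such a matrix, so peak memory use can be a multiple of this figure.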


Solution

  • A couple of issues here:

    1. In the model formula, write edges without parentheses (see the corrected call after this list).
    2. Your network is extremely sparse -- 273 edges among 7819 nodes. That's a network density on the order of 10^(-5) (see the quick check after this list). Are you sure it is correct?
    3. We have worked with networks with 100k nodes, but you need RAM for that. Does your computer have enough RAM?
    4. You are fitting a dyad-independent model. Such a model simplifies to a logistic GLM (illustrated in the sketch after this list), but one fitted to quite a big model matrix, especially since you have an edgecov term with a matrix that is 7819 x 7819. Still, MCMC is not used, so (1) you don't need the MCMC and CD control parameters, and (2) the "breaking of R" you are experiencing is not due to Markov chains going astray.
    5. Try re-running the estimation with verbose = TRUE passed to ergm() and update your question with the messages it produces. Otherwise it is hard to deduce what's going on with the estimation.
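
A quick check of the density figure from point 2 (a minimal sketch using the network object from the question):

library(network)
n <- 7819; e <- 273
e / (n * (n - 1) / 2)       # ~8.9e-06, i.e. on the order of 10^-5
network.density(net[[1]])   # the same figure, computed from the object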
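Putting points 1, 4, and 5 together, a corrected version of the original call might look like this (a sketch that keeps the formula and data objects from the question as they are):

model.1a <- ergm(net[[1]] ~ edges +
    nodecov("dist2coast") + nodecov("dist2rail60") +
    nodecov("dist2paved") + edgecov(dist_matrix),
  control = control.ergm(seed = 1),  # no MCMC/CD parameters needed (point 4)
  verbose = TRUE)                    # point 5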
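To illustrate point 4, a dyad-independent fit can be reproduced as a weighted logistic regression on the MPLE design matrix via ergm's ergmMPLE() (a sketch; the formula is shortened from model.3a for brevity):

library(ergm)
mple <- ergmMPLE(net[[1]] ~ edges + nodecov("dist2coast") +
                   nodecov("dist2paved"), output = "matrix")
fit  <- glm(mple$response ~ mple$predictor - 1,
            weights = mple$weights, family = binomial())
coef(fit)   # should agree with the corresponding ergm() coefficients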