vagrant puppet apt-get

How to only run 'apt-get update' if a package needs to be installed or updated


I can't seem to figure out how to get Puppet to not run 'apt-get update' during every run.


The standard yet inefficient way:

So far I've been doing this by putting the following in the main Puppet manifest:

exec { 'apt-get update':
  path => '/usr/bin',
}

Then each subsequent module that needs a package installed has:

package { 'nginx':
  ensure => 'present',
  require => Exec['apt-get update'],
}

The problem with this is that the Apt package index gets refreshed every time Puppet runs. This puts unnecessary load on our systems and network.


The solution I tried, which fails:

I looked in the Puppet docs and read about subscribe and refreshonly.

Refresh: exec resources can respond to refresh events (via notify, subscribe, or the ~> arrow). The refresh behavior of execs is non-standard, and can be affected by the refresh and refreshonly attributes:

  • If refreshonly is set to true, the exec will only run when it receives an event. This is the most reliable way to use refresh with execs.

subscribe

One or more resources that this resource depends on, expressed as resource references. Multiple resources can be specified as an array of references. When this attribute is present:

  • The subscribed resource(s) will be applied before this resource.

So I tried this in the main Puppet manifest:

# Setup this exec type to be used later.
# Only gets run when needed via "subscribe" calls when installing packages.
exec { 'apt-get update':
  path => '/usr/bin',
  refreshonly => true,
}

Then this in the module manifests:

# Ensure that Nginx is installed.
package { 'nginx':
  ensure => 'present',
  subscribe => Exec['apt-get update'],
}

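But this fails because apt-get update never runs before Nginx is installed, so Apt can't find the package. (With refreshonly set to true, the exec only runs when it receives a refresh event, and here nothing sends it one: subscribe on the package makes the package listen to the exec, not the other way around.)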


Surely this is something others have encountered? What's the best way to solve this?


Solution

  • Puppet has a hard time coping with this scenario, because all resources are synchronized in a specific order. For each resource Puppet determines whether it needs a sync, and then acts accordingly, all in one step.

    What you would need is a way to implement this process:

    1. check if resource A (a package, say) needs a sync action (e.g., needs installing)
    2. if so, trigger an action on resource B first (the exec for apt-get update)
    3. once that is finished, perform the operation on resource A

    And while it would be most helpful if there were such a feature, there currently is not.

    It is usually the best approach to try and determine the necessity of apt-get update from changes to the configuration (new repositories added, new keys installed etc.). Changes to apt's configuration can then notify the apt-get update resource. All packages can safely require this resource.
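
    A minimal sketch of that arrangement is shown below. The repository file name and its contents are hypothetical placeholders, not taken from the question; any resource that changes apt's configuration could play the same role.

    ```puppet
    # Hypothetical repository definition (file name and URL are placeholders).
    # Whenever Puppet changes this file, it sends a refresh event to the exec.
    file { '/etc/apt/sources.list.d/example.list':
      ensure  => file,
      content => "deb http://repo.example.com/ubuntu stable main\n",
      notify  => Exec['apt-get update'],
    }

    # Runs only when it receives a refresh event, i.e. when apt's
    # configuration has actually changed during this Puppet run.
    exec { 'apt-get update':
      path        => ['/usr/bin', '/usr/sbin', '/bin', '/sbin'],
      refreshonly => true,
    }

    # Ordered after the exec. When no refresh was triggered, the exec is
    # skipped and the package is still managed as usual.
    package { 'nginx':
      ensure  => 'present',
      require => Exec['apt-get update'],
    }
    ```

    With this layout, the direction of the relationships matters: configuration changes notify the exec, while packages merely require it for ordering.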

    For the regular refreshing of the package database, it is easier to rely on a daily cron job or similar.
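
    If you want Puppet itself to manage that job, one possible sketch uses the cron resource type. The schedule, the resource title, and the quiet flag are assumptions chosen for illustration.

    ```puppet
    # Refresh the package index once a day, independently of Puppet runs.
    # The time of day (04:00) is an arbitrary choice for this example.
    cron { 'daily apt-get update':
      command => '/usr/bin/apt-get update -qq',
      user    => 'root',
      hour    => 4,
      minute  => 0,
    }
    ```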