Puppet Enterprise is a great platform for automating the configuration and deployment of applications to servers, but as a sophisticated infrastructure management tool with numerous interconnected moving parts, it can be a challenge to troubleshoot when things go awry. This is especially true of cascading errors, which are hard to isolate. What follows is a short list of the more common issues you may encounter, with a few tips on how to troubleshoot and resolve them.
1. Problem: You cannot log in as Admin to the Console
Locating the appropriate log files for analysis is the first step to determining the root cause:
Using 3rd-party Authentication (Active Directory, LDAP, et al.):
Using Console Authentication Service (Puppet’s built-in authentication system):
In either case, the respective log files will contain information about why authentication failed. For this discussion, we’ll assume the native Console Authentication Service is being used rather than 3rd-party authentication. Analysis of the /var/log/pe-console-auth/cas.log file should reveal something like “Invalid credentials given for user 'firstname.lastname@example.org',” where 'firstname.lastname@example.org' is the user in question.
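As a minimal sketch of that analysis, the helper below tallies failed logins per user from a CAS log. The function name failed_logins is hypothetical and not part of Puppet Enterprise, and the pattern assumes each failure is logged on a single line with the user name in single quotes, as in the message quoted above; adjust it if your log looks different.

```shell
# Hypothetical helper: count "Invalid credentials" entries per user.
# Usage: failed_logins <path-to-cas-log>
failed_logins() {
    log_file="$1"
    # Keep only the failure lines, extract the quoted user name,
    # then count occurrences and list the most frequent user first.
    grep "Invalid credentials given for user" "$log_file" \
        | sed "s/.*for user '\([^']*\)'.*/\1/" \
        | sort | uniq -c | sort -rn
}
```

Pass it the CAS log named above. A user with many recent failures is a likely candidate for a locked account.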
Admins may also find themselves locked out after 10 unsuccessful login attempts. In this situation, another admin must log in and manually clear the lock on the affected account. Problem solved, assuming another admin account exists. If not, running the following will create a new admin account:
$ cd /opt/puppet/share/console-auth
$ sudo /opt/puppet/bin/rake db:create_user USERNAME="firstname.lastname@example.org" PASSWORD="<password>" ROLE="Admin"
The new admin account can then be used to log in and unblock the other admin account.
Alternatively, one can unblock a user by directly removing the block flag from their database record. Puppet has created a tutorial on how to do this, though it’s specific to PE-installed Postgres databases.
2. Problem: Nodes are not appearing in the Console dashboard
Again, be sure to check the appropriate logs as an initial troubleshooting step:
The Background Tasks pane can provide additional information about why the nodes are not appearing. A large number of background tasks may indicate a malfunctioning dashboard worker.
In this case, stopping and starting the pe-puppet-dashboard-workers service may fix the issue. Be sure to check the Background Tasks pane again after restarting the dashboard workers to verify that the number of tasks has gone down.
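The stop-then-start step could be scripted roughly as follows. This is a sketch that assumes the system provides the `service` wrapper (on some platforms you would call the script under /etc/init.d directly); the function name bounce_service and the SERVICE_CMD override are hypothetical and exist only so the steps can be dry-run.

```shell
# Hypothetical helper: stop, then start, a service.
# Returns non-zero if either step fails, so a failed stop never
# leads to a blind start.
# SERVICE_CMD (assumption) overrides the command used, e.g. for a dry run.
bounce_service() {
    svc="$1"
    # Left unquoted on purpose so "sudo service" splits into two words.
    ${SERVICE_CMD:-sudo service} "$svc" stop \
        && ${SERVICE_CMD:-sudo service} "$svc" start
}

# Dry run: SERVICE_CMD=echo bounce_service pe-puppet-dashboard-workers
# Real run: bounce_service pe-puppet-dashboard-workers
```

Stopping and starting as two separate steps, rather than a single restart, makes it easier to see which half failed if the workers do not come back.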
3. Problem: No facts are listed for nodes in the Console dashboard, or the node manager displays a blank page or error message
Most often, this problem occurs when there are issues with Puppet’s internal dashboard certificate. This can be verified by analyzing the appropriate log file:
If any certificate verification or SSL errors exist, regenerating the internal dashboard certificate should resolve the issue.
4. Problem: PuppetDB won’t start or fails silently
A myriad of causes exist for a non-starting or failing PuppetDB, with the most common being running out of memory. To verify that this is the case, check the appropriate log file:
If you see an error like “java.lang.OutOfMemoryError: Java heap space,” raising PuppetDB’s Java heap limit should resolve the issue. This can be done by editing either the /etc/sysconfig/pe-puppetdb or /etc/default/pe-puppetdb configuration file, depending on your OS, and then restarting PuppetDB so the new limit takes effect.
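As an illustration, the sketch below rewrites the maximum heap setting in such a file. It assumes the limit is expressed as a JVM `-Xmx` flag (typically inside a `JAVA_ARGS="..."` line); check your own file before relying on it. The function name set_heap is hypothetical.

```shell
# Hypothetical helper: change the -Xmx value in a defaults file in place.
# Usage: set_heap <config-file> <size>   e.g. set_heap /etc/sysconfig/pe-puppetdb 1g
set_heap() {
    conf_file="$1"
    new_size="$2"
    # Keep a backup so the change is easy to roll back.
    cp "$conf_file" "$conf_file.bak" || return 1
    # Replace only the -Xmx token; any other JVM flags are left untouched.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${new_size}/" "$conf_file.bak" > "$conf_file"
}
```

Run it with sufficient privileges to write the file. If the file contains no `-Xmx` flag at all, the helper leaves it unchanged, in which case the flag has to be added by hand.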
5. Problem: No nodes are showing up in Live Management
When using Live Management to browse resources on nodes and invoke orchestration actions, sometimes a blank pane appears with no nodes present. Analysis of the /var/log/pe-httpd/error.log file should reveal a line that reads “No MCollective servers responded.” Restarting MCollective on the master and/or agents should effectively fix the problem.
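A quick way to confirm the symptom before restarting anything is to pull the most recent matching lines out of the error log. The helper below is a sketch; the name recent_mco_errors and the default of five lines are hypothetical choices.

```shell
# Hypothetical helper: show the last N log lines reporting that no
# MCollective servers responded (N defaults to 5).
# Usage: recent_mco_errors <path-to-error-log> [count]
recent_mco_errors() {
    log_file="$1"
    count="${2:-5}"
    grep "No MCollective servers responded" "$log_file" | tail -n "$count"
}
```

If the helper prints nothing for /var/log/pe-httpd/error.log, the blank pane probably has a different cause and a restart of MCollective is unlikely to help.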
A final note on logs: they are your friends, and should be looked at first when encountering problems. While this may seem like a no-brainer, the reality is that log files are often overlooked when obvious solutions come to mind. To make things easier, Puppet Labs created this cheat sheet for finding out where the various logs are located and what the errors/warnings could potentially mean. That said, log analysis should be the first in a series of steps towards resolving issues encountered with Puppet.
Though these 5 commonly encountered Puppet Enterprise problems can have various causes, in many cases the issues (e.g., non-reporting nodes or a failing database) are the result of misconfigurations that existed prior to automation. It’s critical that the infrastructure and application configurations be visible and understood first, with the goal of bringing the environment under control prior to automation. GuardRail simplifies the task of discovering automation requirements, and can even turn those requirements into pre-formatted Puppet manifests. By validating configurations both post-build and on an ongoing basis, you can rest assured that the state of your systems is under control and safe from configuration drift.