At Google Cloud Next two weeks ago I had the chance to chat with many folks using or experimenting with Google products. Questions about change management and audit logging came up frequently. Some folks wanted notifications for configuration changes. Others wanted a quick and easy way to see who is doing what for auditing purposes. Many attendees were not aware of Google Cloud’s existing audit logging capabilities or how to set alerts on specific audited events. If you need to do these types of tasks, this blog post should help you get started.
First: Alerting Philosophy
There are many approaches to alerting. Based on the years I spent carrying a pager I am opinionated about what types of alerts are useful and appropriate.
My cardinal rule is that alerts should be actionable. This is a common philosophy. Paul Newson (SRE Advocate) taught that the medium for the alert should match the urgency of the alert. Pages should be things that require immediate attention. Items that should be dealt with in the next 12 hours might go into a ticketing system. Alerts that aren’t urgent may go into email, where they may never get read.
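To make the urgency-to-medium mapping concrete, here is a minimal sketch of a routing function. The 12-hour ticket window comes from the rule of thumb above. The one-hour cutoff for a page, the channel names, and the function name are assumptions for illustration only.

```python
from enum import Enum


class Channel(Enum):
    PAGE = "page"
    TICKET = "ticket"
    EMAIL = "email"


# Assumed cutoff: anything that needs a response within an hour is "immediate".
PAGE_WITHIN_HOURS = 1
# From the text: work due within roughly 12 hours belongs in a ticket.
TICKET_WITHIN_HOURS = 12


def choose_channel(respond_within_hours: float) -> Channel:
    """Pick a notification medium that matches how soon someone must act."""
    if respond_within_hours < 0:
        raise ValueError("respond_within_hours must be non-negative")
    if respond_within_hours <= PAGE_WITHIN_HOURS:
        return Channel.PAGE
    if respond_within_hours <= TICKET_WITHIN_HOURS:
        return Channel.TICKET
    return Channel.EMAIL
```

If an alert cannot be given a sensible response window at all, that is a hint it may not be actionable in the first place.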
I also want my monitoring system to have a good signal-to-noise ratio. Any alert that isn’t actionable, or that I can ignore because of other circumstances, adds noise. I know many people who have ended up putting a phone or pager in the fridge overnight because it would not stop repeating an erroneous page.
There’s a lot more that goes into an approach to monitoring. The ideas above are a few things to consider when setting up your system. Check out the SRE book’s chapter on monitoring distributed systems to learn more about how Google approaches monitoring.
Viewing Audit Logs
To view audit logs, start at the Cloud Console Homepage and click “Activity.” This brings you to the audit logging summary page. There are two kinds of audit logs: admin activity and data access. Admin activity includes resource creates, updates, and deletes, as well as API calls made to the system through either the gcloud SDK or the web UI. The list of products that support admin activity logs is here.
Data access logs track accesses to user-provided data in databases and other data stores. Currently, the only product that supports data access logs is BigQuery. Data access logs are not visible in the default view of the audit log summary.
Admin activity logs are visible to all project members. Data access logs are only visible to users with the Private Logs Viewer IAM role.
The audit log summary shows the action that generated each log and the user (identified by email address) who performed it. You can view logs from multiple projects at once and restrict the view to specific types of resources, types of logs, and time ranges.
Viewing Audit Logs in Stackdriver Logging
The summary view shows a relatively limited amount of data. If you need advanced filtering or want to see the entire log messages, you should use Stackdriver Logging. In Stackdriver Logging, pick the type of resource you are interested in and then choose “activity” in the drop-down that lists the different log types.
If you expand the log messages, you can see the authentication information, the method called, an event message, and other information about the request. One important thing to notice is that every admin action has two associated logs: one records the start of the action and the other records its completion. You can use the free text field at the top to search for specific logs, or you can click on a field like eventMessage or methodName and choose to either show or hide similar messages.
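Because each admin action produces two log entries, anything that counts raw entries will count every action twice. The sketch below groups entries by action and flags actions that started but never completed. It works on simplified, made-up entries: methodName is a field mentioned above, but operationId, phase, and the method name values are hypothetical and will not match the real log schema exactly.

```python
from collections import defaultdict

# Hypothetical, simplified entries. operationId and phase are assumed field
# names used to tie the start and completion logs of one action together.
ENTRIES = [
    {"operationId": "op-1", "phase": "start", "methodName": "example.firewall.delete"},
    {"operationId": "op-1", "phase": "completion", "methodName": "example.firewall.delete"},
    {"operationId": "op-2", "phase": "start", "methodName": "example.instance.create"},
]


def group_by_action(entries):
    """Map each operation id to the set of phases seen for it."""
    actions = defaultdict(set)
    for entry in entries:
        actions[entry["operationId"]].add(entry["phase"])
    return dict(actions)


def unfinished_actions(entries):
    """Return ids of actions that have a start entry but no completion entry."""
    return sorted(
        op
        for op, phases in group_by_action(entries).items()
        if "start" in phases and "completion" not in phases
    )


actions = group_by_action(ENTRIES)
print(len(ENTRIES), "entries,", len(actions), "actions")
print("unfinished:", unfinished_actions(ENTRIES))
```

When you build a filter for alerting, the same idea applies: match only one of the two entries (for example, only the completion) so that one action produces one alert.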
Alerting on Audit Logs
Setting up alerts on audit logs is the same as setting up other forms of log-based alerts. Start in the Stackdriver Logging Logs Viewer, then build filters using the methods described above to narrow down to the specific messages you want alerts for.
Once you have narrowed your search, you create a metric using the “Create Metric” button. A metric is data derived from a particular logging query, which Stackdriver Logging feeds into Stackdriver Monitoring.
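Conceptually, a log-based metric turns a stream of matching log entries into numbers over time. The sketch below illustrates that idea locally, assuming the metric simply counts matching entries per minute. It does not call any Stackdriver API; the entry shape, the timestamps, and the method names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_matches_per_minute(entries, matches):
    """Count entries satisfying `matches`, bucketed by the minute they occurred."""
    counts = Counter()
    for entry in entries:
        if matches(entry):
            ts = datetime.fromisoformat(entry["timestamp"])
            counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


# Hypothetical entries; only methodName is a field named in the text.
entries = [
    {"timestamp": "2017-01-01T10:15:05", "methodName": "example.firewall.delete"},
    {"timestamp": "2017-01-01T10:15:40", "methodName": "example.firewall.delete"},
    {"timestamp": "2017-01-01T10:16:10", "methodName": "example.instance.create"},
    {"timestamp": "2017-01-01T10:17:02", "methodName": "example.firewall.delete"},
]

# The predicate plays the role of the filter built in the Logs Viewer.
series = count_matches_per_minute(
    entries, lambda e: e["methodName"] == "example.firewall.delete"
)
for minute, count in sorted(series.items()):
    print(minute.isoformat(), count)
```

An alerting condition is then a threshold on this series, such as "more than zero matching entries in any minute."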
Once you have created the metric, go to Stackdriver Monitoring and click “Create Alerting Policy.” When you set up the condition for your alert, pick “Log Metric” as the resource, and you will see the metric you previously created in the Logs Viewer. Then you can set the notification method based on the urgency. Webhook alerts are excellent for connecting Stackdriver Monitoring to your ticketing system. Informative alerts might be best if they show up in a team chat program like HipChat. I also like to include some documentation in the alert about what I am supposed to do when I receive it, so that if a bunch of things are burning down, I have a quick reminder about where to start looking.
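As a rough illustration of the webhook-to-ticket idea, the sketch below converts a webhook request body into the fields a ticketing system might need and attaches a short runbook note. The payload shape (an incident object with policy_name, state, summary, and url) is an assumption for illustration, so check the webhook documentation for the exact fields. The ticket fields, the runbook table, and the example URL are hypothetical.

```python
import json

# Hypothetical mapping from alerting policy name to a "where to start" note.
RUNBOOKS = {
    "Firewall rule deleted": "Check the audit log for who deleted the rule, "
    "then confirm the change was intended.",
}


def ticket_from_webhook(body: str) -> dict:
    """Turn a webhook request body (JSON text) into fields for a ticket."""
    # Assumed payload shape: {"incident": {"policy_name": ..., "state": ..., ...}}
    incident = json.loads(body).get("incident", {})
    policy = incident.get("policy_name", "unknown policy")
    return {
        "title": f"[{incident.get('state', 'unknown')}] {policy}",
        "description": incident.get("summary", ""),
        "link": incident.get("url", ""),
        "runbook": RUNBOOKS.get(policy, "No runbook recorded for this policy."),
    }


sample_body = json.dumps(
    {
        "incident": {
            "policy_name": "Firewall rule deleted",
            "state": "open",
            "summary": "Log metric is above the threshold of 0.",
            "url": "https://example.com/incidents/1",
        }
    }
)
print(ticket_from_webhook(sample_body))
```

Keeping the runbook text next to the routing code means the reminder travels with the ticket, even if the alert's own documentation field is left empty.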