After a buyer complained {that a} characteristic of marbot, our monitoring resolution for AWS was not working as anticipated, I began debugging the difficulty. First, I checked the CloudWatch alarms we use to watch all Lambda features. All CloudWatch alarms had been in standing OK,
and we additionally had not acquired any alerts by way of Slack. Subsequent, I analyzed the CloudWatch logs. To my shock, I came upon that certainly one of our Lambda features failed every now and then. I used to be shocked concerning the blind spot in our monitoring configuration.
Are you utilizing CloudWatch alarms for Lambda perform monitoring as effectively? Learn on to make sure you keep away from making the identical mistake we did.
Downside
For some purpose, the CloudWatch alarms we configured to get notified about failed executions of Lambda features didn’t work accurately. Right here is an excerpt from our CloudFormation code to configure CloudWatch alarms.
The ErrorsAlarm
displays the Error
metric of the LambdaFunction
. As quickly because the variety of errors throughout the previous 5 minutes exceeds 0, the alarm flips to state ALARM
.
LambdaFunction: |
Sounds tremendous. Right here is the catch.
“The timestamp on a metric displays when the perform was invoked. Relying on the length of the invocation, this may be a number of minutes earlier than the metric is emitted. For instance, in case your perform has a 10-minute timeout, then look greater than 10 minutes previously for correct metrics.” (see Working with Lambda perform metrics)
The next determine illustrates that when Lambda writes metric information, it makes use of the timestamp of the perform invocation (begin).
In our case, we set the timeout of the LambdaFunction
to a most of quarter-hour. However the CloudWatch alarm appears again solely 5 minutes. Because the invocation timestamp is used when inserting a metric level into the Errors
metric, the CloudWatch alarm misses errors from invocations longer than 5 minutes.
Resolution
To keep away from blind spots when monitoring Lambda features with CloudWatch alarms, stick with the next rule.
CloudWatch Analysis Interval > Lambda Perform Timeout |
Again to our case, we elevated the analysis interval of the ErrorsAlarm
to twenty minutes by rising the analysis intervals from 1 to 4.
LambdaFunction: |
So, verify the configuration of your CloudWatch alarms monitoring Lambda features!