Things to check in EMR Instance state logs
EMR writes instance state logs every 15 minutes. They capture several important metrics for your EMR nodes, such as memory and disk usage. In this article, I will show you how to find these metrics and how to interpret them when diagnosing an issue.
Finding the instance state logs of an EMR cluster :
On an EMR node, look in /emr/instance-state/
On S3, you will find them under a path like
s3://emr-log-bucket/j-QHD70YCKZWTG/node/i-0d861d80c83e33ec0/daemons/instance-state/
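As a rough sketch, the S3 prefix above can be assembled from three pieces: the log bucket, the cluster ID and the EC2 instance ID. The helper name instance_state_prefix is made up for this article, and the layout is simply the one shown in the example path.

```shell
# Build the S3 prefix that holds the instance state logs of one node.
# Arguments: log bucket, cluster ID (j-...), EC2 instance ID (i-...).
# The layout mirrors the example path above; the function name is hypothetical.
instance_state_prefix() {
    printf 's3://%s/%s/node/%s/daemons/instance-state/\n' "$1" "$2" "$3"
}
```

Assuming the AWS CLI is installed and can read the bucket, `aws s3 ls "$(instance_state_prefix emr-log-bucket j-QHD70YCKZWTG i-0d861d80c83e33ec0)"` would list the snapshots for that node.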
Things to check :
CPU load Avg :
If the load average is higher than the number of vCPUs on that instance type, processes are queuing for CPU time. This can cause communication problems between daemons and a range of issues with HDFS and with shuffles in jobs.
# how long have we been up
uptime
17:32:57 up 35 min, 0 users, load average: 1.10, 1.00, 0.46
The CPU load average is the average number of processes that are running or waiting to run over the past 1, 5 and 15 minutes. The numbers shown above mean:
- load average over the last 1 minute is 1.10
- load average over the last 5 minutes is 1.00
- load average over the last 15 minutes is 0.46
Let's say my EC2 instance type is m5.large, which has 2 vCPUs according to
https://aws.amazon.com/ec2/instance-types/m5/
If my load average is above 2, then I should be concerned.
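The comparison between load average and vCPU count can be sketched in shell. This is a minimal example rather than anything EMR ships: the function names load_averages and load_exceeds_cpus are hypothetical, and the parsing assumes the uptime line uses a dot as the decimal separator, as in the sample above.

```shell
# Extract the 1-, 5- and 15-minute load averages from one line of uptime output.
load_averages() {
    printf '%s\n' "$1" |
        sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# Succeed (exit 0) when the 1-minute load average exceeds the given vCPU count.
# Fails when the line cannot be parsed, so a bad input never looks like an alert.
load_exceeds_cpus() {
    load_averages "$1" |
        awk -v cpus="$2" 'NF == 3 && $1 + 0 > cpus + 0 { found = 1 } END { exit !found }'
}
```

On a live node this could be called as `load_exceeds_cpus "$(uptime)" "$(nproc)" && echo "load is above vCPU count"`. For a saved log, pass the uptime line from the snapshot and the vCPU count of the instance type instead.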
TOP :
Check whether any processes are using a lot of CPU or memory.
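One way to pick out the heavy processes is to filter the top listing by its %CPU column. The sketch below assumes top's default column layout (PID in field 1, %CPU in field 9, COMMAND in field 12) and that only the top section of the log is fed in. The function name busy_processes is hypothetical.

```shell
# List processes at or above a %CPU threshold from top's batch-style output.
# Reads the listing on stdin and prints "PID %CPU COMMAND" for each match.
# Header and summary lines are skipped because their first field is not a PID.
busy_processes() {
    awk -v limit="$1" '$1 ~ /^[0-9]+$/ && $9 + 0 >= limit + 0 { print $1, $9, $12 }'
}
```

On a live node, `top -b -n 1 | busy_processes 50` would print every process using 50% of a CPU or more.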
Process list (PS) :
Search for a process name such as ‘HRegionServer’ to verify that the process is running. Compare with the previous instance state log to see whether the PID (process ID) of that process has changed. If it has, the process was restarted between the two snapshots, most probably because it was killed after running out of memory (OOM).
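The PID comparison can be sketched as follows. It assumes each file holds the ps listing from one snapshot, with the PID in the second column as in `ps aux` or `ps -ef` output. The function names pid_of and pid_changed, and the file arguments, are hypothetical.

```shell
# Print the PID (second column) of the first line that mentions the process name.
# Reads a ps listing on stdin.
pid_of() {
    awk -v name="$1" 'index($0, name) { print $2; exit }'
}

# Succeed when the process appears in both listings with different PIDs.
# Arguments: process name, older ps listing file, newer ps listing file.
pid_changed() {
    old_pid=$(pid_of "$1" < "$2")
    new_pid=$(pid_of "$1" < "$3")
    [ -n "$old_pid" ] && [ -n "$new_pid" ] && [ "$old_pid" != "$new_pid" ]
}
```

For example, `pid_changed HRegionServer previous-ps.txt current-ps.txt && echo "HRegionServer was restarted"`. A process missing from either listing is not reported as changed, so a missing process still needs a manual check.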
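VMSTAT (r and b columns) :
r = processes that are running or waiting for CPU time.
b = blocked processes (in uninterruptible sleep, usually waiting on I/O). Ideally this stays at 0.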
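To spot blocked processes quickly, the b column can be checked across all vmstat samples. This is a small sketch that assumes vmstat's default layout, where r is the first column and b the second, and that only the vmstat section is fed in. The function name count_blocked_samples is hypothetical.

```shell
# Count vmstat samples that show at least one blocked process (b > 0).
# Reads vmstat output on stdin; header lines are skipped because their
# first field is not a number.
count_blocked_samples() {
    awk '$1 ~ /^[0-9]+$/ && $2 + 0 > 0 { n++ } END { print n + 0 }'
}
```

On a live node, `vmstat 1 5 | count_blocked_samples` would report how many of the five samples had blocked processes.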
DMESG :
Check for OS-level issues. For example, if the OS runs out of memory, you will see the kernel's OOM killer terminating processes, which can include important daemons.
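A simple filter for out-of-memory kills in the dmesg section could look like this. It matches the usual kernel wording ("Out of memory" and "oom-killer"). The exact messages vary between kernel versions, so treat the pattern as a starting point. The function name oom_events is hypothetical.

```shell
# Print kernel log lines that point at the OOM killer.
# Reads dmesg text on stdin; exits non-zero when nothing matches.
oom_events() {
    grep -i -E 'out of memory|oom-killer'
}
```

On a live node, `dmesg | oom_events` would show which processes the kernel killed.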
free -m :
Check free memory. Do not rely on this too heavily: free -m is only recorded every 15 minutes, so it is not a true representation of memory usage over the whole interval.
df -h :
Check disk space.
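To scan the df output for nearly full disks, the Use% column can be compared against a threshold. The sketch assumes the standard `df -h` layout (Use% in field 5, mount point in field 6) with each filesystem on a single line. The function name full_filesystems and the 90% threshold in the usage note are illustrative choices.

```shell
# Print mount points whose usage is at or above a percentage threshold.
# Reads df -h output on stdin and prints "mount-point Use%" for each match.
full_filesystems() {
    awk -v limit="$1" '$5 ~ /^[0-9]+%$/ {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

On a live node, `df -h | full_filesystems 90` would list every filesystem that is at least 90% full.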