Posts Tagged ‘driver’

Increase Spark Driver memory using PySpark session from EMR Notebooks

Written by mannem on . Posted in AWS BIG DATA, EMR || Elastic Map Reduce

 

 

Spark UI vs. Spark History Server UI , which one to use and why ?

Written by mannem on . Posted in AWS BIG DATA, EMR || Elastic Map Reduce

Is Job Running ?

1. If you have Spark Applications Running, then you should be using SPARK UI. This UI is usually hosted on Spark Driver
– In YARN cluster mode, the Driver is run on YARN Application Master run on random Core node )
– IN YARN Client  Mode, the Driver is run on Master node itself.
To access Spark UI, You should be going to  YARN ResourceManager UI First. Then navigate to corresponding Spark Application and use “Application Master” link to Access Spark UI. If you observe the link, its taking you  you to the application master’s web UI at port 20888. This is basically a proxy running on master  listening on 20888  which makes available the Spark UI(which runs on either Core node or Master node)

2. You can also access Spark UI by going directly to Driver Hostname and Portname where its hosted.
For example, when I run spark-submit in cluster mode, it spinned up application_1569345960040_0007. In my driver logs I see below messages
19/09/24 22:29:15 INFO Utils: Successfully started service ‘SparkUI’ on port 35395.
19/09/24 22:29:15 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://ip-10-0-0-69.myermdomain.com:35395
Where ip-10-0-0-69.myermdomain.com is one of my core node.
So I can go to
http://ip-10-0-0-69.myermdomain.com:35395
This automatically routes me to Master node proxy server listening on port 20888
 http://ip-10-0-0-113.ec2.internal:20888/proxy/application_1569345960040_0007/
Please note that, these links are temporary and will only show the UI while the Spark Application is running.

Is Job Completed ?

But if you want to see UI even when Spark job is completed, you should use Spark HistoryServer UI directly at http://master-public-dns-name:18080/.
Spark History Server can also be used for Running Jobs using “Show Incomplete Applications” Button. Spark History Server does this by using Spark Event logs which is enabled on EMR by default.

Differences between Spark UI and Spark History UI

 But looks like Spark History Server has some  differences when compared to “Spark UI” (For Running Apps of course ). Some of em’ that I observed are :
– Spark UI has “Kill” Button so your can kill some Spark Stages while Spark History Server doesn’t.
– SPark UI has “SQL” tab which shows more information about spark-sql jobs while Spark History Server doesn’t.
– Spark UI can pull up live  Thread Dumps for Executors  while Spark History Server doesn’t.
– Spark UI can give most update to date info(like “Total Uptime”) on Tasks while there can be a bit lag in  Spark History Server UI.