Because EMR/Hadoop clusters are transient, tracking databases and tables across clusters can be difficult. Instead of using a different warehouse directory on each cluster, you can use a single permanent Hive warehouse shared by all EMR clusters.
S3 is a good choice because it is persistent storage with a robust architecture that provides redundancy and read-after-write consistency.
For each cluster:
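This can be configured using the hive.metastore.warehouse.dir property in the Hive site configuration:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3n://bucket/hive_warehouse</value>
  <description>location of default database for the warehouse</description>
</property>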
You may need to update this setting on all nodes.
For a single Hive session:
This can be configured using a command like
set hive.metastore.warehouse.dir=s3n://bucket/hive_warehouse;
Alternatively, start the Hive CLI with the property passed on the command line.
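A minimal sketch of such a command-line invocation, assuming the Hive CLI's `--hiveconf` option (which takes a `property=value` pair) and the same placeholder bucket path used in the `set` example. The variable names `WAREHOUSE_DIR` and `HIVE_CMD` are illustrative only, and the script prints the command rather than running it, so it can be reviewed first.

```shell
# Placeholder warehouse location from the example above (an assumption);
# replace it with your own bucket and prefix.
WAREHOUSE_DIR="s3n://bucket/hive_warehouse"

# Build a session-level override of the warehouse directory.
# --hiveconf passes a single property=value pair to the Hive CLI.
HIVE_CMD="hive --hiveconf hive.metastore.warehouse.dir=${WAREHOUSE_DIR}"

# Print the command; run the printed line on a node where Hive is installed.
echo "$HIVE_CMD"
```

An override passed this way applies only to the session it starts, so it suits one-off work, while the cluster-level property suits a warehouse shared by every session.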
Note that with the above configuration, databases and tables that use the default location will be stored on S3 under a path like s3://bucket/hive_warehouse/myHiveDatabase.db/