EMRFS Role Mappings Integration with LDAP and JupyterHub on EMR

Written by mannem on . Posted in AWS BIG DATA, EMR || Elastic Map Reduce

This article assumes you have read the following articles:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-user-impersonation.html

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-ldap-users.html

It also assumes you have a working Windows Active Directory (AD) server with LDAP enabled.

EMRFS role mapping allows you to use IAM roles other than the default EMR_EC2_DefaultRole when EMRFS makes calls to S3. Using security configurations, we define mappings that map a user, group, or S3 prefix to a particular IAM role. Example configuration:

user1_onGroup1 -> Group1

user1_onGroup1 -> Group2

 

Here, the user and group can be an LDAP user and an LDAP group, respectively.
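As a concrete illustration of such a security configuration, the sketch below builds the EMRFS role-mapping JSON in Python. It follows the AuthorizationConfiguration → EmrFsConfiguration → RoleMappings layout described in the AWS EMRFS guide linked above; the account ID, role names, user name and group names are hypothetical placeholders.

```python
import json

# Hypothetical AWS account ID; replace with your own.
ACCOUNT_ID = "123456789012"

VALID_TYPES = ("User", "Group", "Prefix")


def role_arn(role_name):
    """Build an IAM role ARN in the hypothetical account above."""
    return f"arn:aws:iam::{ACCOUNT_ID}:role/{role_name}"


def build_security_configuration(mappings):
    """Return a security configuration document with EMRFS role mappings.

    `mappings` is a list of (identifier_type, identifiers, role_name) tuples,
    kept in order, where identifier_type is "User", "Group" or "Prefix".
    """
    role_mappings = []
    for identifier_type, identifiers, role_name in mappings:
        if identifier_type not in VALID_TYPES:
            raise ValueError(f"unsupported IdentifierType: {identifier_type}")
        role_mappings.append(
            {
                "Role": role_arn(role_name),
                "IdentifierType": identifier_type,
                "Identifiers": list(identifiers),
            }
        )
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": role_mappings}
        }
    }


if __name__ == "__main__":
    # Hypothetical LDAP user/group names mapped to hypothetical IAM roles.
    config = build_security_configuration(
        [
            ("User", ["user1"], "EMRFS_user1_role"),
            ("Group", ["Group1"], "EMRFS_Group1_role"),
            ("Group", ["Group2"], "EMRFS_Group2_role"),
        ]
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the console when creating a security configuration, or passed as the SecurityConfiguration string to boto3's EMR create_security_configuration call; the AWS guide remains the authoritative reference for the schema.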

This article guides you through integrating LDAP with JupyterHub on EMR. After this setup, users on your LDAP server can log in to EMR’s JupyterHub to submit YARN jobs. We will also enable user impersonation, so that YARN jobs run as your LDAP user rather than as a default user such as ‘yarn’. If EMRFS role mapping is enabled, the IAM role mapped to your LDAP user will be used to make calls to S3. Likewise, if an LDAP user belongs to an LDAP group, the IAM role mapped to that group will be used.
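To make the user and group behaviour concrete, here is a simplified model of how a role is chosen. It assumes, as the AWS EMRFS guide describes, that mappings are evaluated top-down in the order listed, the first match wins, and a request that matches nothing falls back to the cluster's EC2 instance role (EMR_EC2_DefaultRole by default). This is not EMRFS's actual implementation, and all role, user and group names are hypothetical. The groups checked are whatever Hadoop's group mapping returns for the user, which is why LDAP-based group resolution matters here.

```python
# Simplified, hypothetical model of EMRFS role-mapping evaluation (not EMRFS code).
ACCOUNT = "123456789012"  # hypothetical AWS account ID

# Hypothetical mappings, in the order they would appear in RoleMappings.
ROLE_MAPPINGS = [
    {"Role": f"arn:aws:iam::{ACCOUNT}:role/EMRFS_user1_role",
     "IdentifierType": "User", "Identifiers": ["user1"]},
    {"Role": f"arn:aws:iam::{ACCOUNT}:role/EMRFS_Group1_role",
     "IdentifierType": "Group", "Identifiers": ["Group1"]},
    {"Role": f"arn:aws:iam::{ACCOUNT}:role/EMRFS_Group2_role",
     "IdentifierType": "Group", "Identifiers": ["Group2"]},
]
DEFAULT_ROLE = f"arn:aws:iam::{ACCOUNT}:role/EMR_EC2_DefaultRole"


def resolve_role(role_mappings, user, groups, s3_path, default_role):
    """Return the Role of the first matching mapping, else default_role.

    `groups` stands in for the groups Hadoop's group mapping returns for
    `user` (LDAP groups when LdapGroupsMapping is configured).
    """
    for mapping in role_mappings:
        id_type = mapping["IdentifierType"]
        identifiers = mapping["Identifiers"]
        if id_type == "User" and user in identifiers:
            return mapping["Role"]
        if id_type == "Group" and any(g in identifiers for g in groups):
            return mapping["Role"]
        if id_type == "Prefix" and any(s3_path.startswith(p) for p in identifiers):
            return mapping["Role"]
    return default_role


if __name__ == "__main__":
    # An LDAP user with no user-level mapping who belongs to Group2.
    print(resolve_role(ROLE_MAPPINGS, "alice", ["Group2"],
                       "s3://my-bucket/data/", DEFAULT_ROLE))
```

In this model user1 gets the user-level role even if it is also in Group2, because its mapping is listed first; reordering the list changes the result.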

First, launch an EMR cluster with the following configuration, and attach a security configuration containing the EMRFS role mappings described above.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html

  • livy.impersonation.enabled=true, to enable Livy user impersonation.
  • hadoop.security.group.mapping = org.apache.hadoop.security.LdapGroupsMapping, so that Hadoop connects directly to an LDAP server to resolve a user’s list of groups instead of relying on the operating system’s group name resolution. Without this step, EMRFS role mapping will not work with LDAP groups.
  • hadoop.security.group.mapping.ldap.bind.user, the user used to run the LDAP searches that retrieve group information.
  • hadoop.security.group.mapping.ldap.bind.password, that bind user’s LDAP password.
  • hadoop.security.group.mapping.ldap.url, the URL (hostname and port) of your LDAP server.
  • hadoop.security.group.mapping.ldap.base, the search base for the LDAP connection. This is a distinguished name and will typically be the root of the LDAP directory.

See https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/GroupsMapping.html#LDAP_Groups_Mapping
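The settings above can be expressed as EMR configuration classifications. The sketch below assumes livy.impersonation.enabled belongs in the livy-conf classification and the hadoop.security.* properties in core-site; the LDAP URL, bind DN, password and search base are hypothetical placeholders for a Windows AD server.

```python
import json

# Hypothetical LDAP settings; replace with your directory's values.
LDAP_URL = "ldap://ad.example.com:389"
BIND_USER = "CN=hadoop-bind,CN=Users,DC=example,DC=com"
BIND_PASSWORD = "REPLACE_WITH_BIND_PASSWORD"
SEARCH_BASE = "DC=example,DC=com"


def build_configurations(ldap_url, bind_user, bind_password, search_base):
    """Return classifications for Livy impersonation and LDAP group mapping."""
    return [
        {
            # Let Livy run jobs as the logged-in (LDAP) user.
            "Classification": "livy-conf",
            "Properties": {"livy.impersonation.enabled": "true"},
        },
        {
            # Resolve Hadoop group membership directly from LDAP.
            "Classification": "core-site",
            "Properties": {
                "hadoop.security.group.mapping":
                    "org.apache.hadoop.security.LdapGroupsMapping",
                "hadoop.security.group.mapping.ldap.bind.user": bind_user,
                "hadoop.security.group.mapping.ldap.bind.password": bind_password,
                "hadoop.security.group.mapping.ldap.url": ldap_url,
                "hadoop.security.group.mapping.ldap.base": search_base,
            },
        },
    ]


if __name__ == "__main__":
    configs = build_configurations(LDAP_URL, BIND_USER, BIND_PASSWORD, SEARCH_BASE)
    print(json.dumps(configs, indent=2))
```

The printed JSON matches the shape accepted by the console's software settings box or the AWS CLI's --configurations option; with boto3, the same list can be passed as Configurations to run_job_flow, together with SecurityConfiguration set to the name of your role-mapping security configuration.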

Now log in to the master node and run the following scripts.
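The scripts themselves are not reproduced in this post. As a rough, hypothetical illustration of the two steps the AWS guides linked above describe, the sketch below renders JupyterHub settings that switch authentication to LDAP, plus the HDFS commands that give an impersonated user a home directory. The c.LDAPAuthenticator.* attribute names come from the jupyterhub-ldapauthenticator package; the server name, bind DN template, user name, and the assumption that each impersonated user needs a /user/<name> directory in HDFS should all be checked against those guides.

```python
# Hypothetical LDAP values; replace with your directory's details.
LDAP_SERVER = "ad.example.com"
BIND_DN_TEMPLATE = "CN={username},CN=Users,DC=example,DC=com"


def jupyterhub_ldap_lines(server, bind_dn_template, use_ssl=False):
    """Render jupyterhub_config.py lines that enable LDAPAuthenticator."""
    return [
        "c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'",
        f"c.LDAPAuthenticator.server_address = {server!r}",
        f"c.LDAPAuthenticator.use_ssl = {use_ssl}",
        # {username} is substituted by LDAPAuthenticator at login time.
        f"c.LDAPAuthenticator.bind_dn_template = [{bind_dn_template!r}]",
    ]


def hdfs_home_commands(username):
    """Commands (run as the HDFS superuser) creating a user's HDFS home dir."""
    return [
        f"hdfs dfs -mkdir -p /user/{username}",
        f"hdfs dfs -chown {username} /user/{username}",
    ]


if __name__ == "__main__":
    print("\n".join(jupyterhub_ldap_lines(LDAP_SERVER, BIND_DN_TEMPLATE)))
    print("\n".join(hdfs_home_commands("alice")))  # "alice" is a hypothetical LDAP user
```

The rendered lines would be appended to JupyterHub's configuration file (on EMR it lives inside the jupyterhub container) before restarting JupyterHub, and the HDFS commands run as the HDFS superuser, for example with sudo -u hdfs.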

Now log in to the JupyterHub UI using an LDAP user’s credentials. When you submit a Spark job, it will use the IAM role mapped to that LDAP user (or to one of the user’s LDAP groups) to make calls to S3.

Other Considerations

– We can use “hadoop.user.group.static.mapping.overrides” to define static user-to-group mappings, so that for system users such as hive, hdfs or yarn, Hadoop uses the groups defined here instead of querying your LDAP server. Please see https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/core-default.xml

Example:

“hadoop.user.group.static.mapping.overrides” : “hive=hadoop,hive;hdfs=hadoop,hdfs;oozie=users,hadoop,oozie;mapred=hadoop,mapred;yarn=hadoop;”
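The override value is a semicolon-separated list of user=group1,group2 entries; core-default.xml documents the format as user1=group1,group2;user2=;user3=group2, with the default dr.who=; meaning a user with no groups. The parser below is a hypothetical helper that approximates this format for sanity-checking a value before adding it to the core-site classification; it is not Hadoop's own parsing code.

```python
def parse_static_mapping_overrides(value):
    """Parse 'user1=group1,group2;user2=;' into {'user1': [...], 'user2': []}."""
    mapping = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate the trailing ';' used in the example value
        if "=" not in entry:
            raise ValueError(f"missing '=' in mapping entry: {entry!r}")
        user, groups = entry.split("=", 1)
        # An empty right-hand side means the user has no groups.
        mapping[user.strip()] = [g.strip() for g in groups.split(",") if g.strip()]
    return mapping


EXAMPLE = ("hive=hadoop,hive;hdfs=hadoop,hdfs;oozie=users,hadoop,oozie;"
           "mapred=hadoop,mapred;yarn=hadoop;")

if __name__ == "__main__":
    for user, groups in parse_static_mapping_overrides(EXAMPLE).items():
        print(user, groups)
```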

– We can use “hdfs groups <ldap-user-name>” to test whether org.apache.hadoop.security.LdapGroupsMapping is working. The lookup uses the LDAP server configured in “hadoop.security.group.mapping.ldap.url” to retrieve the user’s LDAP group information. If the command returns an error, there may be an issue with the LDAP settings you configured under “hadoop.security.group.mapping.ldap.*”.
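If you want to script this check, the output of hdfs groups can be parsed. The sketch assumes the output format user : group1 group2 (a user with no groups prints just user :); the helper names are hypothetical. An empty group list for an LDAP user suggests that Group-based EMRFS role mappings will not apply to that user.

```python
import shutil
import subprocess


def parse_hdfs_groups_output(output):
    """Parse `hdfs groups` lines of the form 'user : group1 group2'."""
    result = {}
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        user, _, groups = line.partition(":")
        result[user.strip()] = groups.split()
    return result


def groups_for(username):
    """Run `hdfs groups <username>` on a cluster node and return the groups."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs CLI not found; run this on the EMR master node")
    completed = subprocess.run(
        ["hdfs", "groups", username], capture_output=True, text=True, check=True
    )
    return parse_hdfs_groups_output(completed.stdout).get(username, [])
```

On the master node, groups_for("alice") (with a hypothetical LDAP user name) would return that user's resolved groups, or raise CalledProcessError if the command fails.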

 

 
