Expectations while using EMRFS

Written by mannem on . Posted in AWS BIG DATA, EMR || Elastic Map Reduce


– The $ hadoop fs -put command updates both S3 and the EMRFS table (the file is created in S3 and an entry for it is added to the EMRFS table).
Ex) hadoop fs -put ./111.txt s3://mannem/emrfstest1
– The $ hadoop fs -rm command updates S3 but does not update the EMRFS table (the file is deleted from S3, but its entry remains in the EMRFS table).
Ex) hadoop fs -rm -f s3://mannem/emrfstest1/111.txt
– The $ hadoop fs -mv command renames the file in S3 (the old name is deleted and the new name is created), but in the EMRFS table it only adds the new name (the entry for the old name is not deleted).
Ex) hadoop fs -mv s3://mannem/emrfstest1/emrfs-didnt-work.png s3://mannem/emrfstest1/emrfs-didnt-work_new.png
– Files added through the S3 console (web UI) are added to S3 but not to the EMRFS table.
– Files deleted through the S3 console (web UI) are deleted from S3 but not from the EMRFS table.
– Renaming a file in the S3 console (web UI) changes the name in S3 but does not rename the entry in the EMRFS table.
– The EMRFS CLI (for example, $ emrfs delete or $ emrfs sync) does not change the actual data in S3. It only adds or deletes entries in the DynamoDB metadata table used by EMRFS.
– EMRFS uses DynamoDB. On an EMR cluster, you can view the table name with the $ emrfs describe-metadata command. You can also see it in the EMR web console.
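To make the cases above easier to reason about, here is a small in-memory sketch in Python. It is not EMRFS code and calls no AWS API. The class ToyBucket and all of its method names are made up for illustration, and metadata_sync is only a simplified stand-in for what $ emrfs sync does (it assumes a sync simply makes the table match S3 and leaves S3 alone).

```python
class ToyBucket:
    """Toy model: one set for the keys in S3, one for the EMRFS table."""

    def __init__(self):
        self.s3 = set()     # keys that really exist in S3
        self.table = set()  # keys tracked in the EMRFS (DynamoDB) table

    # --- operations through hadoop fs, as described in the list above ---
    def hadoop_put(self, key):
        self.s3.add(key)
        self.table.add(key)

    def hadoop_rm(self, key):
        self.s3.discard(key)  # the table entry is left behind

    def hadoop_mv(self, old, new):
        self.s3.discard(old)
        self.s3.add(new)
        self.table.add(new)   # the old table entry is not removed

    # --- operations through the S3 console: the table is never touched ---
    def console_add(self, key):
        self.s3.add(key)

    def console_delete(self, key):
        self.s3.discard(key)

    def console_rename(self, old, new):
        self.s3.discard(old)
        self.s3.add(new)

    # --- metadata-only helpers (S3 itself is never modified) ---
    def diff(self):
        """Report the keys on which S3 and the table disagree."""
        return {
            "only_in_s3": sorted(self.s3 - self.table),
            "only_in_table": sorted(self.table - self.s3),
        }

    def metadata_sync(self):
        """Assumed, simplified sync: the table is rebuilt from S3."""
        self.table = set(self.s3)


bucket = ToyBucket()
bucket.hadoop_put("emrfstest1/111.txt")
bucket.hadoop_rm("emrfstest1/111.txt")    # entry stays in the table
bucket.console_add("emrfstest1/222.txt")  # the table never hears about it
print(bucket.diff())

bucket.metadata_sync()                    # only the table changes
print(bucket.diff())
```

The first print shows one key on each side of the mismatch. After the metadata-only sync both lists are empty, while the S3 set is exactly what it was before the sync.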

 

– In the EMRFS table, the S3 prefix is stored in the hash key and the object name is stored in the range key.
– You can optionally specify the number of retries and the time to wait before the next retry when an exception occurs: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-retry-logic.html
– For related information, see http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-files-tracked.html
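As a rough illustration of the hash key / range key split mentioned above, the Python sketch below breaks an S3 URI into a prefix part and an object name. The function name emrfs_keys is hypothetical, and the exact string format used here (a leading slash, with the bucket name included in the prefix) is an assumption, so check it against the items in your own table before relying on it.

```python
from urllib.parse import urlparse


def emrfs_keys(s3_uri):
    """Split an S3 URI into (hash_key, range_key).

    Assumed layout: hash_key = "/<bucket>/<prefix>", range_key = object name.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected a URI of the form s3://bucket/key")

    path = parsed.path.strip("/")
    if not path:
        raise ValueError("the URI has no object name")

    # Everything before the last "/" is the prefix, the rest is the name.
    prefix, _, name = path.rpartition("/")
    hash_key = "/" + parsed.netloc + ("/" + prefix if prefix else "")
    return hash_key, name


print(emrfs_keys("s3://mannem/emrfstest1/111.txt"))
```

With the example path from this post, the prefix part covers the bucket and the emrfstest1 folder, and the object name is 111.txt.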
