Expectations while using EMRFS :
– $ hadoop The fs -put command updates both the S3 and EMRFS tables (files are created in S3 and also in the EMRFS table).
Ex) hadoop fs -put ./111.txt s3: // mannem / emrfstest1
– $ hadoop The fs -rm command updates S3 but does not update the EMRFS table. (Deleted in S3, but still in the EMRFS table)
Ex) hadoop fs -rm -f s3: //mannem/emrfstest1/111.txt
– $ hadoop The fs -mv command will rename S3 (create a new name after deleting it), but only add a new name to the EMRFS table (add new name without deleting existing information).
Ex) hadoop fs -mv s3: //mannem/emrfstest1/emrfs-didnt-work.png s3: //mannem/emrfstest1/emrfs-didnt-work_new.png
– Adding files from the S3 console (WEB UI) is added to S3 but not to the EMRFS table
– Deleting files from the S3 console (WEB UI) will delete them in S3, but they will not be deleted in the EMRFS table
– Renaming a file in the S3 console (WEB UI) changes the name in S3, but does not rename the EMRFS table.
– The EMRFS CLI (for example, $ emrfs delete or $ emrfs sync) does not change the actual data in S3. Only add / delete DynamODB meta tables used by EMRFS.
– EMRFS uses DynamoDB. In EMR clusters, you can view the table names with the emrfs describe-metadata command. You can also see it on the EMR web console.
[Hadoop @ ip-172-31-16-84 ~] $ emrfs describe-metadata
EmrFsApplication.scala (91): dynamoDB endPoint = dynamodb.ap-southeast-2.amazonaws.com
EmrFsApplication.scala (99): s3 endPoint = s3-ap-southeast-2.amazonaws.com
EmrFsApplication.scala (107): sqs endPoint = sqs.ap-southeast-2.amazonaws.com
EmrFSMetadata << --- DynamoDB table name
Approximate-item-count (6 hour delay): 156,432
– In the EMRFS table, S3 prefix value is entered in HashKey and Object name is entered in Rangekey.
– You can optionally specify the number of retries and the time to wait until the next retry when an exception occurs
– Please refer to http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-files-tracked.html for related information.
Trackback from your site.