Tuesday, October 10, 2017

Spark/Hive 'Unable to alter table' Issue Complaining 404 Not Found On AWS S3

There's an ETL job that creates a bunch of tables each day. For each of them, taking tbl_a as an example, the procedure is as follows (a HiveQL sketch follows the list):

  1. drop table if exists tbl_a_tmp
  2. create table tbl_a_tmp
  3. alter table tbl_a_tmp rename to tbl_a
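For concreteness, the three steps are plain HiveQL statements along these lines (the query body in step 2 is made up for illustration; only the drop/create/alter pattern comes from the actual job):

  -- Step 1: remove any staging table left over from a previous run
  drop table if exists tbl_a_tmp;

  -- Step 2: build the staging table (query body is illustrative only)
  create table tbl_a_tmp stored as parquet as
  select * from some_source_table;

  -- Step 3: swap the staging table into place
  alter table tbl_a_tmp rename to tbl_a;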

But sometimes (seemingly at random) it fails on the alter table step with the following error:

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Alter Table operation for db.tbl_a_tmp failed to move data due to: 'com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 8070D6C52909BDF2), S3 Extended Request ID: hpOCEw8ET6juVKjUTk3...nDCD9pEFN7scyQ8vPWFh3v5QM4=' See hive log file for details.

I then tried another way to rename the table, via create table tbl_a stored as parquet as select * from tbl_a_tmp, and this time a more concrete error was printed: "java.io.FileNotFoundException: File s3://bucket_name/db/tbl_a_tmp/_temporary/0/_temporary does not exist."
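Spelled out, that alternative looks something like the following: copy the data into the final table instead of renaming, then drop the staging table (same illustrative names as above):

  -- Copy the data into the final table instead of renaming the staging one
  create table tbl_a stored as parquet as
  select * from tbl_a_tmp;

  -- Clean up the staging table afterwards
  drop table if exists tbl_a_tmp;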

I checked and found a _temporary 'folder' in AWS S3, which was empty. I deleted it, reran the alter table statement, and everything works fine now. My guess is that there is a bug in the Spark/Hive code that can leave the _temporary folder behind after the job is done.
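If you hit the same thing, the workaround is simply to delete the leftover folder before rerunning the alter table, for example with the AWS CLI (the bucket and path here are the ones from the error message above; adjust to your own):

  aws s3 rm s3://bucket_name/db/tbl_a_tmp/_temporary/ --recursive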
