public class CleanUpUtil
extends java.lang.Object
This class performs clean-up work after BDD is restarted. It currently handles the following cases (mostly
issues that occur after a graceful shutdown):
(1) For Upload: if a graceful shutdown happens after the Hive table is created but before Studio sends the Ingest
request, an orphan Hive table and avro/schema files are left behind. These files are removed here.
(2) For BulkExport to Computer: if a graceful shutdown happens after the temporary file is created in HDFS but before
it is downloaded and removed, the temporary file is left behind. That file is also removed here.
(3) For Transform Commit:
(3.1) In case of a graceful shutdown, the Commit may be left with a progressStatus of "WAITING_ON_COPY" or a
progressStatus >= APPLY_TRANSFORM_RUNNING. We do nothing here for progressStatus == "WAITING_ON_COPY" (ESTUDIO-5246
already covers that stuck commit). If progressStatus >= APPLY_TRANSFORM_RUNNING, the commit is eventually completed by
DP and ES, but the UserPersistence record for that Commit operation may remain in the table.
(3.2) In case of a forced shutdown, the Commit is not completed, and the UserPersistence record for that Commit
operation may remain in the table with an arbitrary progressStatus. If this UserPersistence record is not removed, the
app continues to show the Commit as in progress after the restart.
To handle these issues, right after restart we:
(a) remove all Commit-related UserPersistence records whose status is < APPLY_TRANSFORM_RUNNING.
(b) mark the records whose status is >= APPLY_TRANSFORM_RUNNING as LEFT_OVER_COMMIT and, in DatasetManagerPortlet,
check whether the target dataset is ready. If it is ready, a graceful shutdown has just happened, and the commit
process is resumed (to call moveCollectionName etc.). Otherwise, a forced shutdown has just happened, and the
UserPersistence record is removed (i.e. the commit process is discarded). In the forced shutdown case, discarding the
commit process may leave several orphan files behind, which is considered acceptable behavior.
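
The decision rules in (a) and (b) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual CleanUpUtil code: the class name CommitCleanUpSketch, the Action and Resolution enums, both method names, and every ProgressStatus constant other than APPLY_TRANSFORM_RUNNING are hypothetical, and the enum order merely stands in for whatever ordering the real progressStatus comparison uses. WAITING_ON_COPY is left out because its position in that ordering is not stated here.

```java
class CommitCleanUpSketch {

    // Hypothetical ordered statuses. Only APPLY_TRANSFORM_RUNNING is a name
    // taken from the description above; the others are placeholders.
    enum ProgressStatus { COMMIT_REQUESTED, APPLY_TRANSFORM_RUNNING, APPLY_TRANSFORM_DONE }

    // Hypothetical outcome of the check performed right after restart.
    enum Action { REMOVE_RECORD, MARK_LEFT_OVER_COMMIT }

    // Hypothetical outcome of the later check on a LEFT_OVER_COMMIT record.
    enum Resolution { RESUME_COMMIT, DISCARD_COMMIT }

    // Rules (a) and (b): records below APPLY_TRANSFORM_RUNNING are removed,
    // all others are marked as LEFT_OVER_COMMIT for a later decision.
    static Action triageOnRestart(ProgressStatus status) {
        return status.compareTo(ProgressStatus.APPLY_TRANSFORM_RUNNING) < 0
                ? Action.REMOVE_RECORD
                : Action.MARK_LEFT_OVER_COMMIT;
    }

    // Second half of rule (b): a ready target dataset implies a graceful
    // shutdown (resume the commit); otherwise the shutdown was forced
    // (discard the commit by removing its record).
    static Resolution resolveLeftOver(boolean targetDatasetReady) {
        return targetDatasetReady ? Resolution.RESUME_COMMIT : Resolution.DISCARD_COMMIT;
    }

    public static void main(String[] args) {
        for (ProgressStatus status : ProgressStatus.values()) {
            System.out.println(status + " -> " + triageOnRestart(status));
        }
        System.out.println("target ready -> " + resolveLeftOver(true));
        System.out.println("target not ready -> " + resolveLeftOver(false));
    }
}
```

Keeping the restart-time triage separate from the later readiness check mirrors the description: the first step needs only the persisted status, while the second depends on the state of the target dataset, which is checked in DatasetManagerPortlet.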