public class CleanUpUtil
extends java.lang.Object
This class performs clean-up work after BDD is restarted. It currently handles the following cases (mostly issues that occur after a
graceful shutdown):
(1) For Upload: if a graceful shutdown happens after the Hive table is created but before Studio sends the Ingest request, an orphan Hive
table and its avro/schema files are left behind. These files are removed here.
(2) For BulkExport to Computer: if a graceful shutdown happens after the temporary file is created in HDFS but before it is downloaded and
removed, the temporary file is left behind. That file is also removed here.
(3) For Transform Commit: (3.1) After a graceful shutdown, the Commit may have a progressStatus of "WAITING_ON_COPY" or a
progressStatus >= APPLY_TRANSFORM_RUNNING. Nothing is done here for progressStatus == "WAITING_ON_COPY" (ESTUDIO-5246 covers that case,
the stuck commit). If progressStatus >= APPLY_TRANSFORM_RUNNING, the commit is eventually completed by DP and ES, but the UserPersistence
record for that Commit operation may remain in the table. (3.2) After a forced shutdown, the Commit is not completed, and the
UserPersistence record for that Commit operation may remain in the table with an arbitrary progressStatus. If this UserPersistence record
is not removed, the app continues to show the Commit as in progress after the restart.
To handle these issues, during startup we: (a) remove all Commit-related UserPersistence records whose status is <
APPLY_TRANSFORM_RUNNING; (b) mark records whose status is >= APPLY_TRANSFORM_RUNNING as LEFT_OVER_COMMIT and, in DatasetManagerPortlet,
check whether the target dataset is ready. If it is ready, a graceful shutdown has just happened, and the commit process is resumed
(to call switchCollection etc.). Otherwise, a forced shutdown has just happened, and the UserPersistence record is removed (i.e. the
commit process is discarded). In the forced shutdown case, discarding the commit process may leave several orphan files, but that is
considered acceptable behavior.
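
The two-stage handling of leftover Commit records described in (a) and (b) can be sketched roughly as follows. This is only an
illustration, not the actual CleanUpUtil code: the class, method, and action names are hypothetical, and of the status constants only
APPLY_TRANSFORM_RUNNING is taken from the description (COMMIT_REQUESTED, APPLY_TRANSFORM_DONE, and their ordering are assumptions).
The "WAITING_ON_COPY" case is deliberately left out of the sketch.

```java
final class CommitCleanUpSketch {

    // Only APPLY_TRANSFORM_RUNNING comes from the description above.
    // COMMIT_REQUESTED, APPLY_TRANSFORM_DONE, and the declaration order are hypothetical.
    enum ProgressStatus { COMMIT_REQUESTED, APPLY_TRANSFORM_RUNNING, APPLY_TRANSFORM_DONE }

    // What startup clean-up does with a leftover UserPersistence record (hypothetical names).
    enum StartupAction { REMOVE_RECORD, MARK_LEFT_OVER_COMMIT }

    // What the portlet later does with a record marked as LEFT_OVER_COMMIT (hypothetical names).
    enum PortletAction { RESUME_COMMIT, DISCARD_RECORD }

    // Rule (a): status < APPLY_TRANSFORM_RUNNING -> remove the record.
    // Rule (b): status >= APPLY_TRANSFORM_RUNNING -> mark it as a leftover commit.
    static StartupAction onStartup(ProgressStatus status) {
        return status.compareTo(ProgressStatus.APPLY_TRANSFORM_RUNNING) < 0
                ? StartupAction.REMOVE_RECORD
                : StartupAction.MARK_LEFT_OVER_COMMIT;
    }

    // Target dataset ready -> graceful shutdown, resume the commit.
    // Target dataset not ready -> forced shutdown, discard the record.
    static PortletAction onLeftOverCommit(boolean targetDatasetReady) {
        return targetDatasetReady ? PortletAction.RESUME_COMMIT : PortletAction.DISCARD_RECORD;
    }

    public static void main(String[] args) {
        for (ProgressStatus s : ProgressStatus.values()) {
            System.out.println(s + " -> " + onStartup(s));
        }
        System.out.println("target dataset ready -> " + onLeftOverCommit(true));
        System.out.println("target dataset not ready -> " + onLeftOverCommit(false));
    }
}
```

Comparing enum constants by declaration order mirrors the "<" and ">=" comparisons in the description; the real status type may well
use numeric codes instead.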
(4) For Curate (Loading-Full-Dataset) Workflow: (4.1) A running curate workflow may eventually finish in the graceful shutdown case.
The app marks the workflow as WORKFLOW_INTERRUPTED and handles it later in DataSetManagerPortlet, which checks whether the target dataset
is ready. If it is ready, a graceful shutdown has just happened, and the commit process is resumed (to call switchCollection etc.).
(4.2) A workflow with a status of INGEST_SUCCESS or a failure status is passed to the DataSetManager portlet for further handling.
(4.3) A workflow with any other status is removed.
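
The per-status handling of curate workflows in (4.1) to (4.3) amounts to a three-way dispatch, which might look roughly like the sketch
below. It is an illustration under assumptions, not the actual implementation: the class, method, and action names are hypothetical, and
of the status constants only INGEST_SUCCESS and WORKFLOW_INTERRUPTED appear in the description (RUNNING, INGEST_FAILED, and QUEUED are
stand-ins for "running", "failure status", and "other status").

```java
final class CurateCleanUpSketch {

    // INGEST_SUCCESS is named in the description above.
    // RUNNING, INGEST_FAILED, and QUEUED are hypothetical placeholders.
    enum WorkflowStatus { QUEUED, RUNNING, INGEST_SUCCESS, INGEST_FAILED }

    // What startup clean-up does with a leftover curate workflow (hypothetical names).
    enum Action { MARK_WORKFLOW_INTERRUPTED, PASS_TO_PORTLET, REMOVE }

    static Action onStartup(WorkflowStatus status) {
        switch (status) {
            case RUNNING:
                // (4.1) May still finish after a graceful shutdown; the portlet decides later.
                return Action.MARK_WORKFLOW_INTERRUPTED;
            case INGEST_SUCCESS:
            case INGEST_FAILED:
                // (4.2) Finished or failed workflows go to the portlet for further handling.
                return Action.PASS_TO_PORTLET;
            default:
                // (4.3) Any other status: the workflow is removed.
                return Action.REMOVE;
        }
    }

    public static void main(String[] args) {
        for (WorkflowStatus s : WorkflowStatus.values()) {
            System.out.println(s + " -> " + onStartup(s));
        }
    }
}
```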