Got raw data from the car, then cleaned and preprocessed it on EC2 machines. Based on the amount of data, spun up EMR instances, copied the data to the cluster, ran the MapReduce (MR) jobs, and then ran the Hive scripts (which were generated dynamically). Then used Sqoop to export the final processed output to a PostgreSQL database, shut down the EMR instances, and ran cleanup operations.
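The local decision points in that pipeline can be sketched as follows. There are three of them: sizing the EMR cluster from the input volume, generating a Hive script from a template, and building the `sqoop export` command for PostgreSQL. This is a minimal illustration, not the original implementation. The sizing rule (one core node per 50 GiB, clamped to 2 to 20 nodes), the table and column names (`car_readings_*`, `vehicle_id`, `speed`), the HDFS paths, and the helper function names are all hypothetical. Launching and terminating the cluster would typically go through the AWS SDK (for example boto3's EMR client `run_job_flow` and `terminate_job_flows`). That step is left out here so the sketch runs without AWS access.

```python
import math
import shlex
from string import Template

# Hypothetical sizing rule: one EMR core node per 50 GiB of input, clamped to a range.
GIB = 1024 ** 3
BYTES_PER_NODE = 50 * GIB
MIN_NODES, MAX_NODES = 2, 20


def core_node_count(input_bytes: int) -> int:
    """Pick an EMR core-node count that grows with the preprocessed input size."""
    needed = math.ceil(input_bytes / BYTES_PER_NODE)
    return max(MIN_NODES, min(MAX_NODES, needed))


# Hypothetical Hive template; table, columns and paths are placeholders.
# Output is comma-delimited so Sqoop can parse it with the same delimiter.
HIVE_TEMPLATE = Template("""\
CREATE EXTERNAL TABLE IF NOT EXISTS car_readings_${run_id} (
  vehicle_id STRING, ts BIGINT, speed DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '${input_dir}';

INSERT OVERWRITE DIRECTORY '${output_dir}'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT vehicle_id, AVG(speed) FROM car_readings_${run_id} GROUP BY vehicle_id;
""")


def render_hive_script(run_id: str, input_dir: str, output_dir: str) -> str:
    """Fill the Hive template for one pipeline run (the 'dynamically created' script)."""
    return HIVE_TEMPLATE.substitute(run_id=run_id, input_dir=input_dir, output_dir=output_dir)


def sqoop_export_args(jdbc_url: str, user: str, password_file: str,
                      table: str, export_dir: str) -> list[str]:
    """Build a `sqoop export` command that loads the Hive output into PostgreSQL."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", ",",
    ]


if __name__ == "__main__":
    print(core_node_count(120 * GIB))  # 3
    print(render_hive_script("r20240101", "hdfs:///clean/r20240101", "hdfs:///out/r20240101"))
    print(shlex.join(sqoop_export_args(
        "jdbc:postgresql://db.example.internal/analytics", "etl",
        "/user/etl/.pgpass", "vehicle_speed_summary", "hdfs:///out/r20240101")))
```

Keeping the Hive output delimiter and Sqoop's `--input-fields-terminated-by` in agreement is the main coupling between the two steps. If they differ, the export will misparse rows.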