O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

BIG DATA PROJECT – ROLLER COASTER DATA ANALYSIS.pptx

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Próximos SlideShares
Spark - Citi Bike NYC
Spark - Citi Bike NYC
Carregando em…3
×

Confira estes a seguir

1 de 25 Anúncio

Mais Conteúdo rRelacionado

Mais recentes (20)

Anúncio

BIG DATA PROJECT – ROLLER COASTER DATA ANALYSIS.pptx

  1. 1. BIG DATA PROJECT – ROLLER COASTER TYCOON DATA ANALYSIS Priyansh Gupta – 19CSU225 Radhika Mongia– 19CSU233 Rashi Gupta – 19CSU241
  2. 2. OBJECTIVE The objective of the project is to analyze the Roller Coaster Dataset and solve various queries using HIVE.
  3. 3. DATA FORMAT: ALL THE FIELDS ARE COMMA-DELIMITED 1. park_id int, 2. theme string, 3. rollercoaster_type string, 4. custom_design int, 5. excitement double, 6. excitement_rating string, 7. intensity double, 8. intensity_rating string, 9. nausea double, 10. nausea_rating string, 11. max_speed double, 12. avg_speed double, 13. ride_time int, 14. ride_length int, 15. max_pos_gs double, 16. max_neg_gs double, 17. max_lateral_gs double, 18. total_air_time double, 19. drops int, 20. highest_drop_height int, 21. inversions int
  4. 4. DATA SET
  5. 5. ● Load data into Hive table from our local file system Load Data Local Inpath '/home/cloudera/Desktop/rollercoasters.csv' Overwrite Into table rollercoaster;
  6. 6. Question 1 ● Number of rollercoaster type based on excitement and nausea and also print theme name select theme, excitement_rating, nausea_rating , count(rollercoaster_type) from rollercoaster group by excitement_rating, nausea_rating,theme;
  7. 7. Question 2 ● No. of rollercoaster where grouping based on excitement level and drop height a) where excitement level is highest(very high) and drop_height>50 b) where excitement level is high and drop_height>50 and also print the park_id. a) select excitement_rating, highest_drop_height , count(rollercoaster_type) from rollercoaster where excitement_rating = 'Very High' and highest_drop_height > 50 group by excitement_rating, highest_drop_height;Q
  8. 8. b) select park_id,excitement_rating,highest_drop_height , count(rollercoaster_type) from rollercoaster group by park_id, excitement_rating, highest_drop_height having excitement_rating = 'High' and highest_drop_height > 50;
  9. 9. Question 3 a) Find out the name of rollercoaster_type, excitement_level intensity _level and nausea_level where total_air_time is max and b) Find out the total_air_time of that rows whose excitement_level intensity _level and nausea_level is similar to row where total_air_time is maximum. a) select distinct rollercoaster_type,excitement_rating,intensity_rating , nausea_rating from rollercoaster as r1 where r1.total_air_time in (select max(r2.total_air_time) from rollercoaster as r2);
  10. 10. b) select r1.total_air_time from rollercoaster as r1 inner join (select distinct excitement_rating,intensity_rating , nausea_rating from rollercoaster as rc where rc.total_air_time in (select max(total_air_time) from rollercoaster)) as t on t.excitement_rating = r1.excitement_rating and t.intensity_rating = r1.intensity_rating and t.nausea_rating = r1.nausea_rating;
  11. 11. Question 4 a) Find out the name of rollercoaster_type, excitement_level ,intensity _level and nausea_level where avg_speed is max and b) Compare the max_speed of those rows whose excitement_level intensity _level and nausea_level is similar to row where avg_speed is maximum. a) select rollercoaster_type,excitement_rating,intensity_rating, nausea_rating from rollercoaster r1 where r1.avg_speed in (select max(r2.avg_speed) from rollercoaster r2);
  12. 12. b) select r3.max_speed from rollercoaster as r3 inner join (select excitement_rating,intensity_rating, nausea_rating from rollercoaster r1 where r1.avg_speed in(select max(r2.avg_speed) from rollercoaster r2) ) as t on r3.intensity_rating = t.intensity_rating and r3.excitement_rating = t.excitement_rating and r3.nausea_rating = t.nausea_rating;
  13. 13. Question 5 ● Find out the parkid and rollercoaster type where no of drop is greater than 10 and have same excitement _level. select x.park_id,x.rollercoaster_type from (select park_id,rollercoaster_type ,excitement_rating from rollercoaster where drops>10 group by park_id,rollercoaster_type ,excitement_rating) as x;
  14. 14. Question 6 ● Group rollercoaster_type based on custom_design where excitement level is high. select custom_design ,rollercoaster_type from rollercoaster where excitement_rating = 'High' group by custom_design,rollercoaster_type;
  15. 15. Question 7 ● If ride_length is greater than 2000 and max_speed is greater than 50 so what is the level of excitement and nausea. Select distinct excitement_rating,nausea_rating from rollercoaster where ride_length > 2000 and max_speed >50;
  16. 16. Question 8 ● Park_name(theme) where atleast 2 rides excitement level is high. Select x.theme from (select theme,count(excitement_rating) from rollercoaster where excitement_rating = 'High' group by theme having count(excitement_rating)>=2 ) as x;
  17. 17. Question 9 ● In which roller coaster ride excitement level and avg_speed is highest. Select rollercoaster_type from rollercoaster r1 where excitement_rating = 'Very High' and r1.avg_speed in (select max(r2.avg_speed ) from rollercoaster as r2 );
  18. 18. Question 10 ● Name of Rollercoaster where total_air_time is greater than 5 but still excitement_level is not very high. Select rollercoaster_type from rollercoaster where total_air_time>5 and excitement_rating <> 'Very High';
  19. 19. Question 11 ● If ride_length is greater than 2000 then find out avg_speed and excitement_level , group excitement_level based on avg_speed >10. select excitement_rating,avg_speed from rollercoaster where ride_length >2000 and avg_speed>10 group by excitement_rating,avg_speed;
  20. 20. Question 12 ● When max_pos> 3 and max_neg is >-2 then find out the name of rollercoaster where intensity_level is greater than excitement_level. Select rollercoaster_type from rollercoaster where max_pos_gs>3 and max_neg_gs>-2 and intensity > excitement;
  21. 21. Question 13 ● When max_pos>= 3 and max_neg is >=-2 count the no of rollercoaster grouping based on a) Intensity_level greater than equal or less than excitement_level and b) Find out the same when max_pos>= 4 and max_neg is >=1 condition is not true. a) Select count(distinct(rollercoaster_type)) ,intensity_rating from rollercoaster where max_pos_gs>=3 and max_neg_gs>=-2 and intensity > excitement group by intensity_rating;
  22. 22. b) Select count(distinct(rollercoaster_type)) ,intensity_rating from rollercoaster where max_pos_gs<4 and max_neg_gs<1 and intensity > excitement group by intensity_rating;
  23. 23. Question 14 ● When nausea_level is low that what is the value of excitement_level. select distinct excitement_rating from rollercoaster where nausea_rating = 'Low';
  24. 24. Question 15 ● Group rollercoaster_type based on custom_design where intensity level is very high and ride_length is greater than 2000. Select custom_design, rollercoaster_type from rollercoaster where intensity_rating= 'Very High' and ride_length>2000 group by custom_design , rollercoaster_type ;
  25. 25. THANK YOU

×