Introduction to MapReduce Data Transformations

2. A Brief History of MapReduce
Confidential and proprietary. Copyright © 2008 Aster Data Systems

5. [Diagram: Servers A, B, C and D connected by a switch, holding the text below]
The quick brown fox jumps over the lazy dog.
To be or not to be: that is the question.
The world only needs five computers.
Hello world.
In-Database MapReduce is the future.
MapReduce is a very powerful programming paradigm.

6. Goal: We Want to Count the Number of Times Each Word Occurs

7. 1st Approach: No MapReduce

8. [Diagram: Servers A, B, C and D connected by a switch, holding the text below]
The quick brown fox jumps over the lazy dog.
To be or not to be: that is the question.
The world only needs five computers.
Hello world.
In-Database MapReduce is the future.
MapReduce is a very powerful concept.
Lowercased, with punctuation removed:
the quick brown fox jumps over the lazy dog
in database mapreduce is the future
the world only needs five computers
hello world
mapreduce is a very powerful concept
to be or not to be that is the question

11. 2nd Approach: No MapReduce, Fully Distributed

12. [Diagram: Servers A, B, C and D connected by a switch, holding the text below]
The quick brown fox jumps over the lazy dog.
To be or not to be: that is the question.
The world only needs five computers.
Hello world.
In-Database MapReduce is the future.
MapReduce is a very powerful concept.
Lowercased, with punctuation removed:
the quick brown fox jumps over the lazy dog
in database mapreduce is the future
the world only needs five computers
hello world
mapreduce is a very powerful concept
to be or not to be that is the question
Regrouped so that identical words sit together:
the the the the the database database future world world powerful lazy brown mapreduce mapreduce be be to jumps computers hello is is is question over a that

13. Server 1 Final Result File: the 5, …
Server 2 Final Result File: world 2, …
Server 3 Final Result File: mapreduce 2, …
Server 4 Final Result File: is 3, …

14. 2nd Approach: No MapReduce, Distributed

15. Does it work? Yes
Is it a pain? Yes!!
Does it take lots of time? Yes!
Would you do it? No!!!

17. Data Redistribution and Grouping
Map()
  Input: any file (e.g. documents)
  Output: stream of <key, value> pairs (e.g. <word, count> pairs)
Reduce()
  Input: all <key, value> pairs with the same key, grouped (e.g. all <word, count> pairs where word = “the”)
  Output: anything (e.g. sum of counts for a specific word)

18. Map() and Redistribution Phase
[Diagram: Map() running on Servers A, B, C and D, connected by a switch]
Input text: The quick brown fox jumps over the lazy dog. In-Database MapReduce is the future.
Map() output for this text: <the, 1> <quick, 1> <brown, 1> <fox, 1> <jumps, 1> <over, 1> <the, 1> <lazy, 1> <dog, 1> <in, 1> <database, 1> <mapreduce, 1> <is, 1> <the, 1> <future, 1>
Pairs after redistribution through the switch (identical keys together):
<world, 1> <world, 1> <powerful, 1> <lazy, 1> <brown, 1> <mapreduce, 1> <mapreduce, 1> <be, 1> <be, 1> <to, 1> <jumps, 1> <computers, 1> <hello, 1> <is, 1> <is, 1> <is, 1> <question, 1> <over, 1> <a, 1> <that, 1>
<the, 1> <the, 1> <the, 1> <the, 1> <the, 1> <database, 1> <database, 1> <future, 1>

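The Map() and Reduce() roles from slide 17, together with the redistribution step from slide 18, can be sketched in plain Python. This is a single-process illustration under stated assumptions, not how Aster Data or any other engine implements it: the function names (`map_words`, `redistribute`, `reduce_counts`), the byte-sum routing rule, and the use of four in-memory "servers" are all hypothetical choices made for the example. The input is the six sentences shown on slide 8.

```python
import re
from collections import defaultdict

def map_words(text):
    """Map(): emit a <word, 1> pair for every word in the input text."""
    # Lowercase and keep only letter runs, so "In-Database" becomes "in", "database".
    for word in re.findall(r"[a-z]+", text.lower()):
        yield (word, 1)

def redistribute(pairs, num_servers):
    """Redistribution: send every pair with the same key to the same server."""
    servers = [defaultdict(list) for _ in range(num_servers)]
    for key, value in pairs:
        # Assumption: a simple deterministic routing rule; the slides do not
        # say how keys are assigned to servers.
        target = sum(key.encode("utf-8")) % num_servers
        servers[target][key].append(value)  # grouping by key happens here
    return servers

def reduce_counts(key, values):
    """Reduce(): collapse all values for one key into a single count."""
    return (key, sum(values))

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "To be or not to be: that is the question.",
    "The world only needs five computers.",
    "Hello world.",
    "In-Database MapReduce is the future.",
    "MapReduce is a very powerful concept.",
]

# Map phase: every document is mapped independently.
pairs = [pair for doc in documents for pair in map_words(doc)]

# Redistribution and grouping phase.
servers = redistribute(pairs, num_servers=4)

# Reduce phase: each server reduces only the keys it received.
counts = {}
for server in servers:
    for key, values in server.items():
        word, total = reduce_counts(key, values)
        counts[word] = total

for word in ("the", "world", "mapreduce", "is"):
    print(word, counts[word])
```

For these four words the sketch prints 5, 2, 2 and 3, the same counts listed in the final result files on slide 13. Because routing depends only on the key, no word is ever counted on two servers, which is what lets each Reduce() call run without coordinating with the others.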
19. Grouping and Reduce() Phase (on Server 1)
Reduce() input: <the, 1> <the, 1> <the, 1> <the, 1> <the, 1> <database, 1> <database, 1> <future, 1>
Server 1 Final Result File:
the       5
database  2
future    1

25. Beyond SQL and MapReduce

32. Example 2: Sessionization
Session Timeout = 60 seconds

INPUT (Clickstream)
timestamp  userid
10:00:00   Shawn1
00:58:24   PrezBush
10:00:24   Shawn1
02:30:33   PrezBush
10:01:23   Shawn1
10:02:40   Shawn1

OUTPUT
timestamp  userid    sessionid
10:00:00   Shawn1    0
10:00:24   Shawn1    0
10:01:23   Shawn1    0
10:02:40   Shawn1    1

timestamp  userid    sessionid
00:58:24   PrezBush  0
02:30:33   PrezBush  1
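The sessionization example can be reproduced with a short Python sketch. It groups clicks by userid, sorts each user's clicks by time, and starts a new session whenever the gap to the previous click exceeds the timeout. The names `to_seconds` and `sessionize` are hypothetical, and two details are assumptions the slide does not state: a gap of exactly 60 seconds is treated as the same session, and all timestamps are taken to fall on the same day.

```python
from collections import defaultdict

SESSION_TIMEOUT = 60  # seconds, as given on the slide

def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Return (timestamp, userid, sessionid) rows, grouped by user."""
    # Grouping step: collect each user's timestamps together.
    by_user = defaultdict(list)
    for timestamp, userid in clicks:
        by_user[userid].append(timestamp)

    rows = []
    for userid, stamps in by_user.items():
        stamps.sort(key=to_seconds)
        session_id = 0
        previous = None
        for stamp in stamps:
            now = to_seconds(stamp)
            # Assumption: only a gap strictly greater than the timeout
            # opens a new session.
            if previous is not None and now - previous > timeout:
                session_id += 1
            rows.append((stamp, userid, session_id))
            previous = now
    return rows

# Clickstream rows exactly as listed in the INPUT table.
clicks = [
    ("10:00:00", "Shawn1"),
    ("00:58:24", "PrezBush"),
    ("10:00:24", "Shawn1"),
    ("02:30:33", "PrezBush"),
    ("10:01:23", "Shawn1"),
    ("10:02:40", "Shawn1"),
]

sessions = sessionize(clicks)
for row in sessions:
    print(*row)
```

For Shawn1 the gaps between clicks are 24, 59 and 77 seconds, so only the last click opens session 1. PrezBush's two clicks are more than an hour apart and fall into sessions 0 and 1. Both results agree with the OUTPUT tables.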