UMBC – CSEE Department
Data Science Program, Spring 2022
DATA 603 – Big Data Platforms
Homework #6 – Spark Programming

Questions:

(1) [10 points] Simulate the Aggregate() example in the... slides, with an initial value of (1, 0). What would be the expected results?
    a. List is [1, 2, 3, 4]
    b. Number of partitions is 2
    c. Initial zeroValue is (1, 0)

(2) [20 points] Write a Spark program that reads your browser history file, then displays the top 5 websites you visited in the last week.

(3) [20 points] Implement a Spark program that performs the following:
    a. Reads the posted text file for a book named “Applied Data Science.txt”
    b. Reads the text file into an RDD, and then performs actions and transformations on the RDD
    c. Displays the 5 most used words of length greater than 5 characters in the file (ensure your result is not case sensitive, so the words “Data” and “data” are the same and should be counted as the same word)
    d. The output should be like this:
       The most used words in the Applied Data Science textbook are:
       <<word1>> occurred <<n1>> times
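
The logic behind questions (1) and (3) can be sketched in plain Python, without a Spark cluster. This is only an illustration under stated assumptions, not the course's reference solution: the slide example is elided above, so the `seq_op` (running sum and count) and `comb_op` (pairwise add) used here are assumed, as is the even split of the list into `[1, 2]` and `[3, 4]`. The helper names `simulate_aggregate` and `top_words` are hypothetical. The sketch relies on one documented property of Spark's `aggregate`: the zero value is used as the starting point for every partition and once more when the partition results are merged.

```python
import re
from collections import Counter
from functools import reduce


def simulate_aggregate(partitions, zero_value, seq_op, comb_op):
    """Mimic RDD.aggregate: fold each partition starting from zero_value,
    then merge the per-partition results, again starting from zero_value."""
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    return reduce(comb_op, partials, zero_value)


# Assumed operators (running sum, element count); the slides may use others.
def seq_op(acc, x):
    return (acc[0] + x, acc[1] + 1)


def comb_op(a, b):
    return (a[0] + b[0], a[1] + b[1])


def top_words(lines, n=5, min_len=5):
    """Return the n most frequent words longer than min_len characters,
    ignoring case. Words are runs of ASCII letters (an assumption)."""
    words = (w for line in lines for w in re.findall(r"[a-z]+", line.lower()))
    return Counter(w for w in words if len(w) > min_len).most_common(n)


# Assumed split of [1, 2, 3, 4] into 2 partitions.
partitions = [[1, 2], [3, 4]]
result = simulate_aggregate(partitions, (1, 0), seq_op, comb_op)
print(result)  # (13, 4)
```

Under these assumptions each partition starts at (1, 0), giving (4, 2) and (8, 2), and the final merge starts at (1, 0) again, so the zero value's 1 is added three times: 1 + 4 + 8 = 13, with a count of 4. In an actual Spark program the same word-count steps would typically map onto `flatMap`, `filter`, `map`, and `reduceByKey` on the RDD.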
Last updated: 2 years ago
About the document
Uploaded On
Apr 13, 2023
Number of pages
4