1 WordCount for Named Entities

In this part, you will compute the word frequency for named entities in a large file. You are free to use any NLP library that works with Spark and Scala or PySpark. A good choice is this one: https://github.com/JohnSnowLabs/spark-nlp-workshop

The steps of the assignment are as follows:

1. Find a large text file from Project Gutenberg (https://www.gutenberg.org) and upload it to your Databricks cluster.
2. Write code for a MapReduce program in Scala/PySpark that reads in the file and then extracts only the named entities. A good resource for this is the Spark NLP library from John Snow Labs: https://nlp.johnsnowlabs.com You are free to use any other library as well.
3. The output from the map task should be in the form (key, value), where the key is the named entity and the value is its count (i.e. 1 each time it occurs).
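The map and reduce steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the assignment's reference solution: extract_entities is a hypothetical stand-in that treats runs of capitalized words as entities (a real submission would use an NER model such as one from Spark NLP), and the function names and the app name are made up for this example. The pure-Python parts run without Spark; count_entities_spark needs PySpark and a path to the uploaded file.

```python
import re
from collections import Counter
from operator import add

# ASSUMPTION: naive stand-in for a real NER model. It matches runs of
# capitalized words, so it will also pick up ordinary sentence-initial words.
_ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(line):
    """Return the candidate named entities found in one line of text."""
    return _ENTITY_RE.findall(line)


def map_entities(line):
    """Map step: emit (entity, 1) once for every occurrence in the line."""
    return [(entity, 1) for entity in extract_entities(line)]


def reduce_counts(pairs):
    """Reduce step (local version): sum the counts for each entity."""
    totals = Counter()
    for entity, count in pairs:
        totals[entity] += count
    return dict(totals)


def count_entities_spark(path):
    """Same pipeline on Spark; PySpark is imported lazily so the rest runs without it."""
    from pyspark.sql import SparkSession

    # Hypothetical app name; on Databricks an existing session is reused.
    spark = SparkSession.builder.appName("NamedEntityWordCount").getOrCreate()
    lines = spark.sparkContext.textFile(path)
    # flatMap applies the map step per line; reduceByKey sums the 1s per entity.
    return lines.flatMap(map_entities).reduceByKey(add).collect()


if __name__ == "__main__":
    sample = ["Sherlock Holmes met Watson in London.", "Holmes smiled at Watson."]
    pairs = [pair for line in sample for pair in map_entities(line)]
    print(reduce_counts(pairs))
```

Keeping the map function separate from the Spark driver code means the entity extractor can be swapped for a proper NER pipeline without changing the flatMap/reduceByKey structure.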
About the document
Uploaded On
Aug 11, 2022