HPC/Big Data Certification Exam 2022 with complete solution

What qualifies as a Big Data Workload?
· Consists of semi-structured, or unstructured data not suitable for relational databases...
· Data volume is considered too large for other solutions (Petabyte scale)
· To process the data in a reasonable timeframe, a massively parallel solution is required

Most common Big Data workloads are?
· Batch processing
· In memory processing
· ML (typically GPU based)

What are the challenges customers run into for Big Data workloads on premises?
· Tracking growth patterns and scaling infrastructure to meet capacity requirements
· Time associated with procuring, deploying and maintaining infrastructure to meet demand
· The cost associated with processing this data using other methods
· The cost associated with Disaster Recovery when dealing with Petabytes of data
· The cost and complexity associated with hardware refresh

Customers running Big Data workloads on OCI can ___?
· Dynamically scale capacity against demand
· Leverage Object Storage as a cost-effective Data Lake and for Disaster Recovery
· Take advantage of the best price/performance in the cloud
· Use OCI's managed service offerings to easily deploy and run common Big Data frameworks

What is Big Data Appliance (BDA)?
· Single tenant, Cloudera based hardware appliance deployed on-prem

What does Big Data Appliance (BDA) include?
· Cloudera Enterprise Data Hub (EDH) v5.12
· Big Data Manager
· Big Data SQL

What is Oracle Big Data Service (BDS)?
· Multitenant, managed Cloudera EDH Hadoop Deployment

What does Oracle Big Data Service (BDS) include?
· Cloudera EDH v5.16.1 or v6.2.0
· Big Data Manager
· Big Data SQL

What is the difference between Big Data Cloud Service (BDCS) and Big Data Service (BDS)?
· BDCS - Gen1, BDS - Gen2
· BDCS - Cloudera EDH v5.16.x, BDS - Cloudera EDH v5.16.1 or v6.2.0
· BDCS - deprecated, BDS - license included in consumption

What is Oracle Data Flow (ODF)?
· Provides serverless framework for running Spark based workloads

Where do customers put data and application code for Oracle Data Flow (ODF) applications?
· Object Storage

Oracle Data Flow (ODF) provides support for what type of applications?
· Java
· Python
· SQL
· Scala

What is Oracle Data Science?
· Platform for data scientists to create projects which run notebook-based modeling on-demand

What services does Oracle Data Science use?
· Compute
· Block Storage

What shapes are available for Oracle Data Science Notebook Sessions?
· VM.Standard.E2.2, VM.Standard.E2.4, VM.Standard.E2.8
· VM.Standard2.1, VM.Standard2.2, VM.Standard2.4, VM.Standard2.6, VM.Standard2.8, VM.Standard2.16, VM.Standard2.24

What is Oracle Streaming Service?
· Kafka compatible producer/consumer service that ingests continuous streams of data

Which Hadoop distributions are supported on OCI?
· Cloudera
· Hortonworks
· MapR

Some self-managed Big Data Products are driven by ___?
· OCI QuickStart program
· Marketplace

When deploying Hadoop on OCI, what should you consider?
· Normalize either OCPU or Memory against OCI shapes used as workers to meet workload requirements
· Use HDFS replication factor 3 when using DenseIO NVMe storage to mitigate hardware failure
· After normalizing OCPU or Memory, use heterogeneous storage on DenseIO workers leveraging Block Storage to augment HDFS capacity - allows you to scale HDFS capacity around workload
· Segregate cluster and storage network traffic on BM hosts when using Block Volumes for HDFS by leveraging both physical VNICs - create a storage network for primary interface and deploy Hadoop on secondary interface
· Use Private IP networks for Hadoop cluster hosts - enable cluster access through edge node(s) or through private connectivity such as VPN or FastConnect

What is Terasort?
· Popular benchmark that measures the amount of time to sort 1TB of randomly distributed data on a given computer system
· Used to measure MapReduce performance of an Apache Hadoop Cluster (all hardware layers - CPU, Memory, Storage, Network I/O)

What are the phases of Terasort?
· TeraGen
· TeraSort
· TeraValidate

What is TeraGen?
· Generate random dataset of specified size

What is TeraSort?
· Map, Shuffle, Reduce the source data into smaller result set

What is TeraValidate?
· Read the result set and validate it

What is TeraGen heavily dependent on?
· Write intensive

What is TeraSort heavily dependent on?
· Read, process, write, and I/O intensive

What is TeraValidate heavily dependent on?
· Read intensive

What is Terasort benchmark?
· Total time to run all 3 Terasort phases

Draw a diagram of what TeraSort looks like
· See the study guide

What is the most important phase of Terasort?
· TeraSort

What are some steps for sizing when considering deployment on OCI?
· Always build in redundancy for DenseIO hosts to mitigate data loss - in case of Hadoop use HDFS replication factor of 3 for local NVMe storage
· For low risk environments, can use a replication factor of 2 for Block Storage - more cost effective
· Block Storage throughput uses the same bandwidth available to each instance VNIC

What are some best practices for Big Data Migration?
· Object Storage
· Data Transfer Appliance
· FastConnect

What is Data Transfer Appliance?
· An option for customers whose governance requirements restrict copying sensitive data over the wire, or who have too much data to move in the time or bandwidth available - Oracle delivers an appliance, the customer loads data onto it, then Oracle uploads the data to Object Storage
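The sizing and migration answers above come down to simple arithmetic, so a rough sketch may help when working through them. The Python below is only an illustration under stated assumptions: the function names, the 0.8 link-efficiency factor, and the example dataset sizes are hypothetical and do not come from the study guide or from any Oracle tool. It shows how the replication factor multiplies the raw HDFS capacity needed, and how to estimate whether a network transfer is feasible before considering Data Transfer Appliance.

```python
def raw_capacity_tb(dataset_tb, replication_factor):
    """Raw HDFS capacity consumed when every block is stored
    `replication_factor` times (3 for local NVMe, 2 for Block Storage
    in low risk environments, per the notes above)."""
    return dataset_tb * replication_factor


def transfer_days(dataset_tb, link_gbps, efficiency=0.8):
    """Rough number of days needed to copy a dataset over a network link.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually usable; it is a placeholder, not a measured value.
    Uses decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s.
    """
    total_bits = dataset_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 86400


if __name__ == "__main__":
    # Hypothetical 100 TB dataset under the two replication factors.
    print(raw_capacity_tb(100, 3))  # raw TB on DenseIO NVMe
    print(raw_capacity_tb(100, 2))  # raw TB on Block Storage

    # Hypothetical 1 PB (1000 TB) migration over a 10 Gbps link.
    print(f"{transfer_days(1000, 10):.1f} days")
```

With these assumed numbers, 100 TB of data needs 300 TB of raw capacity at replication factor 3 and 200 TB at factor 2, and moving 1 PB over a 10 Gbps link takes roughly 11.6 days. If an estimate like this exceeds the migration window, that is the "not enough time/bandwidth" case in which Data Transfer Appliance becomes the better fit.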
About the document
Uploaded on: Aug 31, 2022
Number of pages: 19