Revature Week 4 Review Questions

What is Hive? - ✔✔Hive is a tool that allows SQL-like querying of big data. It was originally built as a way to run MapReduce jobs by writing SQL. It has since changed and can also run on other execution engines such as Tez and Spark (we're still using Hive on MapReduce, though).

Where is the default location of Hive's data in HDFS? - ✔✔By default, all database and table data files are stored under /user/hive/warehouse in HDFS (the value of hive.metastore.warehouse.dir). Note that $HIVE_HOME is the local Hive installation directory, not the data location.

What is an External table? - ✔✔Data kept outside of Hive's warehouse that we query using Hive. Hive manages only the table's metadata, so dropping the table leaves the data files in place.

What is a Managed table? - ✔✔Data kept inside Hive's internal data warehouse. This gives safety and efficiency on the data since Hive controls it, but it also means that dropping the table deletes the data.

What is a Hive partition? - ✔✔A Hive partition is a subset of a table's rows that share the same value for one or more partition columns. Each partition is split off and stored as its own smaller dataset.

Provide an example of a good column or set of columns to partition on? - ✔✔Time. We can select an appropriate resolution to get reasonably sized partitions, it is easy to add new data, and many queries filter on time.

What's the benefit of partitioning? - ✔✔Queries that filter on the partition columns can skip the partitions that do not match, which can lead to increased performance.

What does a partitioned table look like in HDFS? - ✔✔There is one subdirectory per partition inside the table's directory in HDFS, named <column>=<value>.

What is a Hive bucket? - ✔✔Bucketing is another tool for subsetting our data. It splits the rows into a fixed number of buckets by hashing a chosen column. With a well-chosen column the buckets are roughly equal in size and each one is reflective of the whole dataset.

What does it mean to have data skew and why does this matter when bucketing? - ✔✔Data skew is when our subsets have a non-uniform distribution. For example, if we bucket a table on continent and end up with one bucket containing only people from North America, that bucket is skewed: it is not reflective of the whole dataset and may be much larger than the others. This matters because bucketing is only useful when the buckets are of similar size and composition.

What does a bucketed table look like in HDFS? - ✔✔It looks similar to partitioning, except that instead of one directory per partition we get one file per bucket inside the table (or partition) directory.

What is the Hive metastore? - ✔✔The metastore contains the metadata for managed and external tables, not the table data itself. This includes columns, table names, database names, etc.

What is beeline? - ✔✔Beeline is a JDBC (Java Database Connectivity) client that can be used from the command line to interact with HiveServer2 and run SQL-like queries.

How do you create a table? - ✔✔CREATE TABLE student (first_name STRING, last_name STRING, age INT, state STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");

How do you load data into a table? - ✔✔LOAD DATA LOCAL INPATH '/home/username/datafile' INTO TABLE <tablename>; Note: the data may or may not be local. LOCAL reads the file from the local filesystem; omit LOCAL when the file is already in HDFS.

How do you query data in a table? - ✔✔SELECT ...
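
The data-skew answer above can be illustrated with a small Python sketch. It is only a simulation under stated assumptions: the hash in `bucket_for` is a simple stand-in and not Hive's actual hash function, the helper names `bucket_for` and `bucket_sizes` are hypothetical, and the sample rows are made up. It compares bucketing on a low-cardinality column (continent) with bucketing on a column of unique values.

```python
from collections import Counter


def bucket_for(value: str, num_buckets: int) -> int:
    """Map a value to a bucket index with a simple deterministic hash.

    Assumption: this polynomial hash is a stand-in for illustration only;
    it does not reproduce the hash function Hive uses internally.
    """
    h = 0
    for ch in value:
        h = (31 * h + ord(ch)) & 0x7FFFFFFF  # keep the hash non-negative
    return h % num_buckets


def bucket_sizes(values, num_buckets):
    """Return the number of rows that land in each bucket, by bucket index."""
    counts = Counter(bucket_for(v, num_buckets) for v in values)
    return [counts.get(b, 0) for b in range(num_buckets)]


# Hypothetical rows: a low-cardinality column dominated by one value...
continents = ["North America"] * 90 + ["Europe"] * 6 + ["Asia"] * 4
# ...versus a high-cardinality column where every value is unique.
user_ids = [f"user{i}" for i in range(100)]

skewed = bucket_sizes(continents, 4)
even = bucket_sizes(user_ids, 4)

print("rows per bucket when bucketing on continent:", skewed)
print("rows per bucket when bucketing on user id:  ", even)
```

Because every "North America" row hashes to the same bucket, that bucket holds at least 90 of the 100 rows no matter which hash is used, while the unique ids spread across all four buckets. This is why a column with many evenly distributed values is usually the better bucketing choice.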
About the document
Uploaded On
Oct 30, 2022
Number of pages
7