Revature Week 4 Review Questions

What is Hive? - ✔✔ Hive is a tool that allows SQL-like querying of big data. It was originally built as a way to run MapReduce jobs by writing SQL, but has since changed (we are still running Hive on MapReduce, though).

Where is the default location of Hive's data in HDFS? - ✔✔ By default, all database and table data files are stored under /user/hive/warehouse in HDFS, as set by hive.metastore.warehouse.dir. ($HIVE_HOME is the local Hive installation directory, not the HDFS data location.)

What is an External table? - ✔✔ Data kept outside of Hive's warehouse that we query using Hive. Hive tracks only the metadata, so dropping the table leaves the data files in place.

What is a Managed table? - ✔✔ Data kept inside Hive's internal data warehouse. Hive controls the data, which gives safety and efficiency; dropping the table deletes the data as well.

What is a Hive partition? - ✔✔ A Hive partition is the subset of a table's rows that share one value of a partition column, split off into its own smaller dataset.

Provide an example of a good column or set of columns to partition on? - ✔✔ Time. We can select an appropriate resolution to get reasonably sized partitions, it is easy to add new data, and many queries filter on a time range.

What's the benefit of partitioning? - ✔✔ Queries that filter on the partition columns can skip the partitions that do not match, which can improve performance.

What does a partitioned table look like in HDFS? - ✔✔ There will be one directory per partition inside the table's directory in HDFS.

What is a Hive bucket? - ✔✔ Bucketing is another tool to subset our data. It splits the data into a fixed number of roughly equal subsets by hashing a column, so that each subset is ideally reflective of the whole dataset.

What does it mean to have data skew and why does this matter when bucketing? - ✔✔ Data skew is when our subsets have a non-uniform distribution. For example, if we bucket a table based on continent and end up with one subset containing only people from North America, it would be skewed. This matters because bucketing is meant to produce evenly sized, representative subsets, and skewed buckets lose that benefit.

What does a bucketed table look like in HDFS? - ✔✔ It looks similar to partitioning, except that instead of multiple directories we get a separate file for each bucket.

What is the Hive metastore? - ✔✔ The metastore contains all the metadata for managed and external tables. This includes columns, table names, database names, etc.

What is beeline? - ✔✔ Beeline is a JDBC (Java Database Connectivity) client that can be used from the command line to interact with HiveServer2 and run SQL-like queries.

How do you create a table? - ✔✔ CREATE TABLE student (first_name STRING, last_name STRING, age INT, state STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");

How do you load data into a table? - ✔✔ LOAD DATA LOCAL INPATH '/home/username/datafile' INTO TABLE Note: the data may or may not be local; the LOCAL keyword is dropped when the file is already in HDFS.

How do you query data in a table? - ✔✔ SELECT
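
The partitioning, bucketing, and skew answers above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the hash function is a simple stand-in rather than Hive's actual bucketing hash, the bucket count, the sample rows, and the helper names (toy_hash, bucket_for, data_file) are hypothetical, and the file naming only approximates what Hive typically writes.

```python
from collections import Counter

WAREHOUSE = "/user/hive/warehouse"  # default warehouse directory from the answer above
NUM_BUCKETS = 4                     # hypothetical bucket count, chosen for illustration


def toy_hash(value: str) -> int:
    """Deterministic 32-bit string hash; a stand-in, NOT Hive's real hash function."""
    h = 0
    for ch in value:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_for(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign a column value to a bucket: non-negative hash modulo the bucket count."""
    return (toy_hash(value) & 0x7FFFFFFF) % num_buckets


def data_file(table: str, part_col: str, part_val: str, bucket: int) -> str:
    """One directory per partition value, one file per bucket inside that directory."""
    return f"{WAREHOUSE}/{table}/{part_col}={part_val}/{bucket:06d}_0"


# Hypothetical rows (last_name, state): partition on state, bucket on last_name.
rows = [("Ng", "CA"), ("Diaz", "TX"), ("Shah", "CA")]
paths = [data_file("student", "state", state, bucket_for(last)) for last, state in rows]
for path in paths:
    print(path)

# Skew: bucketing on a low-cardinality, unevenly distributed column.
# Every "North America" row hashes to the same bucket, so that bucket dominates.
continents = ["North America"] * 8 + ["Europe", "Asia"]
skew = Counter(bucket_for(c) for c in continents)
print(dict(skew))
```

Rows with the same state land in the same partition directory, and rows with the same bucketing value always land in the same bucket file, which is why a column with few distinct or unevenly distributed values produces lopsided buckets.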

Last updated: 3 years ago
