Study Flashcards

When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?


A) When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value, and when the reduce operation is both commutative and associative.
B) When the signature of the reduce method matches the signature of the combine method.
C) Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D) Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E) Never. Combiners and reducers must be implemented separately because they serve different purposes.

Correct Answer
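The condition in this question can be checked mechanically: a combiner pre-aggregates each map task's output locally, so it must not change the final result, which holds for an operation like summation precisely because summation is commutative and associative. A minimal plain-Python simulation (not Hadoop code; the sample pairs are invented for illustration):

```python
from collections import defaultdict

# Map output from two map tasks (word-count style: (word, 1) pairs).
map_task_outputs = [
    [("the", 1), ("fox", 1), ("the", 1)],
    [("the", 1), ("dog", 1)],
]

def reduce_sum(pairs):
    # Group values by key, then sum each group (commutative + associative).
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: sum(vs) for k, vs in groups.items()}

# Without a combiner: shuffle every pair straight to the reducer.
no_combiner = reduce_sum([p for out in map_task_outputs for p in out])

# With a combiner: pre-aggregate each map task's output locally first.
combined = [pair for out in map_task_outputs for pair in reduce_sum(out).items()]
with_combiner = reduce_sum(combined)

assert no_combiner == with_combiner  # {'the': 3, 'fox': 1, 'dog': 1}
```

For a non-commutative or non-associative operation (say, computing a median), the pre-aggregated path would generally produce a different result, which is why reuse as a combiner is conditional.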

You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?


A) Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
B) Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
C) When submitting the job on the command line, specify the -libjars option followed by the JAR file path.
D) Package your code and the Apache Commons Math library into a zip file named JobJar.zip.

Correct Answer
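For context, the -libjars option is parsed by Hadoop's GenericOptionsParser, so it only takes effect when the job driver runs through ToolRunner. A hedged command-line sketch, with placeholder jar names and paths:

```shell
# Ship a third-party JAR with the job via -libjars.
# myjob.jar, com.example.MyDriver, and the paths are illustrative placeholders;
# the driver class must use ToolRunner/GenericOptionsParser.
hadoop jar myjob.jar com.example.MyDriver \
    -libjars /path/to/commons-math.jar \
    /input /output
```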

You have the following key-value pairs as output from your Map task: (the, 1) (fox, 1) (faster, 1) (than, 1) (dog, 1) How many keys will be passed to the Reducer's reduce method?


A) Six
B) Five
C) Four
D) Two
E) One
F) Three

Correct Answer
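The mechanics behind this question can be sketched in plain Python (a simulation of the shuffle's grouping step, not Hadoop code): the framework groups intermediate pairs by key and invokes reduce() once per distinct key.

```python
from collections import defaultdict

# The five (key, 1) pairs emitted by the map task in the question.
map_output = [("the", 1), ("fox", 1), ("faster", 1), ("than", 1), ("dog", 1)]

# The shuffle/sort phase groups pairs by key; the framework then calls
# reduce() once per distinct key with that key's list of values.
groups = defaultdict(list)
for key, value in map_output:
    groups[key].append(value)

assert len(groups) == 5  # five distinct keys -> five reduce() calls
```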

MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.


A) Health status checks (heartbeats)
B) Resource management
C) Job scheduling/monitoring
D) Job coordination between the ResourceManager and NodeManager
E) Launching tasks
F) Managing file system metadata
G) MapReduce metric reporting
H) Managing tasks

Correct Answer

Which best describes when the reduce method is first called in a MapReduce job?


A) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.
B) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.
C) Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.
D) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.

Correct Answer

You need to move a file titled "weblogs" into HDFS. When you try to copy the file, the copy fails, yet you know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?


A) Increase the block size on all current files in HDFS.
B) Increase the block size on your remaining files.
C) Decrease the block size on your remaining files.
D) Increase the amount of memory for the NameNode.
E) Increase the number of disks (or size) for the NameNode.
F) Decrease the block size on all current files in HDFS.

Correct Answer
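Background for this question: the NameNode keeps every file and block object in heap memory, so metadata capacity, not DataNode disk, is what limits the number of files HDFS can hold. A back-of-envelope sketch in plain Python, assuming the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (an approximation, not an exact figure):

```python
# Rough NameNode heap cost of many small files vs. fewer large ones.
# BYTES_PER_OBJECT is a rule-of-thumb assumption, not an exact Hadoop constant.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# A million small one-block files vs. the same data in 1,000 large files
# of ~8 blocks each: three orders of magnitude more heap for the small files.
many_small = namenode_heap_bytes(1_000_000)
few_large = namenode_heap_bytes(1_000, blocks_per_file=8)

assert many_small > few_large  # metadata, not disk space, is the bottleneck
```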

You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. What is the difference between setting the number of reducers to one and setting the number of reducers to zero?


A) There is no difference in output between the two settings.
B) With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
C) With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
D) With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.

Correct Answer
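The output-layout difference can be simulated in plain Python (illustrative pairs, not Hadoop code). With zero reducers, each map task writes its own unsorted part file directly; with one reducer, all pairs are shuffled to a single reducer, sorted by key, and written to one part file:

```python
# Intermediate output from two hypothetical map tasks: (match, "file:offset").
map_task_outputs = [
    [("fox", "file1:0"), ("dog", "file1:10")],
    [("fox", "file2:4")],
]

# Zero reducers: no shuffle; one unsorted output file per map task.
zero_reducer_files = [list(out) for out in map_task_outputs]
assert len(zero_reducer_files) == 2  # multiple part files on HDFS

# One reducer: all pairs shuffled to a single reducer, sorted by key,
# written as a single part file.
one_reducer_file = sorted(
    (pair for out in map_task_outputs for pair in out), key=lambda kv: kv[0]
)
assert [k for k, _ in one_reducer_file] == ["dog", "fox", "fox"]
```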

Which best defines a SequenceFile?


A) A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
B) A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
C) A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D) A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

Correct Answer

In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?


A) The values are in sorted order.
B) The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
C) The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.
D) Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.

Correct Answer

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?


A) Increase the parameter that controls minimum split size in the job configuration.
B) Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C) Set the number of mappers equal to the number of input files you want to process.
D) Write a custom FileInputFormat and override the method isSplitable to always return false.

Correct Answer

A client application creates an HDFS file named foo.txt with a replication factor of 3. Which best describes the file access rules in HDFS if the file has a single block stored on data nodes A, B, and C?


A) The file will be marked as corrupted if data node B fails during the creation of the file.
B) Each data node locks the local file to prohibit concurrent readers and writers of the file.
C) Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
D) The file can be accessed if at least one of the data nodes storing the file is available.

Correct Answer

All keys used for intermediate output from mappers must:


A) Implement a splittable compression algorithm.
B) Be a subclass of FileInputFormat.
C) Implement WritableComparable.
D) Override isSplitable.
E) Implement a comparator for speedy sorting.

Correct Answer

For each intermediate key, each reducer task can emit:


A) As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B) As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
C) As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
D) One final key-value pair per value associated with the key; no restrictions on the type.
E) One final key-value pair per key; no restrictions on the type.

Correct Answer

A combiner reduces:


A) The number of values across different keys in the iterator supplied to a single reduce method call.
B) The amount of intermediate data that must be transferred between the mapper and reducer.
C) The number of input files a mapper must process.
D) The number of output files a reducer must produce.

Correct Answer

You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?


A) HDFS command
B) Pig LOAD command
C) Sqoop import
D) Hive LOAD DATA command
E) Ingest with Flume agents
F) Ingest with Hadoop Streaming

Correct Answer

On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker and alerts it that it has an open map task slot. What determines how the JobTracker assigns each map task to a TaskTracker?


A) The amount of RAM installed on the TaskTracker node.
B) The amount of free disk space on the TaskTracker node.
C) The number and speed of CPU cores on the TaskTracker node.
D) The average system load on the TaskTracker node over the past fifteen (15) minutes.
E) The location of the InputSplit to be processed in relation to the location of the node.

Correct Answer
