Study Flashcards

When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?


A) When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value, and when the reduce operation is both commutative and associative.
B) When the signature of the reduce method matches the signature of the combine method.
C) Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D) Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E) Never. Combiners and reducers must be implemented separately because they serve different purposes.

Correct Answer
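The condition in this question can be checked mechanically: a combiner pre-aggregates each map task's output locally, so it must not change the final result, which holds for an operation like summation precisely because summation is commutative and associative. A minimal plain-Python simulation (not Hadoop code; the sample pairs are invented for illustration):

```python
from collections import defaultdict

# Map output from two map tasks (word-count style: (word, 1) pairs).
map_task_outputs = [
    [("the", 1), ("fox", 1), ("the", 1)],
    [("the", 1), ("dog", 1)],
]

def reduce_sum(pairs):
    # Group values by key, then sum each group (commutative + associative).
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: sum(vs) for k, vs in groups.items()}

# Without a combiner: shuffle every pair straight to the reducer.
no_combiner = reduce_sum([p for out in map_task_outputs for p in out])

# With a combiner: pre-aggregate each map task's output locally first.
combined = [pair for out in map_task_outputs for pair in reduce_sum(out).items()]
with_combiner = reduce_sum(combined)

assert no_combiner == with_combiner  # {'the': 3, 'fox': 1, 'dog': 1}
```

For a non-commutative or non-associative operation (say, computing a median), the pre-aggregated path would generally produce a different result, which is why reuse as a combiner is conditional.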

You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?


A) Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
B) Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
C) When submitting the job on the command line, specify the -libjars option followed by the JAR file path.
D) Package your code and the Apache Commons Math library into a zip file named JobJar.zip.

Correct Answer
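For context, the -libjars option is parsed by Hadoop's GenericOptionsParser, so it only takes effect when the job driver runs through ToolRunner. A hedged command-line sketch, with placeholder jar names and paths:

```shell
# Ship a third-party JAR with the job via -libjars.
# myjob.jar, com.example.MyDriver, and the paths are illustrative placeholders;
# the driver class must use ToolRunner/GenericOptionsParser.
hadoop jar myjob.jar com.example.MyDriver \
    -libjars /path/to/commons-math.jar \
    /input /output
```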

You have the following key-value pairs as output from your Map task: (the, 1) (fox, 1) (faster, 1) (than, 1) (dog, 1) How many keys will be passed to the Reducer's reduce method?


A) Six
B) Five
C) Four
D) Two
E) One
F) Three

Correct Answer
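The mechanics behind this question can be sketched in plain Python (a simulation of the shuffle's grouping step, not Hadoop code): the framework groups intermediate pairs by key and invokes reduce() once per distinct key.

```python
from collections import defaultdict

# The five (key, 1) pairs emitted by the map task in the question.
map_output = [("the", 1), ("fox", 1), ("faster", 1), ("than", 1), ("dog", 1)]

# The shuffle/sort phase groups pairs by key; the framework then calls
# reduce() once per distinct key with that key's list of values.
groups = defaultdict(list)
for key, value in map_output:
    groups[key].append(value)

assert len(groups) == 5  # five distinct keys -> five reduce() calls
```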

MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.


A) Health status checks (heartbeats)
B) Resource management
C) Job scheduling/monitoring
D) Job coordination between the ResourceManager and NodeManager
E) Launching tasks
F) Managing file system metadata
G) MapReduce metric reporting
H) Managing tasks

Correct Answer

Which best describes when the reduce method is first called in a MapReduce job?


A) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.
B) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.
C) Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.
D) Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.

Correct Answer

You need to move a file titled "weblogs" into HDFS. When you try to copy the file, the copy fails, yet you know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?


A) Increase the block size on all current files in HDFS.
B) Increase the block size on your remaining files.
C) Decrease the block size on your remaining files.
D) Increase the amount of memory for the NameNode.
E) Increase the number of disks (or size) for the NameNode.
F) Decrease the block size on all current files in HDFS.

Correct Answer
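Background for this question: the NameNode keeps every file and block object in heap memory, so metadata capacity, not DataNode disk, is what limits the number of files HDFS can hold. A back-of-envelope sketch in plain Python, assuming the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (an approximation, not an exact figure):

```python
# Rough NameNode heap cost of many small files vs. fewer large ones.
# BYTES_PER_OBJECT is a rule-of-thumb assumption, not an exact Hadoop constant.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# A million small one-block files vs. the same data in 1,000 large files
# of ~8 blocks each: three orders of magnitude more heap for the small files.
many_small = namenode_heap_bytes(1_000_000)
few_large = namenode_heap_bytes(1_000, blocks_per_file=8)

assert many_small > few_large  # metadata, not disk space, is the bottleneck
```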

You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. What is the difference between setting the number of reducers to one and setting the number of reducers to zero?


A) There is no difference in output between the two settings.
B) With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
C) With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
D) With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.

Correct Answer
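The output-layout difference can be simulated in plain Python (illustrative pairs, not Hadoop code). With zero reducers, each map task writes its own unsorted part file directly; with one reducer, all pairs are shuffled to a single reducer, sorted by key, and written to one part file:

```python
# Intermediate output from two hypothetical map tasks: (match, "file:offset").
map_task_outputs = [
    [("fox", "file1:0"), ("dog", "file1:10")],
    [("fox", "file2:4")],
]

# Zero reducers: no shuffle; one unsorted output file per map task.
zero_reducer_files = [list(out) for out in map_task_outputs]
assert len(zero_reducer_files) == 2  # multiple part files on HDFS

# One reducer: all pairs shuffled to a single reducer, sorted by key,
# written as a single part file.
one_reducer_file = sorted(
    (pair for out in map_task_outputs for pair in out), key=lambda kv: kv[0]
)
assert [k for k, _ in one_reducer_file] == ["dog", "fox", "fox"]
```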

Which best defines a SequenceFile?


A) A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
B) A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
C) A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D) A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

Correct Answer

In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?


A) The values are in sorted order.
B) The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
C) The values are arbitrarily ordered, but multiple runs of the same MapReduce job will always have the same ordering.
D) Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.

Correct Answer

In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?


A) Increase the parameter that controls minimum split size in the job configuration.
B) Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C) Set the number of mappers equal to the number of input files you want to process.
D) Write a custom FileInputFormat and override the method isSplitable to always return false.

Correct Answer

A client application creates an HDFS file named foo.txt with a replication factor of 3. Which best describes the file access rules in HDFS if the file has a single block stored on data nodes A, B, and C?


A) The file will be marked as corrupted if data node B fails during the creation of the file.
B) Each data node locks the local file to prohibit concurrent readers and writers of the file.
C) Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
D) The file can be accessed if at least one of the data nodes storing the file is available.

Correct Answer

All keys used for intermediate output from mappers must:


A) Implement a splittable compression algorithm.
B) Be a subclass of FileInputFormat.
C) Implement WritableComparable.
D) Override isSplitable.
E) Implement a comparator for speedy sorting.

Correct Answer

For each intermediate key, each reducer task can emit:


A) As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B) As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
C) As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
D) One final key-value pair per value associated with the key; no restrictions on the type.
E) One final key-value pair per key; no restrictions on the type.

Correct Answer

A combiner reduces:


A) The number of values across different keys in the iterator supplied to a single reduce method call.
B) The amount of intermediate data that must be transferred between the mapper and reducer.
C) The number of input files a mapper must process.
D) The number of output files a reducer must produce.

Correct Answer

You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?


A) HDFS command
B) Pig LOAD command
C) Sqoop import
D) Hive LOAD DATA command
E) Ingest with Flume agents
F) Ingest with Hadoop Streaming

Correct Answer

On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker and alerts it that it has an open map task slot. What determines how the JobTracker assigns each map task to a TaskTracker?


A) The amount of RAM installed on the TaskTracker node.
B) The amount of free disk space on the TaskTracker node.
C) The number and speed of CPU cores on the TaskTracker node.
D) The average system load on the TaskTracker node over the past fifteen (15) minutes.
E) The location of the InputSplit to be processed in relation to the location of the node.

Correct Answer
