8.1. HDFS

This tab covers HDFS settings. Here you can set properties for the NameNode, Secondary NameNode, and DataNodes, as well as general and advanced properties. Click the name of a group to expand or collapse its display.

 

Table 3.1. HDFS Settings:NameNode

Name Notes
NameNode host The host that has been assigned to run NameNode. This value is prepopulated based on your choices on previous screens.
NameNode directories NameNode directories for HDFS to store the file system image.
NameNode Java heap size Initial and maximum Java heap size for NameNode (Java options -Xms and -Xmx)
NameNode new generation size Default size of the Java new generation for NameNode (Java option -XX:NewSize). Note: Set this value (the namenode_opt_newsize property) to 1/8 of the maximum heap size (-Xmx).
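
The 1/8 guideline for the NameNode new generation size comes down to simple arithmetic. The following is a minimal sketch in Python. The function name and the use of megabytes are assumptions made for illustration and are not part of the product.

```python
def recommended_new_size_mb(max_heap_mb: int) -> int:
    """Return 1/8 of the maximum heap size (-Xmx), in megabytes.

    Hypothetical helper: it only illustrates the ratio described in
    the note and is not something the installer provides.
    """
    if max_heap_mb <= 0:
        raise ValueError("maximum heap size must be positive")
    return max_heap_mb // 8


# Example: a NameNode configured with a 1024 MB maximum heap.
heap_mb = 1024
new_size_mb = recommended_new_size_mb(heap_mb)
print(f"-Xmx{heap_mb}m -XX:NewSize={new_size_mb}m")
```

With a 1024 MB maximum heap, the sketch yields a new generation size of 128 MB.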

 

Table 3.2. HDFS Settings:SNameNode

Name Notes
SNameNode host The host that has been assigned to run the SecondaryNameNode. This value is prepopulated based on your choices on previous screens.
Secondary NameNode Checkpoint Directory Directory on the local filesystem where the SecondaryNameNode should store the temporary images to merge

 

Table 3.3. HDFS Settings:DataNodes

Name Notes
DataNode hosts The hosts that have been assigned to run DataNode
DataNode directories DataNode directories for HDFS to store the data blocks
DataNode maximum Java heap size Maximum Java heap size for DataNode (Java option -Xmx)
DataNode volumes failure toleration The number of volumes that are allowed to fail before a DataNode stops offering services.

 

Table 3.4. HDFS Settings:General

Name Notes
WebHDFS enabled Check to enable WebHDFS
Hadoop maximum Java heap size Maximum Java heap size for daemons such as Balancer (Java option -Xmx) [a] 
Reserved space for HDFS Reserved space in GB per volume for HDFS
HDFS Maximum Checkpoint Delay Maximum delay between two consecutive checkpoints for HDFS
HDFS Maximum Edit Log Size for Checkpointing Maximum size of the edits log file that forces an urgent checkpoint even if the maximum checkpoint delay is not reached

[a] The default value for this property is 1 GB. This value may need to be reduced for a VM-based installation. For significant work using Hive Server, however, 2 GB is a more realistic value.
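
The two checkpoint settings in Table 3.4 work together: a checkpoint is due when either the maximum delay has elapsed or the edits log has grown past the size limit. The following Python sketch illustrates that rule. The function name, the units (seconds and bytes), and the sample values are assumptions for illustration, not defaults taken from this guide.

```python
def checkpoint_due(seconds_since_last: int, edits_log_bytes: int,
                   max_delay_seconds: int, max_edits_bytes: int) -> bool:
    """Return True when a checkpoint should be taken.

    Hypothetical helper: a checkpoint is due if the maximum checkpoint
    delay has elapsed, or if the edits log has reached the size that
    forces an urgent checkpoint before the delay is reached.
    """
    delay_reached = seconds_since_last >= max_delay_seconds
    log_too_large = edits_log_bytes >= max_edits_bytes
    return delay_reached or log_too_large


# Illustrative limits (assumed values): 1 hour and 64 MB.
MAX_DELAY = 3600
MAX_EDITS = 64 * 1024 * 1024

print(checkpoint_due(600, 1024, MAX_DELAY, MAX_EDITS))               # small log, short delay
print(checkpoint_due(600, 80 * 1024 * 1024, MAX_DELAY, MAX_EDITS))   # oversized log
print(checkpoint_due(4000, 1024, MAX_DELAY, MAX_EDITS))              # delay elapsed
```

Only the first call reports that no checkpoint is due, because neither limit has been reached.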


 

Table 3.5. HDFS Settings:Advanced

Name Notes
Hadoop Log Dir Prefix The parent directory for Hadoop log files. The HDFS log directory will be ${hadoop_log_dir_prefix}/${hdfs_user} and the MapReduce log directory will be ${hadoop_log_dir_prefix}/${mapred_user}
Hadoop PID Dir Prefix The parent directory in which the PID files for Hadoop processes will be created. The HDFS PID directory will be ${hadoop_pid_dir_prefix}/${hdfs_user} and the MapReduce PID directory will be ${hadoop_pid_dir_prefix}/${mapred_user}
Exclude hosts Names a file that contains a list of hosts that are not permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory. 
Include hosts Names a file that contains a list of hosts that are permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory.
Block replication Default block replication
dfs.block.local-path-access.user The user who is allowed to perform short-circuit reads
dfs.datanode.socket.write.timeout DFS client write socket timeout
dfs.replication.max Maximal block replication
dfs.heartbeat.interval DataNode heartbeat interval in seconds
dfs.safemode.threshold.pct The percentage of blocks that should satisfy the minimal replication requirement set by dfs.replication.min. Values less than or equal to 0 mean not to start in safe mode. Values greater than 1 make safe mode permanent.
dfs.balance.bandwidthPerSec The maximum amount of bandwidth that each DataNode can utilize for balancing purposes in terms of the number of bytes per second
dfs.block.size Default block size for new files
dfs.datanode.ipc.address The DataNode IPC server address and port. If the port is 0 the server starts on a free port.
dfs.blockreport.initialDelay Delay for first block report in seconds
dfs.datanode.du.pct The percentage of real available space to use when calculating remaining space
dfs.namenode.handler.count The number of server threads for the NameNode
dfs.datanode.max.xcievers PRIVATE CONFIG VARIABLE
dfs.umaskmode The octal umask used in creating files and directories
dfs.web.ugi The user account used by the web interface. Syntax: USERNAME, GROUP1, GROUP2 ....
dfs.permissions If true, enable permissions checking in HDFS. If false, permission checking is turned off, but all other behavior stays unchanged. Switching from one value to the other does not change the mode, owner, or group of files or directories.
dfs.permissions.supergroup The name of the group of superusers
ipc.server.max.response.size  
dfs.block.access.token.enable If true, access tokens are required to access DataNodes. If false, access tokens are not checked.
dfs.secondary.https.port The https port where the SecondaryNameNode binds
dfs.https.port The https port where the NameNode binds
dfs.access.time.precision The access time for an HDFS file is precise up to this value. The default value is 1 hour. A value of 0 disables access times for HDFS.
dfs.cluster.administrators ACL for all who can view the default servlets in HDFS
ipc.server.read.threadpool.size  
io.file.buffer.size The size of the buffer for use in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86). This value determines how much data is buffered during read and write operations.
io.serializations  
io.compression.codec.lzo.class The implementation class for the LZO codec.
fs.trash.interval Number of minutes between trash checkpoints. If zero, the trash feature is disabled.
ipc.client.idlethreshold The threshold number of connections after which connections are inspected for idleness
ipc.client.connection.maxidletime Maximum time after which the client brings down the connection to the server
ipc.client.connect.max.retries Maximum number of retries for IPC connections
webinterface.private.actions If true, the native web interfaces for JT and NN may contain actions, such as kill job, delete file, etc. that should not be exposed to the public. Enable this option if these interfaces are reachable only by appropriately authorized users.
Custom Hadoop Configs Use this text box to enter values for core-site.xml properties not exposed by the UI. Enter in "key=value" format, with a newline as a delimiter between pairs.
Custom HDFS Configs Use this text box to enter values for hdfs-site.xml properties not exposed by the UI. Enter in "key=value" format, with a newline as a delimiter between pairs.
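
The "key=value" format used by the two custom config text boxes can be sketched in Python as follows. The parser and the property names in the sample are hypothetical and exist only to illustrate the expected input. How the UI itself handles blank lines, surrounding whitespace, or values that contain "=" is not stated here, so the choices below are assumptions.

```python
def parse_custom_configs(text: str) -> dict:
    """Parse newline-delimited key=value pairs into a dictionary.

    Assumptions (not confirmed by the guide): blank lines are skipped,
    surrounding whitespace is trimmed, and only the first "=" on a line
    separates the key from the value.
    """
    props = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected key=value, got {raw!r}")
        props[key.strip()] = value.strip()
    return props


# Hypothetical property names, one pair per line.
sample = "example.property.one=value1\nexample.property.two=42\n"
for name, value in parse_custom_configs(sample).items():
    print(f"{name} -> {value}")
```

A line without "=" raises an error in this sketch, which makes a malformed entry easy to spot before it is pasted into the text box.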

