8.1. HDFS

This tab covers HDFS settings. Here you can set properties for the NameNode, Secondary NameNode, DataNodes, and some general and advanced properties. Click the name of a group to expand or collapse its display.


Table 3.1. HDFS Settings: NameNode

Name | Notes
NameNode host | This value is prepopulated based on your choices on previous screens.
NameNode directories | Directories in which HDFS stores the NameNode file system image.
NameNode Java heap size | Initial and maximum Java heap size for the NameNode (Java options -Xms and -Xmx).
NameNode new generation size | Default size of the Java new generation for the NameNode (Java option -XX:NewSize).
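The two NameNode heap settings map directly onto JVM flags. The sketch below shows that mapping; the helper name namenode_java_opts and the use of megabytes as the unit are assumptions for illustration, not how Ambari itself assembles the options.

```python
def namenode_java_opts(heap_mb: int, new_gen_mb: int) -> str:
    """Build JVM flags for the NameNode heap settings (hypothetical helper).

    heap_mb is used for both -Xms (initial) and -Xmx (maximum), because the
    NameNode Java heap size setting covers both. Sizes are assumed to be in MB.
    """
    if heap_mb <= 0 or new_gen_mb <= 0:
        raise ValueError("heap sizes must be positive")
    if new_gen_mb >= heap_mb:
        raise ValueError("the new generation must be smaller than the whole heap")
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m -XX:NewSize={new_gen_mb}m"


print(namenode_java_opts(1024, 200))  # -Xms1024m -Xmx1024m -XX:NewSize=200m
```

Setting -Xms equal to -Xmx follows from the "initial and maximum" wording of the setting: a single value controls both.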


Table 3.2. HDFS Settings: SNameNode

Name | Notes
SNameNode host | This value is prepopulated based on your choices on previous screens.
Secondary NameNode Checkpoint Directory | Directory on the local file system where the Secondary NameNode should store the temporary images to merge.


Table 3.3. HDFS Settings: DataNodes

Name | Notes
DataNode hosts | The hostnames of the hosts on which this group's DataNodes run.
DataNode directories | The directories where HDFS should store the data blocks for this group.
DataNode maximum Java heap size | Maximum Java heap size for the DataNode (Java option -Xmx).
DataNode volumes failure toleration | The number of volumes that are allowed to fail before a DataNode stops offering service.
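The volume failure toleration is a simple threshold: a DataNode keeps serving as long as the number of failed volumes does not exceed the configured value. A minimal sketch of that rule follows; the function name is hypothetical, and it ignores any validation Hadoop may perform against the number of configured volumes.

```python
def datanode_keeps_serving(failed_volumes: int, tolerated: int) -> bool:
    """Return True while failed volumes stay within the tolerated count.

    tolerated corresponds to "DataNode volumes failure toleration";
    with tolerated = 0, any single volume failure stops the DataNode.
    """
    if failed_volumes < 0 or tolerated < 0:
        raise ValueError("counts must be non-negative")
    return failed_volumes <= tolerated


for failed in range(4):
    print(failed, datanode_keeps_serving(failed, tolerated=1))
# 0 True
# 1 True
# 2 False
# 3 False
```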


Table 3.4. HDFS Settings: General

Name | Notes
WebHDFS enabled | Check to enable WebHDFS.
Hadoop maximum Java heap size | Maximum Java heap size for daemons such as the Balancer (Java option -Xmx). [a]
Reserved space for HDFS | Space, in GB per volume, reserved for HDFS.
HDFS Maximum Checkpoint Delay | Maximum delay, in seconds, between two consecutive HDFS checkpoints.
HDFS Maximum Edit Log Size for Checkpointing | Maximum size of the edit log file; reaching it forces an urgent checkpoint even if the maximum checkpoint delay has not elapsed.

[a] The default value for this property is 1 GB. This value may need to be reduced for a VM-based installation. On the other hand, for significant work using Hive Server, 2 GB is a more realistic value.
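The two checkpoint settings act as independent triggers: a checkpoint is taken when the maximum delay has elapsed or when the edit log reaches its maximum size, whichever happens first. The following sketch expresses that rule. The function name is hypothetical, the edit log limit is assumed to be given in bytes, and the limits used in the example calls are arbitrary values, not defaults.

```python
MB = 1024 * 1024


def checkpoint_due(seconds_since_last: float, edits_bytes: int,
                   max_delay_s: float, max_edits_bytes: int) -> bool:
    """Return True when either checkpoint trigger has fired (hypothetical helper).

    max_delay_s     -- "HDFS Maximum Checkpoint Delay", in seconds
    max_edits_bytes -- "HDFS Maximum Edit Log Size for Checkpointing" (assumed bytes)
    """
    delay_reached = seconds_since_last >= max_delay_s
    edits_too_large = edits_bytes >= max_edits_bytes
    return delay_reached or edits_too_large


# Edit log over the limit: urgent checkpoint even though only 20 minutes passed.
print(checkpoint_due(1200, 80 * MB, max_delay_s=6 * 3600, max_edits_bytes=64 * MB))  # True
# Neither limit reached yet.
print(checkpoint_due(1200, 1 * MB, max_delay_s=6 * 3600, max_edits_bytes=64 * MB))   # False
```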



Table 3.5. HDFS Settings: Advanced

Name | Notes
Hadoop Log Dir Prefix | The parent directory for Hadoop log files. The HDFS log directory will be ${hadoop_log_dir_prefix}/${hdfs_user} and the MapReduce log directory will be ${hadoop_log_dir_prefix}/${mapred_user}.
Hadoop PID Dir Prefix | The parent directory in which the PID files for Hadoop processes will be created. The HDFS PID directory will be ${hadoop_pid_dir_prefix}/${hdfs_user} and the MapReduce PID directory will be ${hadoop_pid_dir_prefix}/${mapred_user}.
Exclude hosts | Names a file that contains a list of hosts that are not permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory.
Include hosts | Names a file that contains a list of hosts that are permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory.
Block replication | Default block replication.
dfs.block.local-path-access.user | The user who is allowed to perform short-circuit reads.
dfs.datanode.socket.write.timeout | DFS client write socket timeout.
dfs.replication.max | Maximum block replication.
dfs.heartbeat.interval | DataNode heartbeat interval, in seconds.
dfs.safemode.threshold.pct | The proportion of blocks, expressed as a fraction between 0 and 1, that must satisfy the minimal replication requirement set by dfs.replication.min. Values less than or equal to 0 mean the NameNode does not start in safe mode. Values greater than 1 make safe mode permanent.
dfs.balance.bandwidthPerSec | The maximum bandwidth, in bytes per second, that each DataNode can use for balancing.
dfs.block.size | Default block size for new files.
dfs.datanode.ipc.address | The DataNode IPC server address and port. If the port is 0, the server starts on a free port.
dfs.blockreport.initialDelay | Delay, in seconds, before the first block report.
dfs.datanode.du.pct | The percentage of real available space to use when calculating remaining space.
dfs.namenode.handler.count | The number of server threads for the NameNode.
dfs.datanode.max.xcievers | PRIVATE CONFIG VARIABLE
dfs.umaskmode | The octal umask used when creating files and directories.
dfs.web.ugi | The user account used by the web interface. Syntax: USERNAME,GROUP1,GROUP2,...
dfs.permissions | If true, enable permission checking in HDFS. If false, permission checking is turned off, but all other behavior stays unchanged. Switching from one value to the other does not change the mode, owner, or group of files or directories.
dfs.permissions.supergroup | The name of the group of superusers.
ipc.server.max.response.size |
dfs.block.access.token.enable | If true, access tokens are required to access DataNodes. If false, access tokens are not checked.
dfs.secondary.https.port | The HTTPS port to which the Secondary NameNode binds.
dfs.https.port | The HTTPS port to which the NameNode binds.
dfs.access.time.precision | The precision, in milliseconds, of HDFS file access times. The default value is 1 hour. A value of 0 disables access times for HDFS.
dfs.cluster.administrators | ACL for all who can view the default servlets in HDFS.
ipc.server.read.threadpool.size |
io.file.buffer.size | The size of the buffer for use in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86). This value determines how much data is buffered during read and write operations.
io.serializations |
io.compression.codec.lzo.class | The implementation class for the LZO codec.
fs.trash.interval | Number of minutes between trash checkpoints. If zero, the trash feature is disabled.
ipc.client.idlethreshold | The threshold number of connections after which connections are inspected for idleness.
ipc.client.connection.maxidletime | Maximum idle time after which the client brings down the connection to the server.
ipc.client.connect.max.retries | Maximum number of retries for IPC connections.
webinterface.private.actions | If true, the native web interfaces for the JobTracker and NameNode may contain actions, such as kill job or delete file, that should not be exposed to the public. Enable this option only if these interfaces are reachable solely by appropriately authorized users.
Custom Hadoop Configs | Use this text box to enter values for core-site.xml properties not exposed by the UI. Enter them in "key=value" format, one pair per line.
Custom HDFS Configs | Use this text box to enter values for hdfs-site.xml properties not exposed by the UI. Enter them in "key=value" format, one pair per line.
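dfs.umaskmode is an octal mask whose bits are cleared from the mode of each new file or directory. The sketch below illustrates the arithmetic under a stated assumption: POSIX-style base modes of 0777 for directories and 0666 for files. The helper name apply_umask is hypothetical.

```python
def apply_umask(umask_octal: str, is_directory: bool) -> str:
    """Return the octal mode a new file or directory would receive.

    Assumes base modes of 0777 (directories) and 0666 (files); the umask
    clears the corresponding permission bits.
    """
    umask = int(umask_octal, 8)  # "022" -> 0o022
    base = 0o777 if is_directory else 0o666
    return format(base & ~umask, "03o")


print(apply_umask("022", is_directory=True))   # 755
print(apply_umask("022", is_directory=False))  # 644
print(apply_umask("077", is_directory=False))  # 600
```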

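The dfs.safemode.threshold.pct rules can be summarized as a single decision function. This is a simplified sketch: the function name is hypothetical, and it ignores other conditions that can keep the NameNode in safe mode, such as the safe mode extension period that follows reaching the threshold.

```python
def leaves_safe_mode(total_blocks: int, safe_blocks: int, threshold: float) -> bool:
    """Decide whether the block threshold allows leaving safe mode (simplified).

    safe_blocks -- blocks that already satisfy dfs.replication.min
    threshold   -- dfs.safemode.threshold.pct, expressed as a fraction
    """
    if threshold <= 0:
        return True   # <= 0: the NameNode does not start in safe mode
    if threshold > 1:
        return False  # > 1: safe mode is permanent
    if total_blocks == 0:
        return True   # assumption: an empty namespace has no blocks to wait for
    return safe_blocks / total_blocks >= threshold


print(leaves_safe_mode(1000, 999, 0.999))  # True
print(leaves_safe_mode(1000, 998, 0.999))  # False
```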

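The Custom Hadoop Configs and Custom HDFS Configs boxes take one key=value pair per line, and each pair ends up as a property in core-site.xml or hdfs-site.xml. The sketch below parses that text into Hadoop's property XML layout to show the correspondence. The function name custom_configs_to_xml is hypothetical, and this is not Ambari's own conversion code.

```python
from xml.sax.saxutils import escape


def custom_configs_to_xml(text: str) -> str:
    """Turn "key=value" lines into Hadoop <property> elements.

    Blank lines are skipped; only the first "=" separates key from value,
    so values may themselves contain "=". XML special characters are escaped.
    """
    properties = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected key=value, got {raw!r}")
        properties.append(
            "  <property>\n"
            f"    <name>{escape(key.strip())}</name>\n"
            f"    <value>{escape(value.strip())}</value>\n"
            "  </property>"
        )
    return "<configuration>\n" + "\n".join(properties) + "\n</configuration>"


print(custom_configs_to_xml("dfs.heartbeat.interval=3\n"))
```

Splitting on the first "=" only is a deliberate choice so that values such as JVM options or URLs containing "=" survive intact.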