This tab covers HDFS settings. Here you can set properties for the NameNode, Secondary NameNode, and DataNodes, as well as some general and advanced properties. Click the name of a group to expand or collapse its display.
Table 3.1. HDFS Settings:NameNode
Name | Notes |
---|---|
NameNode host | The host that has been assigned to run NameNode. This value is prepopulated based on your choices on previous screens. |
NameNode directories | NameNode directories for HDFS to store the file system image. |
NameNode Java heap size | Initial and maximum Java heap size for NameNode (Java options -Xms and -Xmx) |
NameNode new generation size | Default size of Java new generation for NameNode (Java option -XX:NewSize). Note: This value should be 1/8 of the maximum heap size (-Xmx). Ensure that the namenode_opt_newsize property is set accordingly. |
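The 1/8 rule in the note above is simple arithmetic, and a worked example may help when choosing values. The following is a minimal sketch and not part of the installer; the function name `recommended_new_size_mb` is hypothetical, and it assumes heap sizes are given in whole megabytes.

```python
def recommended_new_size_mb(max_heap_mb: int) -> int:
    """Return 1/8 of the maximum heap size (-Xmx) in MB, rounded down."""
    if max_heap_mb <= 0:
        raise ValueError("max_heap_mb must be positive")
    return max_heap_mb // 8


if __name__ == "__main__":
    # Example heap sizes are illustrative, not recommendations.
    for heap_mb in (1024, 4096, 8192):
        new_mb = recommended_new_size_mb(heap_mb)
        print(f"-Xmx{heap_mb}m -> -XX:NewSize={new_mb}m")
```

For instance, a NameNode with a 4096 MB maximum heap would use a new generation size of 512 MB under this rule.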
Table 3.2. HDFS Settings:SNameNode
Name | Notes |
---|---|
SNameNode host | The host that has been assigned to run the SecondaryNameNode. This value is prepopulated based on your choices on previous screens. |
Secondary NameNode Checkpoint Directory | Directory on the local filesystem where the SecondaryNameNode should store the temporary images to merge |
Table 3.3. HDFS Settings:DataNodes
Name | Notes |
---|---|
DataNode hosts | The hosts that have been assigned to run DataNode |
DataNode directories | DataNode directories for HDFS to store the data blocks |
DataNode maximum Java heap size | Maximum Java heap size for DataNode (Java option -Xmx) |
DataNode volumes failure toleration | The number of volumes that are allowed to fail before a DataNode stops offering services. |
Table 3.4. HDFS Settings:General
Name | Notes |
---|---|
WebHDFS enabled | Check to enable WebHDFS |
Hadoop maximum Java heap size | Maximum Java heap size for daemons such as Balancer (Java option -Xmx) [a] |
Reserved space for HDFS | Reserved space in GB per volume for HDFS |
HDFS Maximum Checkpoint Delay | Maximum delay between two consecutive checkpoints for HDFS |
HDFS Maximum Edit Log Size for Checkpointing | Maximum size of the edits log file that forces an urgent checkpoint even if the maximum checkpoint delay is not reached |
[a] The default value for this property is 1 GB. This value may need to be reduced for a VM-based installation. On the other hand, for significant work using Hive Server, 2 GB is a more realistic value.
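The two checkpoint settings in the table above work together: a checkpoint becomes due either when the maximum delay has elapsed or when the edits log has grown to the configured size, whichever comes first. The following is a minimal sketch of that rule, not the actual SecondaryNameNode logic; the function name, parameter names, and units (seconds and bytes) are assumptions made for illustration.

```python
def checkpoint_due(seconds_since_last: float,
                   edits_log_bytes: int,
                   max_delay_seconds: float,
                   max_edits_bytes: int) -> bool:
    """Return True if either checkpoint condition is met."""
    # Condition 1: the maximum checkpoint delay has been reached.
    delay_reached = seconds_since_last >= max_delay_seconds
    # Condition 2: the edits log is large enough to force an urgent
    # checkpoint, even if the delay has not been reached yet.
    log_too_large = edits_log_bytes >= max_edits_bytes
    return delay_reached or log_too_large
```

With hypothetical limits of 6 hours and 64 MB, a 70 MB edits log triggers a checkpoint after only 10 minutes, while a 1 MB edits log does not trigger one until the full 6 hours have passed.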
Table 3.5. HDFS Settings:Advanced
Name | Notes |
---|---|
Hadoop Log Dir Prefix | The parent directory for Hadoop log files. The HDFS log directory will be ${hadoop_log_dir_prefix}/${hdfs_user} and the MapReduce log directory will be ${hadoop_log_dir_prefix}/${mapred_user} |
Hadoop PID Dir Prefix | The parent directory in which the PID files for Hadoop processes will be created. The HDFS PID directory will be ${hadoop_pid_dir_prefix}/${hdfs_user} and the MapReduce PID directory will be ${hadoop_pid_dir_prefix}/${mapred_user} |
Exclude hosts | Names a file that contains a list of hosts that are not permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory. |
Include hosts | Names a file that contains a list of hosts that are permitted to connect to the NameNode. This file will be placed inside the Hadoop conf directory. |
Block replication | Default block replication |
dfs.block.local-path-access.user | The user who is allowed to perform short-circuit reads |
dfs.datanode.socket.write.timeout | DFS client write socket timeout |
dfs.replication.max | Maximal block replication |
dfs.heartbeat.interval | DataNode heartbeat interval in seconds |
dfs.safemode.threshold.pct | The percentage of blocks that should satisfy the minimal replication requirement set by dfs.replication.min. Values less than or equal to 0 mean not to start in safe mode. Values greater than 1 make safe mode permanent. |
dfs.balance.bandwidthPerSec | The maximum amount of bandwidth that each DataNode can utilize for balancing purposes in terms of the number of bytes per second |
dfs.block.size | Default block size for new files |
dfs.datanode.ipc.address | The DataNode IPC server address and port. If the port is 0 the server starts on a free port. |
dfs.blockreport.initialDelay | Delay for first block report in seconds |
dfs.datanode.du.pct | The percentage of real available space to use when calculating remaining space |
dfs.namenode.handler.count | The number of server threads for the NameNode |
dfs.datanode.max.xcievers | PRIVATE CONFIG VARIABLE |
dfs.umaskmode | The octal umask used in creating files and directories |
dfs.web.ugi | The user account used by the web interface. Syntax: USERNAME, GROUP1, GROUP2 .... |
dfs.permissions | If true, enable permissions checking in HDFS. If false, permission checking is turned off, but all other behavior stays unchanged. Switching from one value to the other does not change the mode, owner, or group of files or directories. |
dfs.permissions.supergroup | The name of the group of superusers |
ipc.server.max.response.size | |
dfs.block.access.token.enable | If true, access tokens are required to access DataNodes. If false, access tokens are not checked. |
dfs.secondary.https.port | The https port where the SecondaryNameNode binds |
dfs.https.port | The https port where the NameNode binds |
dfs.access.time.precision | The access time for an HDFS file is precise up to this value. The default value is 1 hour. A value of 0 disables access times for HDFS. |
dfs.cluster.administrators | ACL for all who can view the default servlets in HDFS |
ipc.server.read.threadpool.size | |
io.file.buffer.size | The size of the buffer for use in sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86). This value determines how much data is buffered during read and write operations. |
io.serializations | |
io.compression.codec.lzo.class | The implementation class for the LZO codec. |
fs.trash.interval | Number of minutes between trash checkpoints. If zero, the trash feature is disabled. |
ipc.client.idlethreshold | The threshold number of connections after which connections are inspected for idleness |
ipc.client.connection.maxidletime | Maximum time after which the client brings down the connection to the server |
ipc.client.connect.max.retries | Maximum number of retries for IPC connections |
webinterface.private.actions | If true, the native web interfaces for the JobTracker (JT) and NameNode (NN) may contain actions, such as kill job or delete file, that should not be exposed to the public. Enable this option only if these interfaces are reachable solely by appropriately authorized users. |
Custom Hadoop Configs | Use this text box to enter values for core-site.xml properties not exposed by the UI. Enter in "key=value" format, with a newline as a delimiter between pairs. |
Custom HDFS Configs | Use this text box to enter values for hdfs-site.xml properties not exposed by the UI. Enter in "key=value" format, with a newline as a delimiter between pairs. |
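Because the two custom config boxes expect one "key=value" pair per line, it can be useful to check the text before pasting it in. The following is a minimal sketch of such a check, not the parser the UI uses; the function name `parse_custom_configs` is hypothetical, and trimming whitespace and skipping blank lines are assumptions rather than documented behavior.

```python
from typing import Dict


def parse_custom_configs(text: str) -> Dict[str, str]:
    """Parse newline-delimited key=value pairs into a dictionary."""
    props: Dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(
                f"line {line_no}: expected key=value, got {raw!r}"
            )
        props[key] = value.strip()
    return props


if __name__ == "__main__":
    # Property names come from the table above; values are illustrative.
    sample = "dfs.heartbeat.interval=3\nfs.trash.interval=360\n"
    print(parse_custom_configs(sample))
```

A line without an equals sign, or with an empty key, raises a `ValueError` that names the offending line number, which makes malformed entries easy to locate.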