Monday, September 7, 2015

HDFS Configurations and Commands notes

Misc

key              value                        example
fs.default.name  protocol://servername:port   hdfs://alpha.milkman.org:9000
dfs.data.dir     pathname                     /home/username/hdfs/data
dfs.name.dir     pathname                     /home/username/hdfs/name

hadoop-site.xml file for a single-node configuration:
<configuration>
  <!-- NameNode URL: protocol, hostname and port -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your.server.name.com:9000</value>
  </property>
  <!-- where the DataNode stores its blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/username/hdfs/data</value>
  </property>
  <!-- where the NameNode stores its metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/username/hdfs/name</value>
  </property>
</configuration>

We must format the file system we just configured:
  user@namenode:hadoop$ bin/hadoop namenode -format
This process should only be performed once. When it is complete, we are free to start the distributed file system:
  user@namenode:hadoop$ bin/start-dfs.sh
This command will start the NameNode server on the master machine (which is where the start-dfs.sh script was invoked). It will also start the DataNode instances on each of the slave machines. In a single-machine "cluster," this is the same machine as the NameNode instance. On a real cluster of two or more machines, this script will ssh into each slave machine and start a DataNode instance.
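On a multi-machine cluster, start-dfs.sh finds the slave machines to ssh into by reading the conf/slaves file, one hostname per line. A minimal sketch (the hostnames here are hypothetical):
  slave1.your.server.name.com
  slave2.your.server.name.com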

Command: bin/hadoop dfs -put foo bar
Assuming: no file or directory named /user/$USER/bar exists in HDFS
Outcome: uploads local file foo to a file named /user/$USER/bar

Command: bin/hadoop dfs -put foo bar
Assuming: /user/$USER/bar is a directory
Outcome: uploads local file foo to a file named /user/$USER/bar/foo

Command: bin/hadoop dfs -put foo somedir/somefile
Assuming: /user/$USER/somedir does not exist in HDFS
Outcome: uploads local file foo to a file named /user/$USER/somedir/somefile, creating the missing directory

Command: bin/hadoop dfs -put foo bar
Assuming: /user/$USER/bar is already a file in HDFS
Outcome: no change in HDFS, and an error is returned to the user

When the put command operates on a file, it is all-or-nothing.
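To make the table concrete, here is a hedged sample session against a running single-node HDFS, assuming a local file named foo exists in the current directory (somedir2 is a hypothetical name):
  user@namenode:hadoop$ bin/hadoop dfs -put foo bar          # creates /user/$USER/bar
  user@namenode:hadoop$ bin/hadoop dfs -put foo bar          # error: /user/$USER/bar already exists
  user@namenode:hadoop$ bin/hadoop dfs -mkdir somedir2
  user@namenode:hadoop$ bin/hadoop dfs -put foo somedir2     # creates /user/$USER/somedir2/foo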


Command line reference: I found this PDF on the net; it looks pretty well organized.

https://images.linoxide.com/hadoop-hdfs-commands-cheatsheet.pdf


The put command can also operate on multiple files at once.

Another synonym for -put is -copyFromLocal. The syntax and functionality are identical.

Other common commands: -get, -cat, -ls, -du (disk usage), -mv, -cp, -rm, -rmr

-moveFromLocal localSrc dest
  Copies the file or directory from the local file system identified by localSrc to dest within HDFS, then deletes the local copy on success.

-getmerge src localDest [addnl]
  Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system identified by localDest.

-setrep [-R] [-w] rep path
  Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)

-touchz path
  Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.

-help cmd
  Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
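A hedged sample session tying several of these commands together (all paths are hypothetical):
  user@namenode:hadoop$ bin/hadoop dfs -ls /user/someone
  user@namenode:hadoop$ bin/hadoop dfs -cat /user/someone/bar/foo
  user@namenode:hadoop$ bin/hadoop dfs -setrep -w 2 /user/someone/bar/foo   # wait until replication reaches 2
  user@namenode:hadoop$ bin/hadoop dfs -getmerge /user/someone/bar merged.txt
  user@namenode:hadoop$ bin/hadoop dfs -rmr /user/someone/bar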


Shut down HDFS:
someone@namenode:hadoop$ bin/stop-dfs.sh

DFSADMIN COMMANDS
Dump the state of the NameNode metadata to a file:
  dfsadmin -metasave filename
Enter or leave safemode based on the value of what:
  dfsadmin -safemode what
- enter - Enters safemode
- leave - Forces the NameNode to exit safemode
- get - Returns a string indicating whether safemode is ON or OFF
- wait - Waits until safemode has exited and returns
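A hedged example of cycling through the safemode subcommands:
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode get     # prints whether safemode is ON or OFF
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode enter   # HDFS becomes read-only
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode leave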


Changing HDFS membership
dfsadmin -refreshNodes
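Decommissioning a node is typically driven by an exclude file. A minimal sketch, assuming a hypothetical exclude file at /home/hadoop/excludes referenced by the dfs.hosts.exclude property in hadoop-site.xml:
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/excludes</value>
  </property>
Add the hostname of the node to retire to the exclude file, then tell the NameNode to re-read it:
  user@namenode:hadoop$ echo slave2.your.server.name.com >> /home/hadoop/excludes
  user@namenode:hadoop$ bin/hadoop dfsadmin -refreshNodes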
Upgrading HDFS versions
bin/start-dfs.sh -upgrade
Get the status of the upgrade:
dfsadmin -upgradeProgress status
dfsadmin -upgradeProgress details
Rollback to previous version
When HDFS is upgraded, Hadoop retains backup information allowing you to downgrade to the original HDFS version in case you need to revert Hadoop versions. To back out the changes, stop the cluster, re-install the older version of Hadoop, and then use the command: bin/start-dfs.sh -rollback. It will restore the previous HDFS state.
Only one such archival copy can be kept at a time. Thus, after a few days of operation with the new version (when it is deemed stable), the archival copy can be removed with the command bin/hadoop dfsadmin -finalizeUpgrade. The rollback command cannot be issued after this point. This must be performed before a second Hadoop upgrade is allowed.
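Putting the upgrade-and-finalize flow together, a hedged sketch:
  user@namenode:hadoop$ bin/stop-dfs.sh                         # stop the old version
  (install the new Hadoop release)
  user@namenode:hadoop$ bin/start-dfs.sh -upgrade               # start the new version in upgrade mode
  user@namenode:hadoop$ bin/hadoop dfsadmin -upgradeProgress status
  (after a few days of stable operation)
  user@namenode:hadoop$ bin/hadoop dfsadmin -finalizeUpgrade    # removes the archival copy; rollback is no longer possible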

Help:
  hadoop dfsadmin -help cmd
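For example, to see usage for the safemode command:
  user@namenode:hadoop$ bin/hadoop dfsadmin -help safemode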

Hive uses ^A (\001) as its default field separator.
Pig uses \t as its default field separator; specify PigStorage(',') to use a different delimiter such as a comma.
Directory           Description                                         Default location                 Suggested location
HADOOP_LOG_DIR      Output location for log files from daemons          ${HADOOP_HOME}/logs              /var/log/hadoop
hadoop.tmp.dir      A base for other temporary directories              /tmp/hadoop-${user.name}         /tmp/hadoop
dfs.name.dir        Where the NameNode metadata should be stored        ${hadoop.tmp.dir}/dfs/name       /home/hadoop/dfs/name
dfs.data.dir        Where DataNodes store their blocks                  ${hadoop.tmp.dir}/dfs/data       /home/hadoop/dfs/data
mapred.system.dir   The in-HDFS path to shared MapReduce system files   ${hadoop.tmp.dir}/mapred/system  /hadoop/mapred/sy
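As a hedged illustration, the configuration properties from the table could be set to their suggested locations in hadoop-site.xml (HADOOP_LOG_DIR is an environment variable, usually exported in conf/hadoop-env.sh rather than set here):
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>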


