Monday, September 7, 2015

HDFS Configurations and Commands notes



hadoop-site.xml file for a single-node configuration:
-- the NameNode URL (hostname and port)
-- where the DataNode stores its data
-- where the NameNode stores its metadata
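
To make the three settings above concrete, here is a minimal sketch of a shell function that writes such a file. The property names fs.default.name, dfs.data.dir and dfs.name.dir are the ones I believe older Hadoop releases read from hadoop-site.xml; the URL, the port and both paths are placeholder assumptions, so adjust them for your machine.

```shell
# Sketch: write a minimal single-node hadoop-site.xml into a conf directory.
# The URL and both paths below are placeholder assumptions, not values from these notes.
write_site_config() {
    conf_dir="$1"
    cat > "$conf_dir/hadoop-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode URL: hostname and port (assumed value) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Where the DataNode stores its data (hypothetical path) -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/user/hdfs/data</value>
  </property>
  <!-- Where the NameNode stores its metadata (hypothetical path) -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/user/hdfs/name</value>
  </property>
</configuration>
EOF
}
```

Usage would be something like write_site_config conf, run from the Hadoop install directory.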



We must format the file system that we just configured:
  user@namenode:hadoop$ bin/hadoop namenode -format
This process should only be performed once. When it is complete, we are free to start the distributed file system:
  user@namenode:hadoop$ bin/
This command will start the NameNode server on the master machine (which is where the start-dfs.sh script was invoked). It will also start the DataNode instances on each of the slave machines. In a single-machine "cluster," this is the same machine as the NameNode instance. On a real cluster of two or more machines, this script will ssh into each slave machine and start a DataNode instance.

bin/hadoop dfs -put foo bar
  Assuming: no file or directory named /user/$USER/bar exists in HDFS
  Outcome: uploads local file foo to a file named /user/$USER/bar
bin/hadoop dfs -put foo bar
  Assuming: /user/$USER/bar is a directory
  Outcome: uploads local file foo to a file named /user/$USER/bar/foo
bin/hadoop dfs -put foo somedir/somefile
  Assuming: /user/$USER/somedir does not exist in HDFS
  Outcome: uploads local file foo to a file named /user/$USER/somedir/somefile, creating the missing directory
bin/hadoop dfs -put foo bar
  Assuming: /user/$USER/bar is already a file in HDFS
  Outcome: no change in HDFS, and an error is returned to the user
When the put command operates on a file, it is all-or-nothing.
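
Since -put fails when the destination is already a file, a script may want to check first. Below is a rough sketch of such a guard. The function name put_if_absent and the HADOOP variable are my own inventions; the -test -e existence check is, as far as I know, part of the same dfs shell. Note the guard is stricter than -put itself: it also skips when the destination is an existing directory.

```shell
# Path to the hadoop launcher; override HADOOP if yours lives elsewhere (assumption).
HADOOP="${HADOOP:-bin/hadoop}"

# Upload src to dest only when nothing exists at dest yet.
# Returns 1 without uploading if dest is already present in HDFS.
put_if_absent() {
    src="$1"
    dest="$2"
    # "dfs -test -e" exits with status 0 when the path exists.
    if "$HADOOP" dfs -test -e "$dest"; then
        echo "skipped: $dest already exists in HDFS" >&2
        return 1
    fi
    "$HADOOP" dfs -put "$src" "$dest"
}
```

For example: put_if_absent foo bar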

Command line reference -> I found this PDF on the net. It looks pretty organized, so that's

The -put command can also be used for multiple files.

Another synonym for -put is -copyFromLocal. The syntax and functionality are identical.
-get, -cat, -ls, -du (disk usage), -mv, -cp, -rm, -rmr
-moveFromLocal localSrc dest
Copies the file or directory from the local file system identified by localSrc to dest within HDFS, then deletes the local copy on success.
-getmerge src localDest [addnl]
Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system identified by localDest.

-setrep [-R] [-w] rep path
Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)

-touchz path
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
-help cmd
Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.

Shutting down HDFS
someone@namenode:hadoop$ bin/

State of the NameNode metadata
dfsadmin -metasave filename
dfsadmin -safemode what
Changes or queries the state of safemode based on the value of what, described below:
·         enter - Enters safemode
·         leave - Forces the NameNode to exit safemode
·         get - Returns a string indicating whether safemode is ON or OFF
·         wait - Waits until safemode has exited and returns
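
The get and wait values are handy in scripts. Here is a small sketch, assuming the string returned by get contains "ON" only while safemode is active (the exact wording of that string is an assumption on my part). The function names and the HADOOP variable are made up for illustration.

```shell
# Path to the hadoop launcher; override HADOOP if yours lives elsewhere (assumption).
HADOOP="${HADOOP:-bin/hadoop}"

# Succeeds (exit status 0) when the NameNode reports that safemode is ON.
safemode_is_on() {
    status="$("$HADOOP" dfsadmin -safemode get)"
    case "$status" in
        *ON*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Block until safemode has exited, then run the command given as arguments.
run_after_safemode() {
    "$HADOOP" dfsadmin -safemode wait || return 1
    "$@"
}
```

For example: run_after_safemode bin/hadoop dfs -put foo bar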

Changing HDFS membership
dfsadmin  -refreshNodes
Upgrading HDFS versions
bin/ -upgrade
Get the status of the upgrade
dfsadmin -upgradeProgress status
dfsadmin -upgradeProgress details
Rollback to previous version
When HDFS is upgraded, Hadoop retains backup information allowing you to downgrade to the original HDFS version in case you need to revert Hadoop versions. To back out the changes, stop the cluster, re-install the older version of Hadoop, and then use the command: bin/ -rollback. It will restore the previous HDFS state.
Only one such archival copy can be kept at a time. Thus, after a few days of operation with the new version (when it is deemed stable), the archival copy can be removed with the command bin/hadoop dfsadmin -finalizeUpgrade. The rollback command cannot be issued after this point. This must be performed before a second Hadoop upgrade is allowed.

hadoop dfsadmin -help cmd

Hive - uses ^A (\001) as the default field separator
Pig - uses \t as the default field separator; use PigStorage(',') otherwise
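
Because the two defaults differ, a ^A-separated file has to be converted (or loaded with an explicit separator) before the other tool's default will parse it. A minimal sketch of the conversion with plain tr; the function name hive_to_tsv is just something I made up.

```shell
# Convert Hive's default ^A (\001) field separator to tabs,
# reading from stdin and writing to stdout.
hive_to_tsv() {
    tr '\001' '\t'
}
```

It can sit at the end of a pipe, for example after bin/hadoop dfs -cat on a file written with the Hive default separator.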
Directories to configure, each with a default location and a suggested location:
-- Output location for log files from daemons
-- A base for other temporary directories
-- Where the NameNode metadata should be stored
-- Where DataNodes store their blocks
-- The in-HDFS path to shared MapReduce system files