Monday, September 7, 2015

HDFS Configurations and Commands notes

Misc

key              value                        example
fs.default.name  protocol://servername:port   hdfs://alpha.milkman.org:9000
dfs.data.dir     pathname                     /home/username/hdfs/data
dfs.name.dir     pathname                     /home/username/hdfs/name

hadoop-site.xml file for a single-node configuration:
<configuration>
  <!-- NameNode URL: protocol, hostname and port -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your.server.name.com:9000</value>
  </property>
  <!-- where the DataNode stores its blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/username/hdfs/data</value>
  </property>
  <!-- where the NameNode stores its metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/username/hdfs/name</value>
  </property>
</configuration>

We must format the file system we just configured:
  user@namenode:hadoop$ bin/hadoop namenode -format
This process should only be performed once. When it is complete, we are free to start the distributed file system:
  user@namenode:hadoop$ bin/start-dfs.sh
This command will start the NameNode server on the master machine (which is where the start-dfs.sh script was invoked). It will also start the DataNode instances on each of the slave machines. In a single-machine "cluster," this is the same machine as the NameNode instance. On a real cluster of two or more machines, this script will ssh into each slave machine and start a DataNode instance.
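On a multi-machine cluster, start-dfs.sh finds the slave machines to ssh into by reading the conf/slaves file, one hostname per line. A minimal sketch (the hostnames here are hypothetical):
  slave1.your.server.name.com
  slave2.your.server.name.com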

Command: bin/hadoop dfs -put foo bar
Assuming: no file or directory named /user/$USER/bar exists in HDFS
Outcome: uploads local file foo to a file named /user/$USER/bar

Command: bin/hadoop dfs -put foo bar
Assuming: /user/$USER/bar is a directory
Outcome: uploads local file foo to a file named /user/$USER/bar/foo

Command: bin/hadoop dfs -put foo somedir/somefile
Assuming: /user/$USER/somedir does not exist in HDFS
Outcome: uploads local file foo to a file named /user/$USER/somedir/somefile, creating the missing directory

Command: bin/hadoop dfs -put foo bar
Assuming: /user/$USER/bar is already a file in HDFS
Outcome: no change in HDFS, and an error is returned to the user

When the put command operates on a file, it is all-or-nothing.
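To make the table concrete, here is a hedged sample session against a running single-node HDFS, assuming a local file named foo exists in the current directory (somedir2 is a hypothetical name):
  user@namenode:hadoop$ bin/hadoop dfs -put foo bar          # creates /user/$USER/bar
  user@namenode:hadoop$ bin/hadoop dfs -put foo bar          # error: /user/$USER/bar already exists
  user@namenode:hadoop$ bin/hadoop dfs -mkdir somedir2
  user@namenode:hadoop$ bin/hadoop dfs -put foo somedir2     # creates /user/$USER/somedir2/foo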


Command line reference: I found this PDF on the net; it looks pretty well organized.

https://images.linoxide.com/hadoop-hdfs-commands-cheatsheet.pdf


The put command can also operate on multiple files at once.

Another synonym for -put is -copyFromLocal. The syntax and functionality are identical.

Other common commands: -get, -cat, -ls, -du (disk usage), -mv, -cp, -rm, -rmr

-moveFromLocal localSrc dest
  Copies the file or directory from the local file system identified by localSrc to dest within HDFS, then deletes the local copy on success.

-getmerge src localDest [addnl]
  Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system identified by localDest.

-setrep [-R] [-w] rep path
  Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)

-touchz path
  Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.

-help cmd
  Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
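A hedged sample session tying several of these commands together (all paths are hypothetical):
  user@namenode:hadoop$ bin/hadoop dfs -ls /user/someone
  user@namenode:hadoop$ bin/hadoop dfs -cat /user/someone/bar/foo
  user@namenode:hadoop$ bin/hadoop dfs -setrep -w 2 /user/someone/bar/foo   # wait until replication reaches 2
  user@namenode:hadoop$ bin/hadoop dfs -getmerge /user/someone/bar merged.txt
  user@namenode:hadoop$ bin/hadoop dfs -rmr /user/someone/bar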


Shut down HDFS:
someone@namenode:hadoop$ bin/stop-dfs.sh

DFSADMIN COMMANDS
Dump the state of the NameNode metadata to a file:
  dfsadmin -metasave filename
Enter or leave safemode based on the value of what:
  dfsadmin -safemode what
- enter - Enters safemode
- leave - Forces the NameNode to exit safemode
- get - Returns a string indicating whether safemode is ON or OFF
- wait - Waits until safemode has exited and returns
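A hedged example of cycling through the safemode subcommands:
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode get     # prints whether safemode is ON or OFF
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode enter   # HDFS becomes read-only
  user@namenode:hadoop$ bin/hadoop dfsadmin -safemode leave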


Changing HDFS membership
dfsadmin -refreshNodes
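Decommissioning a node is typically driven by an exclude file. A minimal sketch, assuming a hypothetical exclude file at /home/hadoop/excludes referenced by the dfs.hosts.exclude property in hadoop-site.xml:
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/excludes</value>
  </property>
Add the hostname of the node to retire to the exclude file, then tell the NameNode to re-read it:
  user@namenode:hadoop$ echo slave2.your.server.name.com >> /home/hadoop/excludes
  user@namenode:hadoop$ bin/hadoop dfsadmin -refreshNodes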
Upgrading HDFS versions
bin/start-dfs.sh -upgrade
Get the status of the upgrade:
dfsadmin -upgradeProgress status
dfsadmin -upgradeProgress details
Rollback to previous version
When HDFS is upgraded, Hadoop retains backup information allowing you to downgrade to the original HDFS version in case you need to revert Hadoop versions. To back out the changes, stop the cluster, re-install the older version of Hadoop, and then use the command: bin/start-dfs.sh -rollback. It will restore the previous HDFS state.
Only one such archival copy can be kept at a time. Thus, after a few days of operation with the new version (when it is deemed stable), the archival copy can be removed with the command bin/hadoop dfsadmin -finalizeUpgrade. The rollback command cannot be issued after this point. This must be performed before a second Hadoop upgrade is allowed.
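Putting the upgrade-and-finalize flow together, a hedged sketch:
  user@namenode:hadoop$ bin/stop-dfs.sh                         # stop the old version
  (install the new Hadoop release)
  user@namenode:hadoop$ bin/start-dfs.sh -upgrade               # start the new version in upgrade mode
  user@namenode:hadoop$ bin/hadoop dfsadmin -upgradeProgress status
  (after a few days of stable operation)
  user@namenode:hadoop$ bin/hadoop dfsadmin -finalizeUpgrade    # removes the archival copy; rollback is no longer possible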

Help:
  hadoop dfsadmin -help cmd
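For example, to see usage for the safemode command:
  user@namenode:hadoop$ bin/hadoop dfsadmin -help safemode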

Hive uses ^A (\001) as its default field separator.
Pig uses \t as its default field separator; specify PigStorage(',') to use a different delimiter such as a comma.
Directory           Description                                         Default location                 Suggested location
HADOOP_LOG_DIR      Output location for log files from daemons          ${HADOOP_HOME}/logs              /var/log/hadoop
hadoop.tmp.dir      A base for other temporary directories              /tmp/hadoop-${user.name}         /tmp/hadoop
dfs.name.dir        Where the NameNode metadata should be stored        ${hadoop.tmp.dir}/dfs/name       /home/hadoop/dfs/name
dfs.data.dir        Where DataNodes store their blocks                  ${hadoop.tmp.dir}/dfs/data       /home/hadoop/dfs/data
mapred.system.dir   The in-HDFS path to shared MapReduce system files   ${hadoop.tmp.dir}/mapred/system  /hadoop/mapred/sy
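As a hedged illustration, the configuration properties from the table could be set to their suggested locations in hadoop-site.xml (HADOOP_LOG_DIR is an environment variable, usually exported in conf/hadoop-env.sh rather than set here):
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>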


