Misc
key             | value                      | example
fs.default.name | protocol://servername:port | hdfs://alpha.milkman.org:9000
dfs.data.dir    | pathname                   | /home/username/hdfs/data
dfs.name.dir    | pathname                   | /home/username/hdfs/name
hadoop-site.xml file for a single-node configuration:
<configuration>
  <!-- namenode: URL with protocol, hostname and port -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your.server.name.com:9000</value>
  </property>
  <!-- datanode: where to store the data -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/username/hdfs/data</value>
  </property>
  <!-- namenode: where to store the metadata -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/username/hdfs/name</value>
  </property>
</configuration>
We must format the file system that we just configured:
user@namenode:hadoop$ bin/hadoop namenode -format
This process should only be performed once. When it is complete, we are free to start the distributed file system:
user@namenode:hadoop$ bin/start-dfs.sh
This command will start the NameNode server on the master machine (which is where the start-dfs.sh script was invoked). It will also start the DataNode instances on each of the slave machines. In a single-machine "cluster," this is the same machine as the NameNode instance. On a real cluster of two or more machines, this script will ssh into each slave machine and start a DataNode instance.
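The list of slave machines is read from the conf/slaves file, one hostname per line. For example (the hostnames here are placeholders):
user@namenode:hadoop$ cat conf/slaves
slave01.your.domain.com
slave02.your.domain.com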
Command                                  | Assuming                                                | Outcome
bin/hadoop dfs -put foo bar              | No file/directory named /user/$USER/bar exists in HDFS | Uploads local file foo to a file named /user/$USER/bar
bin/hadoop dfs -put foo bar              | /user/$USER/bar is a directory                          | Uploads local file foo to a file named /user/$USER/bar/foo
bin/hadoop dfs -put foo somedir/somefile | /user/$USER/somedir does not exist in HDFS              | Uploads local file foo to a file named /user/$USER/somedir/somefile, creating the missing directory
bin/hadoop dfs -put foo bar              | /user/$USER/bar is already a file in HDFS               | No change in HDFS, and an error is returned to the user.
When the put command operates on a file, it is all-or-nothing.
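For example, running the same put twice shows the first and last rows of the table in action (foo is any local file):
user@namenode:hadoop$ bin/hadoop dfs -put foo bar
user@namenode:hadoop$ bin/hadoop dfs -put foo bar
The second invocation fails, because /user/$USER/bar now already exists as a file.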
Command line reference -> I found this PDF on the net; it looks pretty well organized:
https://images.linoxide.com/hadoop-hdfs-commands-cheatsheet.pdf
The put command can also be used to upload multiple files at once. Another synonym for -put is -copyFromLocal; the syntax and functionality are identical. Other common commands: -get, -cat, -ls, -du (disk usage), -mv, -cp, -rm, -rmr.
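A few sample invocations, assuming the file bar uploaded above (paths are illustrative):
user@namenode:hadoop$ bin/hadoop dfs -ls /user/$USER
user@namenode:hadoop$ bin/hadoop dfs -cat bar
user@namenode:hadoop$ bin/hadoop dfs -get bar localbar
user@namenode:hadoop$ bin/hadoop dfs -rm bar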
Command                         | Description
-moveFromLocal localSrc dest    | Copies the file or directory from the local file system identified by localSrc to dest within HDFS, then deletes the local copy on success.
-getmerge src localDest [addnl] | Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system identified by localDest.
-setrep [-R] [-w] rep path      | Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)
-touchz path                    | Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
-help cmd                       | Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.
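Sample invocations of the commands above (the replication factor and paths are chosen purely for illustration):
user@namenode:hadoop$ bin/hadoop dfs -setrep -w 2 /user/$USER/bar
user@namenode:hadoop$ bin/hadoop dfs -getmerge /user/$USER/logs merged-logs.txt
user@namenode:hadoop$ bin/hadoop dfs -touchz /user/$USER/marker
user@namenode:hadoop$ bin/hadoop dfs -help setrep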
Shutdown HDFS
someone@namenode:hadoop$ bin/stop-dfs.sh
DFSADMIN COMMAND
Save the state of the NameNode metadata to a file:
dfsadmin -metasave filename
dfsadmin -safemode what controls safemode based on the value of what, described below:
· enter - Enters safemode
· leave - Forces the NameNode to exit safemode
· get - Returns a string indicating whether safemode is ON or OFF
· wait - Waits until safemode has exited and returns
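A typical sequence when freezing the namespace for maintenance might look like this:
user@namenode:hadoop$ bin/hadoop dfsadmin -safemode enter
user@namenode:hadoop$ bin/hadoop dfsadmin -safemode get
user@namenode:hadoop$ bin/hadoop dfsadmin -safemode leave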
Changing HDFS membership:
dfsadmin -refreshNodes
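For example, to decommission a DataNode you would typically add its hostname to the exclude file named by the dfs.hosts.exclude property in hadoop-site.xml, then tell the NameNode to re-read the node lists (the file path and hostname below are assumptions, not defaults):
user@namenode:hadoop$ echo slave02.your.domain.com >> /home/hadoop/excludes
user@namenode:hadoop$ bin/hadoop dfsadmin -refreshNodes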
Upgrading HDFS versions:
bin/start-dfs.sh -upgrade
Get the status of the upgrade:
dfsadmin -upgradeProgress status
dfsadmin -upgradeProgress details
Rollback to previous version
When HDFS is upgraded, Hadoop retains backup information allowing you to
downgrade to the original HDFS version in case you need to revert Hadoop
versions. To back out the changes, stop the cluster, re-install the older
version of Hadoop, and then use the command: bin/start-dfs.sh -rollback. It will restore the previous HDFS state.
Only one such archival copy can be kept at a time. Thus, after a few
days of operation with the new version (when it is deemed stable), the archival
copy can be removed with the command bin/hadoop dfsadmin -finalizeUpgrade. The rollback command cannot be issued after
this point. This must be performed before a second Hadoop upgrade is allowed.
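Condensed, the two possible endings of an upgrade look like this (assuming the older release has been re-installed before the rollback):
user@namenode:hadoop$ bin/stop-dfs.sh
user@namenode:hadoop$ bin/start-dfs.sh -rollback
or, once the new version is deemed stable:
user@namenode:hadoop$ bin/hadoop dfsadmin -finalizeUpgrade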
Help:
hadoop dfsadmin -help cmd
Hive - uses ^A (\001) as its default field separator.
Pig - uses \t as its default field separator; use PigStorage(',') to specify a different one.
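As a sketch, a ^A-delimited input file suitable for Hive's default format can be built from the shell with printf, which expands \001 to the control character (the file name and fields are made up):
user@namenode:hadoop$ printf 'alice\00125\nbob\00130\n' > users.txt
user@namenode:hadoop$ bin/hadoop dfs -put users.txt users.txt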
Directory         | Description                                       | Default location                | Suggested location
HADOOP_LOG_DIR    | Output location for log files from daemons       | ${HADOOP_HOME}/logs             | /var/log/hadoop
hadoop.tmp.dir    | A base for other temporary directories           | /tmp/hadoop-${user.name}        | /tmp/hadoop
dfs.name.dir      | Where the NameNode metadata should be stored     | ${hadoop.tmp.dir}/dfs/name      | /home/hadoop/dfs/name
dfs.data.dir      | Where DataNodes store their blocks               | ${hadoop.tmp.dir}/dfs/data      | /home/hadoop/dfs/data
mapred.system.dir | The in-HDFS path to shared MapReduce system files | ${hadoop.tmp.dir}/mapred/system | /hadoop/mapred/system
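Note that HADOOP_LOG_DIR is an environment variable set in conf/hadoop-env.sh, while the other four are properties set in hadoop-site.xml just like the example at the top of this page. For instance, in hadoop-env.sh:
export HADOOP_LOG_DIR=/var/log/hadoop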