A working knowledge of Java is essential for programming in Hadoop. Hadoop is an open-source, Java-based framework that supports the processing of large data sets in a distributed computing environment. Based on Google’s MapReduce model, Hadoop distributes computing jobs across a cluster and then combines the results. The MapReduce jobs discussed here are written in Java.
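To see why Java matters here, the MapReduce flow can be illustrated with the classic word-count pattern. This is a minimal, Hadoop-free sketch: real jobs extend Hadoop's Mapper and Reducer classes and run on a cluster, while here the map, shuffle, and reduce phases are simulated with plain Java collections so the movement of (key, value) pairs is visible. The class name `WordCountSketch` and the sample input are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

// A minimal, Hadoop-free sketch of the MapReduce word-count pattern.
// Real jobs extend org.apache.hadoop.mapreduce.Mapper/Reducer; here the
// map, shuffle, and reduce phases are simulated in plain Java.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce phase: group pairs by key, then sum the values.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data", "hadoop processes data");
        Map<String, Integer> counts =
                reduce(input.stream().flatMap(WordCountSketch::map));
        System.out.println(counts.get("hadoop")); // 2
        System.out.println(counts.get("data"));   // 2
    }
}
```

In a real Hadoop job the shuffle step (grouping values by key) is performed by the framework between the map and reduce phases; that is the part this sketch compresses into a single `groupingBy` call.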
You should also be comfortable with the following Hadoop ecosystem tools:

Data ingestion: Apache Flume, Apache Sqoop, Apache Kafka, Apache NiFi
Storage and its APIs (file system / NoSQL databases): Apache HDFS, Apache HBase (column family), MongoDB (document), Redis (key-value)
Analytical processing and programming: Apache MapReduce, Apache Pig, Apache Spark, Apache Tez
SQL on Hadoop: Apache Hive, Apache Drill, Cloudera Impala, Apache Phoenix, Kylin
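The three NoSQL models named in the storage row differ mainly in how a record is addressed. A rough illustration in plain Java, with in-memory maps standing in for the actual databases (the record fields and keys are made up for the example):

```java
import java.util.*;

// Rough, in-memory illustration of the three NoSQL data models above.
// Plain Java maps stand in for the databases; the point is only how
// each model addresses a record.
public class StorageModels {
    public static void main(String[] args) {
        // Key-value (Redis-style): one opaque value per key.
        Map<String, String> kv = new HashMap<>();
        kv.put("user:42:name", "Ada");

        // Document (MongoDB-style): a nested structure per document id.
        Map<String, Map<String, Object>> docs = new HashMap<>();
        docs.put("42", Map.of("name", "Ada", "tags", List.of("admin")));

        // Column family (HBase-style): row key -> family -> column -> value.
        Map<String, Map<String, Map<String, String>>> cf = new HashMap<>();
        cf.put("42", Map.of("info", Map.of("name", "Ada")));

        System.out.println(kv.get("user:42:name"));               // Ada
        System.out.println(docs.get("42").get("name"));           // Ada
        System.out.println(cf.get("42").get("info").get("name")); // Ada
    }
}
```

Which model fits depends on the access pattern: key-value stores favor fast lookups of whole values, document stores keep related fields together, and column-family stores let a row grow sparse columns grouped by family.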
You should also have a good understanding of service programming frameworks (Apache Thrift, Apache ZooKeeper), serialization tools (Apache Avro), scheduling and workflow tools (Apache Oozie), security frameworks (Apache Sentry), and system deployment tools such as Apache Ambari, Cloudera HUE, and the MapR Admin UI.