Sunday, August 13, 2017

Must-Know Technologies for Hadoop Developers


Knowledge of Java is an essential skill for programming in Hadoop. Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Based on Google’s MapReduce model, Hadoop distributes computing jobs across the nodes of a cluster and then combines the results. MapReduce jobs are typically written in Java.
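To make the "distribute, then combine" idea concrete, here is a minimal word-count sketch in plain Java. It does not use the Hadoop API and runs in a single process; the class name WordCountSketch and its methods are made up for illustration. It only mimics the three steps of the MapReduce model: map each line to (word, 1) pairs, group the pairs by word, and reduce each group to a total.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the MapReduce model.
// This is NOT Hadoop code: a real job would use Hadoop's Mapper/Reducer classes.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: combine all counts emitted for a single word.
    public static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce each group.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big jobs", "data jobs data");
        System.out.println(countWords(lines)); // prints {big=2, data=3, jobs=2}
    }
}
```

In a real cluster the map calls would run in parallel on different nodes, close to the data blocks they read, and the framework would handle the grouping step across the network. The sketch keeps everything in memory only to show the shape of the computation.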

You should also be comfortable with the Hadoop ecosystem tools below.




Data ingestion: Apache Flume, Apache Sqoop, Apache Kafka, Apache NiFi

Storage and its API (file system / NoSQL DB): Apache HDFS, Apache HBase (column family), MongoDB (document), Redis (key-value)

Analytical processing: Apache MapReduce, Apache Pig, Apache Spark, Apache Tez

SQL programming on Hadoop: Apache Hive, Apache Drill, Cloudera Impala, Apache Phoenix, Apache Kylin




You should also have a good understanding of service programming frameworks (Apache Thrift, Apache ZooKeeper), serialization tools (Apache Avro), scheduling and workflow tools (Apache Oozie), security frameworks (Apache Sentry), and system deployment tools such as Apache Ambari, Cloudera Hue, and the MapR admin UI.




