I recently cleared the MapR Spark certification and, since I was asked to share some tips, here you go, my friends.
I have divided this post into 3 sections:
- prerequisite for exam
- exam topics and must cover material
- tips (don't ignore the topics at the end of this blog please)
Prerequisite
First and foremost: work on Spark and Scala for at least a year before attempting the exam. The points below summarize what you need.
- You should have basic knowledge of distributed functional programming
- Hands-on experience with Spark
- Good exposure to Scala programming (you are not expected to be an expert, but you should be able to read the code and answer sensibly).
Exam topics and must cover Material
The exam has a lot of programming questions: a code snippet is provided and you are asked to work through it and pick the answer. If I remember correctly, only about 10% of the questions were theoretical (true/false, or which algorithm to use). I referred to a lot of material over the last 2 years (online books, videos, edX courses) for my preparation, but if I had to zero in on what is mandatory for the MapR certification, here is the list. Do not miss any part of it; I suggest going over it 4-5 times before taking the exam.
- Instructor and Virtual Instructor-led Training (training PPT and lab guide)
  - DEV 360 - Developing Spark Applications
  - DEV 361 - Build and Monitor Apache Spark Applications
  - DEV 362 - Spark Streaming, Spark MLLib - Machine Learning, GraphX
- Book - Learning Spark
- Spark official documentation
  - Pay more attention to RDDs, closures, accumulators, and broadcast variables.
  - http://spark.apache.org/docs/latest/quick-start.html
  - http://spark.apache.org/docs/latest/rdd-programming-guide.html
  - MLlib - http://spark.apache.org/docs/latest/ml-guide.html
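The closure, accumulator, and broadcast items in the list above are easier to reason about with code in hand. Below is a minimal sketch, not taken from the exam or the training material, assuming Spark 2.x or later (spark-core) on the classpath and local mode. The object name, method names, and sample data are invented for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; assumes spark-core 2.x+ is available.
object SharedVariablesSketch {
  // Sums the numbers with an accumulator, the supported way to
  // aggregate values from tasks back to the driver.
  def sumWithAccumulator(sc: SparkContext, numbers: Seq[Int]): Long = {
    val total = sc.longAccumulator("total")
    sc.parallelize(numbers).foreach(n => total.add(n))
    total.value
  }

  // Looks up each id in a small read-only table that is shipped to
  // the executors as a broadcast variable.
  def namesFor(sc: SparkContext, ids: Seq[Int]): Seq[String] = {
    val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two"))
    sc.parallelize(ids)
      .map(id => lookup.value.getOrElse(id, "unknown"))
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shared-variables-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Anti-pattern: mutating a driver-side variable inside a closure.
      // Each task works on its own serialized copy of the closure, so
      // the value seen on the driver afterwards is not reliable.
      var counter = 0
      sc.parallelize(1 to 100).foreach(n => counter += n)
      println(s"closure counter (unreliable): $counter")

      println(s"accumulator total: ${sumWithAccumulator(sc, 1 to 100)}")
      println(namesFor(sc, Seq(1, 2, 3)).mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

The contrast to take away is that the captured `counter` is copied into each task, while the accumulator is merged back on the driver. The RDD programming guide linked above covers both cases in its sections on closures and shared variables.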
Topics covered in the exam
Topic Name | Your Score
---|---
Load and Inspect Data in Apache Spark | xx%
Advanced Spark Programming and Spark Machine Learning MLLib | xxx%
Monitoring Spark Applications | xx%
Work with Pair RDD | xx.x%
Spark Streaming | xx%
Work with DataFrames | xx%
Build an Apache Spark Application | xxx%
Tips
A good starting point when preparing for the exam is the study guide at the link below:
https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/assets/spark-certification-study-guide.pdf
The questions in this guide are far too basic compared to the real exam, which was much harder.
- Lots of questions on the core concepts of RDDs and pair RDDs
- DataFrames are the next most important topic
- About 25% of the questions are on Spark Streaming and Spark MLlib, so prepare well on these
Do not ignore any of the topics below at any cost.
Silent topics that you don't want to be surprised by in the exam:
- Accumulators and broadcast variables
- Scala closures
- Narrow and wide dependencies
- Partitioning
- Formatting questions: saveAsTextFile() output that needs to be saved without brackets/parentheses
- Prepare well for mkString(",") and formatting
- flatMap functions
- mapPartitions
- There was a question on a byKey transformation and also one on Hadoop streaming, which I am not sure about.
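To make the formatting and flatMap items above concrete, here is a minimal sketch in plain Scala collections (no Spark dependency needed). It assumes the usual situation where a pair's default `toString` yields `(key,value)`, which is what ends up in the output file if a pair RDD is saved as-is. The object name, method names, and sample data are made up for illustration. On an RDD, a function like `csvLine` would typically be applied with `map` before calling `saveAsTextFile()`.

```scala
// Hypothetical demo object using only the Scala standard library.
object FormattingSketch {
  // Sample data standing in for the contents of a pair RDD.
  val pairs: Seq[(String, Int)] = Seq(("spark", 3), ("scala", 2))

  // Default tuple rendering keeps the parentheses, e.g. "(spark,3)".
  def defaultLine(pair: (String, Int)): String = pair.toString

  // Pattern-match the tuple and build the line by hand, e.g. "spark,3".
  def csvLine(pair: (String, Int)): String = pair match {
    case (word, count) => s"$word,$count"
  }

  // mkString joins the elements of a collection with a separator.
  def joinFields(fields: Seq[Any]): String = fields.mkString(",")

  // flatMap turns each line into several words and flattens the result;
  // map would instead keep one output element per input line.
  def words(lines: Seq[String]): Seq[String] =
    lines.flatMap(line => line.split(" ").toSeq)

  def main(args: Array[String]): Unit = {
    pairs.map(defaultLine).foreach(println) // (spark,3) then (scala,2)
    pairs.map(csvLine).foreach(println)     // spark,3 then scala,2
    println(joinFields(Seq("a", 1, 2.5)))   // a,1,2.5
    println(words(Seq("hello spark", "hello scala")).mkString("|"))
  }
}
```

The same `map`, `flatMap`, and `mkString` calls exist on RDDs and on the values inside them, so practicing them on local collections first is a cheap way to get comfortable with the output shapes.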
I hope this post helps in your preparation. Please let me know or email me if you have any other questions. Happy studying!
At the end - Here is my certification