Sunday, March 25, 2018

MapR Spark Certification tips


I recently cleared the MapR Spark certification and, since I was asked to, I would like to share some tips. Here you go, my friends.

I have divided this post into 3 sections:


  • Prerequisites for the exam
  • Exam topics and must-cover material
  • Tips (please don't ignore the topics at the end of this post)



Prerequisite 

First and foremost - work with Spark and Scala for at least a year before attempting the exam. The points below summarize what you need.

  • Basic knowledge of distributed functional programming
  • Hands-on experience with Spark
  • Good exposure to Scala programming (you are not expected to be an expert, but you should be able to read the code and answer sensibly)

Exam topics and must-cover material

The exam has lots of programming questions: a code snippet is provided and you are asked to work through it and pick the answer. If I remember correctly, only about 10% of the questions were theoretical (true/false, or which algorithm to use).


I referred to a lot of material (online books, videos, and edX courses over the last 2 years) for my preparation, but if I had to zero in on what is mandatory for the MapR certification, here is the list. Do not miss any of it; I suggest going over it 4-5 times before taking the exam.
  • Instructor-led and virtual instructor-led training (training slides and lab guide)
    • DEV 360 – Developing Spark Applications
    • DEV 361 - Build and Monitor Apache Spark Applications
    • DEV 362 - Spark Streaming, Spark MLlib - Machine Learning, GraphX
  • Book - Learning Spark
  • Spark official documentation
    • Pay more attention to RDDs, closures, accumulators, and broadcast variables.
      • http://spark.apache.org/docs/latest/quick-start.html
      • http://spark.apache.org/docs/latest/rdd-programming-guide.html
    • MLlib - http://spark.apache.org/docs/latest/ml-guide.html



Topics covered in the exam 

Topic Name                                                     Your Score
Load and Inspect Data in Apache Spark                          xx%
Advanced Spark Programming and Spark Machine Learning MLLib    xxx%
Monitoring Spark Applications                                  xx%
Work with Pair RDD                                             xx.x%
Spark Streaming                                                xx%
Work with DataFrames                                           xx%
Build an Apache Spark Application                              xxx%

Tips

Normally, anyone who starts preparing for the exam begins with the study guide at the link below:

https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/assets/spark-certification-study-guide.pdf

The questions in this guide are way too basic compared to the real exam. The exam was much, much harder.
  • Lots of questions on the core concepts of RDDs and pair RDDs
  • DataFrames are the next most important
  • About 25% of the questions are on Spark Streaming and Spark MLlib, so prepare well on these too
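
Since pair RDD questions come up so often, here is a minimal sketch of what a "byKey" aggregation does. It deliberately uses plain Scala collections so it runs without a cluster; the helper names (sumByKey, groupValuesByKey) are my own and not part of Spark. On a real pair RDD the corresponding calls would be reduceByKey and groupByKey.

```scala
object ByKeySketch {
  // Plain-collection stand-in for rdd.reduceByKey(_ + _):
  // group the pairs by key, then sum the values of each group.
  def sumByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }

  // Plain-collection stand-in for rdd.groupByKey():
  // keep every value, collected per key.
  def groupValuesByKey(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  def main(args: Array[String]): Unit = {
    val words = Seq("spark", "scala", "spark", "rdd", "spark")
    val pairs = words.map(word => (word, 1)) // classic (key, 1) pairs

    println(sumByKey(pairs))         // one total per key
    println(groupValuesByKey(pairs)) // all values per key
  }
}
```

The point to remember for the exam is the difference in shape: a reduce-style byKey returns one value per key, while a group-style byKey returns a collection of values per key.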

Do not ignore any of the topics below at any cost.

Silent topics that you don't want to be surprised by in the exam


  • Accumulators and broadcast variables
  • Scala closures
  • Narrow and wide dependencies
  • Partitioning
  • Formatting questions: with saveAsTextFile(), you need to save the output without brackets/parentheses
  • Prepare well for mkString(",") and formatting
  • flatMap functions
  • mapPartitions
  • There was a question on a byKey transformation and also one on Hadoop Streaming, which I am not sure about.
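
To make the formatting and flatMap points concrete, here is a small sketch using plain Scala collections (no Spark needed to run it). The object and value names are only for illustration. saveAsTextFile() writes the toString of each element, so a tuple would be saved with its parentheses unless you map it to a formatted string first; the same mkString calls shown here can be used inside an RDD map before saving.

```scala
object FormattingSketch {
  val pair: (String, Int) = ("spark", 3)

  // What you get if a tuple is written out as-is.
  val withParens: String = pair.toString                       // "(spark,3)"

  // Join the tuple fields yourself to drop the parentheses.
  val asCsv: String = pair.productIterator.mkString(",")       // "spark,3"

  // Nested collections need their own mkString.
  val nested: (String, List[Int]) = ("spark", List(1, 2, 3))
  val nestedCsv: String = s"${nested._1},${nested._2.mkString(",")}" // "spark,1,2,3"

  // map keeps one output element per input element ...
  val lines: List[String] = List("hello world", "hi")
  val mapped: List[List[String]] = lines.map(_.split(" ").toList)

  // ... while flatMap flattens the per-line results into one list.
  val flatMapped: List[String] = lines.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    println(withParens)
    println(asCsv)
    println(nestedCsv)
    println(mapped)
    println(flatMapped)
  }
}
```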
I hope this post helps in your preparation. Please let me know or email me if you have any other questions. Happy studying!

At the end  - Here is my certification

Sunday, March 18, 2018

Blockchain basics




A blockchain is a continuously growing list of records which are linked and secured using cryptography. For use as a distributed ledger, a blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority.
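
The "linked and secured using cryptography" part can be sketched as a simple hash chain. This is only an illustration under my own assumptions (SHA-256, a made-up Block layout, and an all-zero hash for the first block); it is not how any particular blockchain product stores its data, and it leaves out the network and validation protocol entirely.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashChainSketch {
  // Hypothetical block layout: payload plus the hash of the previous block.
  final case class Block(index: Int, data: String, prevHash: String, hash: String)

  private val genesisPrevHash: String = "0" * 64

  def sha256Hex(text: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(text.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b))
      .mkString

  def blockHash(index: Int, data: String, prevHash: String): String =
    sha256Hex(s"$index|$data|$prevHash")

  // Append-only: a new block always points at the hash of the current last block.
  def append(chain: Vector[Block], data: String): Vector[Block] = {
    val prevHash = chain.lastOption.map(_.hash).getOrElse(genesisPrevHash)
    val index = chain.length
    chain :+ Block(index, data, prevHash, blockHash(index, data, prevHash))
  }

  // A chain is valid if every block's hash matches its content
  // and every block points at the hash of the block before it.
  def isValid(chain: Vector[Block]): Boolean =
    chain.zipWithIndex.forall { case (block, i) =>
      val expectedPrev = if (i == 0) genesisPrevHash else chain(i - 1).hash
      block.prevHash == expectedPrev &&
        block.hash == blockHash(block.index, block.data, block.prevHash)
    }

  def main(args: Array[String]): Unit = {
    val chain = List("A pays B 10", "B pays C 4", "C pays A 1")
      .foldLeft(Vector.empty[Block])(append)
    println(isValid(chain))

    // Changing recorded data breaks the chain unless every later block is redone.
    val tampered = chain.updated(1, chain(1).copy(data = "B pays C 400"))
    println(isValid(tampered))
  }
}
```

Even if the tampered block's own hash were recomputed, the next block would still point at the old hash, which is why altering one block forces the alteration of all subsequent blocks.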

Why blockchain

Today, each company or party keeps its own records of all transactions with the parties it does business with, and updates them as and when needed. This process is inefficient because:

  • information is duplicated, since every party has to update its own ledger
  • it is less transparent
  • transactions are less trusted, as the data is owned by one party and can't be guaranteed to be true
  • it is error-prone
The solution is to use a distributed, secure, transparent, and shared ledger: blockchain.
  • A shared ledger technology
  • Transparent transactions, as all parties are involved/informed
  • An immutable chain that is only appended to

Point to note - Blockchain is still an emerging technology. Business owners need to start small and then look for more ways to grow and expand the use of blockchain networks.


How blockchain applies to a business network

Assets are classified as tangible (land, properties) or intangible (cash, loans).

  • The transaction records of assets are kept in the form of a distributed ledger (blockchain).
  • The flow of assets/transactions is governed by a contract.

These assets can be tracked using the distributed ledger for transparency. Only a single copy is maintained, and it is shared and endorsed by all parties involved, hence trusted.

You can see the life cycle of an asset




Blockchain in business 

Blockchain for business provides a secure, shared ledger: one single record that is accessible to all parties involved and is therefore transparent. Business networks prioritize identity over anonymity. Assets are more diverse and important in a business network. A business network gets to choose who validates a transaction.


  • All the members of the business network share the common ledger on the blockchain.
  • Ledgers are replicated.
  • All members involved can view the transactions, but only authorized members can update them.
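
The "everyone can view, only authorized members can update" rule can be sketched as a tiny append-only ledger. All names here (Entry, Ledger, writers, the member names) are hypothetical and chosen only for illustration; a real business blockchain would also replicate the ledger and have participants endorse each transaction.

```scala
object PermissionedLedgerSketch {
  final case class Entry(author: String, description: String)

  // Append-only ledger: entries are never modified or removed, only added.
  final case class Ledger(writers: Set[String], entries: Vector[Entry] = Vector.empty) {
    // Every member may read the full history.
    def view: Vector[Entry] = entries

    // Only authorized members may append; anyone else gets an error message back.
    def append(member: String, description: String): Either[String, Ledger] =
      if (writers.contains(member))
        Right(copy(entries = entries :+ Entry(member, description)))
      else
        Left(s"$member is not authorized to update the ledger")
  }

  def main(args: Array[String]): Unit = {
    val ledger = Ledger(writers = Set("bank", "regulator"))

    println(ledger.append("bank", "Loan issued").map(_.view))
    println(ledger.append("auditor", "Loan cancelled"))
  }
}
```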



The requirements for a blockchain for business are a shared ledger, smart contracts, privacy, and trust.

1. Shared ledger
2. Privacy services (who can see what, and who can update the information)
3. Trust - transactions are endorsed by the relevant participants
4. Contract - a common/shared business process



For example, for a financial services network, a business network that runs on a blockchain can speed up transaction processes and audits. That in turn reduces costs and can lead to greater customer satisfaction. A business that runs a supply chain network can benefit from blockchain by reducing errors in shipments, tracking materials better, and reducing the risk of illicit tampering with records.


Blockchain for business has several advantages:
  • Saves time
  • Removes cost
  • Reduces risk
  • Increases trust

Use cases of blockchain
1. Reference data
2. Supply chain
3. Trades (diamond life cycle)


Blockchain and Bitcoin

Bitcoin is an unregulated shadow-currency and was the first popular blockchain application. The Bitcoin application works in an anonymous network, so no one knows who the participants are.

The Bitcoin blockchain is protected by a massive group mining effort. It's unlikely that any private blockchain will try to protect records using gigawatts of computing power, because that is time-consuming and expensive.




Websphere Dummy certificate expired - DummyServerKeyFile.jks , DummyServerTrustFile.jks

If you have faced an issue with the IBM-provided dummy certificate expiring, just like us, and are looking for a solution, this blog is for you. You can re...