Friday, January 5, 2018

MemSQL - database for real-time analytics


MemSQL


MemSQL is relational, distributed, in memory SQL database platform for real-time analytics.



MemSQL is a proprietary closed-source distributed relational/sql database with both free community and paid enterprise edition

MemSQL enables sub-second refresh of dashboards with real-time data for improved accuracy of positions and risks to drive portfolio growth and customer satisfaction.


MemSQL is a two-tiered architecture consisting of aggregators and leafs.


Aggregators are cluster-aware query routers that act as a gateway into 

the distributed system.The only data they store is cluster metadata. 
The leaf nodes store and compute data .

Overview


  • Distributed SQL database, fully ACID, JSON and Geo Spatial support
  • Rely on main memory for storage.  It can spill to disk or pin data in-memory.
  • Need lot of memory - its a trade off but its cheaper today (every year 40% decrease). Cache is a new RAM, RAM is a disk and disk is
    the new tape. Leverage SSDs (no random write). NVRAM or non-volatile RAM
    is coming sometimes soon.
  • Use MySQL wire protocol.
  • seemless integration with Kafka
  • 10 million upserts per second

Architecture


  • memsqld,
  • aggregators
  • leaf(holds partitions) nodes

Aggregator node stores metadata and leaf node data in partitions 

A partition is an indivisible slice of data that can be replicated and

moved around the cluster.

Agg and Leaf workload


Communication and Ports



Sharding -
insert into tab01 values ('geocode','profileid, .....)
hash("geocode|profileid") % partition  = partition no. (say 8)

Code Example - Spark Aggregates RDBMS and Memsql data

Road Map

RoadMap

MemSQL not for 

  • MemSQL is not designed to be a persistence store (Hbase) or “data lake”
  • MemSQL supports extremely fast, distributed “READ-COMMITTED” transactions,                 but it is not suitable for applications which require “SERIALIZABLE” transactions.

Dependency
<dependency
<artifactId>memsql-connector_2.11</artifactId>
<version>2.0.2</version
</dependency>
/*using memsql connector with Apache spark. In a program,I tried to mix a streaming data coming to memsql and existing database table and join them and put the result for user dashboard. */



package com.valassis.msg.consumer;
import java.io.IOException; import java.util.Properties; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; import org.apache.spark.sql.SparkSession; public class MemsqlRDBMSConsumer { public static void main(String[] args) throws InterruptedException, IOException { SparkSession spark = SparkSession.builder() .appName("Java Spark MemSQL") .config("spark.memsql.host", "localhost") .config("spark.memsql.port", "3306") .config("spark.memsql.user", "root") .config("spark.memsql.password", "memsql123") .config("spark.memsql.defaultDatabase", "impower_pawan") .getOrCreate(); /* * Dataset<Row> views = spark.read() * .format("com.memsql.spark.connector") .option("query", * "select user, HOST from information_schema.users") .load(); * views.show(); * views.printSchema(); */ Dataset<Row> result = spark .read() .format("com.memsql.spark.connector") .option("query", "select * from impower_pawan.geofootprintmaster") .load(); result.show(); result.printSchema(); result.createOrReplaceTempView("memsql"); String url = "jdbc:oracle:thin:@bruno_bruno1.val.vlss.local:1587:BRUNO1"; String table = "SDR_PM.PA_STORES"; Properties connectionProperties = new Properties(); connectionProperties.put("driver", "oracle.jdbc.driver.OracleDriver"); connectionProperties.put("user", "sdr_pm"); connectionProperties.put("password", "pm007"); Dataset<Row> jdbcDF2 = spark.read().jdbc(url, table, connectionProperties); jdbcDF2.show(); jdbcDF2.createOrReplaceTempView("geo"); Dataset<Row> sqlDF = spark .sql("SELECT b.id, a.geo_profile_id, a.name, b.value FROM geo a, memsql b where b.id = a.geo_profile_id"); sqlDF.show(); } }


1 comment:

  1. Great post. The Spark > MemSQL > UI approach is very interesting -- a great way to keep a UI snappy and updated easily with the power of Spark

    ReplyDelete

Websphere Dummy certificate expired - DummyServerKeyFile.jks , DummyServerTrustFile.jks

If you faced issue with ibm provided dummy certificate expired just like us and looking for the solution.  This blog is for you.  You can re...