MemSQL
MemSQL is relational, distributed, in memory SQL database platform for real-time analytics.
MemSQL is a proprietary closed-source distributed relational/sql database with both free community and paid enterprise edition
MemSQL enables sub-second refresh of dashboards with real-time data for improved accuracy of positions and risks to drive portfolio growth and customer satisfaction.
MemSQL is a two-tiered architecture consisting of aggregators and leafs.
Aggregators are cluster-aware query routers that act as a gateway into
the distributed system.The only data they store is cluster metadata.
The leaf
nodes store and compute data .
Overview
- Distributed SQL database, fully ACID, JSON and Geo Spatial support
- Rely on main memory for storage. It can spill to disk or pin data in-memory.
- Need lot of memory - its a trade off but its cheaper today (every year 40% decrease). Cache is a new RAM, RAM is a disk and disk isthe new tape. Leverage SSDs (no random write). NVRAM or non-volatile RAMis coming sometimes soon.
- Use MySQL wire protocol.
- seemless integration with Kafka
- 10 million upserts per second
Architecture
- memsqld,
- aggregators
- leaf(holds partitions) nodes
Aggregator node stores metadata and leaf node data in partitions
A partition is an indivisible slice of data that can be replicated and
moved around the cluster.
Agg
and Leaf workload
Communication and Ports
Sharding -
insert into tab01
values ('geocode','profileid, .....)
hash("geocode|profileid")
% partition = partition no. (say 8)
Code Example - Spark Aggregates RDBMS and Memsql data
Road Map
Road Map
RoadMap |
MemSQL not for
- MemSQL is not designed to be a persistence store (Hbase) or “data lake”
- MemSQL supports extremely fast, distributed “READ-COMMITTED” transactions, but it is not suitable for applications which require “SERIALIZABLE” transactions.
Dependency
<dependency>
<artifactId>memsql-connector_2.11</artifactId>
<version>2.0.2</version>
</dependency>
/*using memsql connector with Apache spark. In a program,I tried to mix a streaming data coming to memsql and existing database table and join them and put the result for user dashboard. */
package com.valassis.msg.consumer;
import java.io.IOException;
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class MemsqlRDBMSConsumer {
public static void main(String[] args) throws
InterruptedException,
IOException {
SparkSession spark = SparkSession.builder()
.appName("Java Spark MemSQL")
.config("spark.memsql.host",
"localhost")
.config("spark.memsql.port",
"3306")
.config("spark.memsql.user",
"root")
.config("spark.memsql.password",
"memsql123")
.config("spark.memsql.defaultDatabase",
"impower_pawan")
.getOrCreate();
/*
* Dataset<Row> views = spark.read()
*
.format("com.memsql.spark.connector") .option("query",
* "select user, HOST from
information_schema.users") .load();
* views.show();
*
* views.printSchema();
*/
Dataset<Row> result = spark
.read()
.format("com.memsql.spark.connector")
.option("query",
"select * from
impower_pawan.geofootprintmaster")
.load();
result.show();
result.printSchema();
result.createOrReplaceTempView("memsql");
String url =
"jdbc:oracle:thin:@bruno_bruno1.val.vlss.local:1587:BRUNO1";
String table = "SDR_PM.PA_STORES";
Properties connectionProperties = new
Properties();
connectionProperties.put("driver",
"oracle.jdbc.driver.OracleDriver");
connectionProperties.put("user",
"sdr_pm");
connectionProperties.put("password",
"pm007");
Dataset<Row> jdbcDF2 =
spark.read().jdbc(url, table,
connectionProperties);
jdbcDF2.show();
jdbcDF2.createOrReplaceTempView("geo");
Dataset<Row> sqlDF = spark
.sql("SELECT b.id, a.geo_profile_id,
a.name, b.value FROM geo a, memsql b where b.id = a.geo_profile_id");
sqlDF.show();
}
}
Great post. The Spark > MemSQL > UI approach is very interesting -- a great way to keep a UI snappy and updated easily with the power of Spark
ReplyDelete