Thursday, December 10, 2020

unix commands cheat sheet

 

Vi Usage

 

Note - you need to be in command mode to run these commands (press the Esc key to go to command mode).

The below command finds PM_LINK_BK and replaces it with PM_LINK:

:%s/PM_LINK_BK/PM_LINK/g

 

The below command searches for PM_LINK:

/PM_LINK

Press the letter n to repeat the search and jump to the next result.

 

The below command undoes the last change:

Just press the letter u

 

The below command undoes all changes made on the current line:

Just press the letter U

 

Similarly, x deletes a character and dd deletes a line; h, j, k, l move the cursor left, down, up, and right.

 

Kill all processes owned by user <username>

ps U <username> | cut -d " " -f 1 | xargs kill

 

Use this command to list files ordered by size (smallest to largest) for validation

 ls -l -S | sort -k 5 -n | awk ' { printf("%10d %s\n",$5,$9) } '

or 

simply run 

ls -lS 

in the folder - it will list the files sorted by size (largest first)



Get the unique values of the first field (delimited by ",") in the file load_dig_in_market_digtatz.bad
(similar to something like: select distinct f1 from tablename)

cut -d, -f1 load_dig_in_market_digtatz.bad | sort | uniq

(sort first, because uniq only removes adjacent duplicates)

 

 

Clear history


cat /dev/null > ~/.bash_history && history -c && exit -- clean history

 

Find number of CPUs

grep -c ^processor /proc/cpuinfo  -- number of cpu

 

Find memory

free -m ----- memory

or

read the file /proc/meminfo:

cat /proc/meminfo

 

Find OS version (multiple commands for different Linux distributions)

cat /etc/os-release

or

lsb_release -a

or

hostnamectl

 

Find command

Find files named cac.properties, starting from the current directory, and grep them for the word valvcsspi001vm:

find . -name cac.properties 2> /dev/null -exec grep valvcsspi001vm {} \;

 

Timeout command

Kills the command if it does not finish within the specified number of seconds.

When it times out, it returns exit status 124.

timeout 10 <command>

 

 

 

scp command

copy files from one server to another

scp <localfile> user@remotehost:/<folder-to-be-copied>/

 

scp -i /home/mapr/.ssh/id_rsa /dfs/impower_dig/vlh/vlh_00014-8a9ccab8-ac1f-4776-8c58-8eb9149da9bd.c000_20190610.snappy.parquet valmarketsvc@sa1x-hadoop-p1.hchc.local:/mapr/mapr-clp1.hchc.local/valassis_data/impower_dig/vlh/.

 

Command to run a shell script on <serverB> from any server, as user <username>:


ssh -l <username> <serverB> /tda/data/epsilon/scripts/getFile.sh


A good example of timeout and scp

timeout 100 scp -i /home/mapr/.ssh/id_rsa /dfs/impower_dig/vlh/vlh_00014-8a9ccab8-ac1f-4776-8c58-8eb9149da9bd.c000_20190610.snappy.parquet valmarketsvc@sa1x-hadoop-p1.hchc.local:/mapr/mapr-clp1.hchc.local/valassis_data/impower_dig/vlh/.

Then check the command's exit status ($? is 124 if it timed out).

 

Awk Usage

 

Extract the 6th field from the xref_0002.txt file, grab the rows containing SO, then get the distinct values and count them.

(similar to something like: select count(distinct f6) from tablename where f6 like '%SO%')

 

awk -F, '{print $6}' xref_0002.txt | grep -i SO | sort | uniq | wc -l

 

 

Just grep the oozie.distro process:

 

[mapr@valvcshad002vm ~]$ ps -eo comm,pid,etimes | awk '/oozie.distro/ {print $3, $1,$2}'

2 oozie.distro 57876

 

Kill all java processes


ps ax | grep java | grep -v grep | awk '{print $1}' | xargs kill -9



Thursday, December 3, 2020

sqlalchemy : configuring database client for python (pycharm)

 


The cx_Oracle module loads Oracle Client libraries which communicate over Oracle Net to an existing database. Oracle Net is not a separate product: it is how the Oracle Client and Oracle Database communicate.





Below are the steps for configuring the database client for Python (PyCharm) on macOS. You may need to tweak them for other OS installations.

Step 1: configure instantclient-basiclite-macos.x64-19.8.0.0.0dbru.zip
a. download https://www.oracle.com/database/technologies/instant-client/macos-intel-x86-downloads.html
b. unzip to the local directory
Step 1a: configure it for Windows
	a. download https://www.oracle.com/database/technologies/instant-client/winx64-64-downloads.html 

	b. unzip to the local directory
c. remember the path as you need to pass it in the python program
Step 2: install sqlalchemy and cx_Oracle
a. in pycharm - go to preferences -> project Interpreter -> install sqlalchemy and cx_Oracle
b. update the path of instantclient_19_3 in below
cx_Oracle.init_oracle_client(lib_dir="<local path>/instantclient_19_3") in TestDB.py
c. run
Linux Installation 
You may need to install using root in case of issues

pip3 install cx_Oracle
pip3 install sqlalchemy
sudo dnf install oracle-release-el8
sudo dnf install oracle-instantclient19.10-basic
sudo dnf install oracle-instantclient19.10-basic --allowerasing
path -> /usr/lib/oracle/19.10/client64/lib/ 

Code change - TestDB.py
cx_Oracle.init_oracle_client(lib_dir="/usr/lib/oracle/19.10/client64/lib/")

Program: TestDB.py

import sqlalchemy
import cx_Oracle
cx_Oracle.init_oracle_client(lib_dir="/Users/agnihotrip/instantclient_19_3")
user = "user"
password = "pass"
host = "<hostname>"
port = "1521"
service = "< service_name>"
connection_string = f'oracle+cx_oracle://{user}:{password}@{host}/?service_name={service}'
connection = sqlalchemy.create_engine(connection_string).connect()
result = connection.execute('select * from proc.proc_dnd_stage')
for row in result:
    print(row)
connection.close()

Semi Supervised & Feature Engineering


Semi Supervised

There are so many algorithms available that it can feel overwhelming when algorithm names are thrown around and you are expected to just know what they are and where they fit.

I explained supervised and unsupervised learning in my previous blog, so I won't go into detail explaining them here, but will focus on semi-supervised learning and feature engineering.


Semi-supervised learning trains a model on a mix of labeled and unlabeled data. Semi-supervised learning methods are used in areas such as image classification, where there are large datasets with very few labeled examples.


Learning Style algorithms - 

supervised algorithm 

Example problems are classification and regression.
Example algorithms include Logistic Regression and the Back Propagation Neural Network.

unsupervised algorithm

Example problems are clustering, dimensionality reduction and association rule learning.
Example algorithms include: the Apriori algorithm and k-Means.


Feature Engineering


Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.

Features are the way you represent the world to the classifier. Feature selection has a multiplicative effect on the overall modeling process.

Features are numeric or categorical. Feature engineering techniques are used to define features more accurately for your model.


  • Bucketing
  • Crossing
  • Hashing
  • Embedding
                               


Feature Bucketing - transforms a numeric feature into a categorical feature.




Problem - does income increase linearly with age?
Income is not in a linear relationship with age: children under 17 earn very little, and so do people after retirement.

Solution - bucket age (a numeric feature) into age groups (categorical features) and put a different weight on each age group. This is how we create age buckets.
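
A minimal sketch of age bucketing with pandas (the ages, bucket boundaries and labels are assumptions for illustration):

import pandas as pd

ages = pd.Series([5, 16, 22, 34, 48, 63, 71])           # hypothetical ages
age_bucket = pd.cut(ages, bins=[0, 17, 35, 55, 120],
                    labels=["child", "young_adult", "mid_career", "senior"])
# one-hot encode the buckets so a linear model can learn a separate weight per age group
age_features = pd.get_dummies(age_bucket, prefix="age")
print(age_features)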


Feature Crossing - a way to create new features that are combinations of existing features.

Problem - can a linear classifier model the interaction between multiple features, say age and education, against income?

No. This is where feature crossing is useful. For each cross (age bucket, education) we create a new true/false feature, so the model can learn a separate weight for every age-bucket/education combination.
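
A minimal sketch of crossing two categorical features with pandas (the column names and values are assumptions for illustration):

import pandas as pd

df = pd.DataFrame({
    "age_bucket": ["child", "mid_career", "senior", "mid_career"],
    "education":  ["none", "bachelors", "masters", "masters"],
})
# the crossed feature is simply the combination of the two original values
df["age_x_education"] = df["age_bucket"] + "_x_" + df["education"]
# one-hot encode the cross so each (age bucket, education) pair gets its own true/false column
crossed = pd.get_dummies(df["age_x_education"])
print(crossed)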


Feature Hashing or hash buckets - one way to represent a categorical feature with a large vocabulary.

This representation can save memory and is faster to execute.

A categorical feature with a large number of values can be represented even when the vocabulary is not specified in advance.

To avoid collisions, set the number of hash buckets higher than the number of unique values (e.g. unique occupations).

It can also be used to limit the number of possibilities.
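
A minimal sketch of feature hashing in plain Python (the bucket count and occupation strings are assumptions for illustration):

import hashlib

N_BUCKETS = 1000   # keep this larger than the number of unique values to limit collisions

def occupation_bucket(occupation):
    # stable hash of the occupation string, mapped into one of N_BUCKETS buckets
    digest = hashlib.md5(occupation.encode("utf-8")).hexdigest()
    return int(digest, 16) % N_BUCKETS

print(occupation_bucket("data engineer"))
print(occupation_bucket("plumber"))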

Embedding - represents the meaning of a word as a vector.

   Used for large vocabularies.

   Embeddings are dense.

The K-Nearest Neighbors algorithm (K-NN or KNN) is a supervised learning method used for classification and regression.

  • For classification, the output of the K-NN algorithm is the classification of an unknown data point based on the k 'nearest' neighbors in the training data.
  • For regression, the output is an average of the values of a target variable based on the k 'nearest' neighbors in the training data.

A very high value of k (e.g. k = 100) produces an overly generalised model, while a very low value of k (e.g. k = 1) produces a highly complex model.

A difficulty that arises from trying to classify out-of-sample data is that the actual classification may not be known, therefore making it hard to produce an accurate result.

If the neighbors are weighted (distance-weighted K-NN), the sum of the weights must be equal to 1.
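
A minimal sketch of K-NN classification with scikit-learn (the toy data and the value of k are assumptions for illustration):

from sklearn.neighbors import KNeighborsClassifier

# toy training data: two features per point, two classes
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)    # a small k gives a more complex model
knn.fit(X_train, y_train)
print(knn.predict([[2, 2], [9, 9]]))         # classify unknown points by their 3 nearest neighbors
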
Model Evaluation: Overfitting & Underfitting

  • Bias
    Bias is the error that results from incorrect assumptions and relations that the model makes.
    High bias causes an overly generalized model, which leads to underfitting.
  • Variance
    Variance is the inconsistency of a model due to small changes in the dataset.
    Variance is the expected value of the squared deviation of a random variable from its mean.
    High variance means the model changes drastically with minor modifications to the data; this is overfitting - too much dependence on the data.
  • A good balance keeps the model general enough for out-of-sample data but specific enough to fit the pattern of the data.

Metrics –

Error is the difference between a data point and the trend line generated by the algorithm.

There are three main model evaluation metrics we'll look at:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE) (most commonly used)
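
A minimal sketch computing all three metrics with numpy (the actual and predicted values are assumptions for illustration):

import numpy as np

actual    = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = actual - predicted
mae  = np.mean(np.abs(errors))    # Mean Absolute Error
mse  = np.mean(errors ** 2)       # Mean Squared Error
rmse = np.sqrt(mse)               # Root Mean Squared Error
print(mae, mse, rmse)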

Unsupervised Learning - 

  • K-Means Clustering plus Advantages & Disadvantages 
K-Means can group unlabeled data: it groups data points together using centroids and the distances from each point to the centroids.

Euclidean distance is used to measure the distance from each object to its centroid.
Advantage - easy to understand and fast.
Disadvantage - high variation between clustering runs, and a centroid may end up with no data points assigned to it and therefore never get updated.
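
A minimal sketch of K-Means with scikit-learn (the data and the number of clusters are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment for each point
print(kmeans.cluster_centers_)   # centroids; points are assigned by Euclidean distance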


  • Hierarchical Clustering plus Advantages & Disadvantages
Uses a dendrogram and a proximity matrix (the distance from each point to every other point).
  • Measuring the Distances Between Clusters - Single Linkage Clustering
  • Measuring the Distances Between Clusters - Algorithms for Hierarchical Clustering
  • Density-Based Clustering - DBSCAN
Two parameters are taken into account: epsilon and minimum points. Epsilon is the maximum radius of the neighborhood, and minimum points is the minimum number of points within the epsilon-neighborhood required to define a cluster. There are three classifications of points: Core, Border, and Outlier.

DBSCAN can identify and remove outliers and still recover the clusters accurately.

K-Means can't distinguish between noise and clusters.
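
A minimal sketch of DBSCAN with scikit-learn (the data, eps and min_samples are assumptions for illustration):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
db = DBSCAN(eps=3, min_samples=2).fit(X)   # eps = neighborhood radius, min_samples = minimum points
print(db.labels_)                          # -1 marks outliers (noise); other values are cluster ids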

Dimensionality Reduction can be divided into two subcategories 
  • Feature Selection which includes Wrappers, Filters, and Embedded. 
  • Feature Extraction which includes Principal Component Analysis.
Feature Selection is the process of selecting a subset of relevant features or variables.

Wrappers - Wrappers use a predictive model that scores feature subsets based on the error rate of the model. Wrappers are computationally expensive but provide the best selection.

A popular technique is called stepwise regression.

Filters - the selected feature set is more general than with wrappers. Filters use a proxy measure, which is less computationally intensive but slightly less accurate.

An interesting fact about filters is that they produce a feature set that doesn't contain assumptions based on the predictive model, making them a useful tool for exposing relationships between features, such as which variables are 'bad' together (and, as a result, drop the accuracy) or 'good' together (and therefore raise the accuracy).

Embedded - embedded methods perform feature selection as part of the model training process itself (for example, LASSO regularization).

Principal Component Analysis is the reduction of higher vector spaces to lower orders through projection. It can be used to visualize the dataset through a compact representation and compression of dimensions.

An easy representation of this would be the projection from a 3-dimensional plane to a 2-dimensional one. A plane is first found which captures most (if not all) of the information. Then the data is projected onto new axes and a reduction in dimensions occurs. When the projection of components happens, new axes are created to describe the relationship. These are called the principal axes, and the new data is called the principal components.
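
A minimal sketch of Principal Component Analysis with scikit-learn, projecting 3-dimensional data onto 2 principal components (the data is an assumption for illustration):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4, 1.0],
              [0.5, 0.7, 0.2],
              [2.2, 2.9, 0.9],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 1.1]])
pca = PCA(n_components=2)
projected = pca.fit_transform(X)        # the data projected onto the principal axes
print(projected.shape)                  # (5, 2)
print(pca.explained_variance_ratio_)    # how much information each component captures
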
Recommendation - Collaborative Filtering

Collaborative Filtering techniques explore the idea that relationships exist between products and people's interests.


As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest-neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.

One popular example of Collaborative Filtering is Netflix. Everything on their site is driven by their customer's selections, which if made frequently enough, get turned into recommendations. Netflix orders these recommendations in such a way that the highest ranking items are more visible to users, in hopes of getting them to select those recommendations as well

Another popular example is amazon.com. Amazon's item recommendation system is based on what you've previously purchased, as well as the frequency with which you've looked at certain books or other items during previous visits to their website. The advantage of using Collaborative Filtering is that users get a broader exposure to many different products they might be interested in. This exposure encourages users towards continual usage or purchase of their product.

Challenges - 
1. One of them is Data Sparsity. Having a large dataset will most likely result in a user-item matrix that is large and sparse, which may provide a good level of accuracy but also pose a risk to speed. In comparison, having a small dataset would result in faster speeds but lower accuracy.

2.  Cold Start
Another issue to keep in mind is something called 'cold start'. This is where new users do not have a sufficient amount of ratings to give an accurate recommendation.

3. Scalability - as volume increases, computation slows down and causes delays.

4. Synonyms
The term 'Synonyms' refers to items that are similar but labeled differently, and thus treated differently by the recommendation system. An example of this would be 'Backpack' vs 'Knapsack'.

5. Gray Sheep
The term 'Gray Sheep' refers to the users that have opinions that don't necessarily 'fit' or are alike to any specific grouping. These users do not consistently agree or disagree on products or items, therefore making recommendations a non-beneficiary to them.

6. Shilling Attacks
Shilling attacks are the abuse of this system: rating certain products high and other products low regardless of personal opinion, thereby allowing those products to be recommended more often.

7. Long Tail effect - popular items are rated/viewed frequently. This creates a cycle where new items remain just a shadow behind the popular items.


It is common in many real-world use cases to only have access to implicit feedback (e.g. views, clicks, purchases, likes, shares etc.). The approach used in spark.mllib to deal with such data is taken from Collaborative Filtering for Implicit Feedback Datasets. Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data as numbers representing the strength in observations of user actions (such as the number of clicks, or the cumulative duration someone spent viewing a movie). Those numbers are then related to the level of confidence in observed user preferences, rather than explicit ratings given to items. The model then tries to find latent factors that can be used to predict the expected preference of a user for an item.
Rank - it's purely a characteristic of the data: the rank refers to the number of presumed latent or hidden factors. For example, if you were measuring how much different people liked movies and tried to cross-predict them, you might have three fields: person, movie, number of stars. Now, let's say that you were omniscient and knew the absolute truth - that in fact all the movie ratings could be perfectly predicted by just 3 hidden factors: sex, age and income. In that case the "rank" of your run should be 3.
Of course, you don't know how many underlying factors, if any, drive your data, so you have to guess. The more you use, the better the results up to a point, but the more memory and computation time you will need.
One way to work it out is to start with a rank of 5-10, then increase it, say 5 at a time, until your results stop improving. That way you determine the best rank for your dataset by experimentation.


spark.mllib uses the alternating least squares (ALS) algorithm to learn these latent factors. The implementation in spark.mllib has the following parameters:
  • numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
  • rank is the number of latent factors in the model.
  • iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less.
  • lambda specifies the regularization parameter in ALS.
  • implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data.
  • alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations.


 MatrixFactorizationModel(rank, userFeatures, productFeatures)
 {
      "name": "als",
      "params": {
        "rank": 10,
        "numIterations": 20,
        "lambda": 0.01,
        "seed": 3
      }
 }


val implicitPrefs = false
val als = new ALS()                    // org.apache.spark.mllib.recommendation.ALS
als.setUserBlocks(-1)                  // -1 lets Spark auto-configure the number of blocks
als.setProductBlocks(-1)
als.setRank(ap.rank)                   // number of latent factors
als.setIterations(ap.numIterations)    // number of ALS iterations
als.setLambda(ap.lambda)               // regularization parameter
als.setImplicitPrefs(implicitPrefs)    // explicit-feedback variant here
als.setAlpha(1.0)                      // only used by the implicit-feedback variant
als.setSeed(seed)
als.setCheckpointInterval(10)
val m = als.run(mllibRatings)          // returns a MatrixFactorizationModel

Advantages
1. Hierarchical matrix co-clustering / factorization (yes)
2. Preference versus intention
Distinguishes between liking something and being interested in seeing/purchasing it.
It is worthless to recommend an item a user has already bought.
3. Scalability
4. Relevant objectives
Predicting the actual rating may be useless! (missing-at-random assumption)

Drawbacks of our model
1. Multiple individuals using the same account — individual preferences get mixed.
2. Cold start (new users)

--------------deep Learning-------------


Deep learning is a subset of machine learning and functions in a similar way, but its capabilities are different. Deep learning algorithms are capable of determining on their own whether their predictions are accurate or not. This is where deep learning gets tricky:-)

A deep learning model is designed to continually analyze data with a logic structure similar to how a human would draw conclusions. 

To achieve this, deep learning uses a layered structure of algorithms called an artificial neural network (ANN). The design of an ANN is inspired by the biological neural network of the human brain. This makes for machine intelligence that’s far more capable than that of standard machine learning models.


Why neural nets - because of complex patterns.

Why neural nets now - they are very hard to train (using backpropagation; see the vanishing gradient problem) and require a lot of compute. Up until 2006, deep nets were still underperforming shallow nets and other machine learning algorithms. But everything changed after three breakthrough papers published by Hinton, LeCun, and Bengio in 2006 and 2007.

Training and  vanishing gradient 
When you’re training a neural net, you’re constantly calculating a cost value. The cost is typically the difference between the net’s predicted output and the actual output from a set of labelled training data. The cost is then lowered by making slight adjustments to the weights and biases over and over throughout the training process, until the lowest possible value is obtained.

The training process utilizes something called a gradient, which measures the rate at which the cost will change with respect to a change in a weight or a bias.

When the gradient is large, the net will train quickly. When the gradient is small, the net will train slowly.

The process used for training a neural net is called back-propagation or back-prop. We saw before that forward prop starts with the inputs and works forward; back-prop does the reverse, calculating the gradient from right to left.

A gradient at any point is the product of the gradients at the previous points up to that point, and the product of two numbers between 0 and 1 gives you a smaller number.
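
A tiny numeric illustration of the effect (the per-layer gradient factors are made-up values for illustration):

# multiplying per-layer gradient factors that are between 0 and 1 shrinks the signal
layer_gradients = [0.5, 0.4, 0.3, 0.2]
product = 1.0
for g in layer_gradients:
    product *= g
print(product)   # ~0.012 -- the earliest layers receive almost no gradient and train very slowly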

What to use When?

If you’re interested in unsupervised learning – that is, you want to extract patterns from a set of unlabelled data – then your best bet is to use either a Restricted Boltzmann Machine, or an autoencoder.

If you have labeled data for supervised learning and you want to build a classifier, 

For text processing tasks like sentiment analysis, parsing, and named entity recognition – use a Recurrent Net or a Recursive Neural Tensor Network, which we’ll refer to as an RNTN. 

For any language model that operates on the character level, use a Recurrent Net. 

For image recognition, use a Deep Belief Network or a Convolutional Net. 

For object recognition, use a Convolutional Net or an RNTN. 

For speech recognition, use a Recurrent Net.

In general, Deep Belief Networks and Multilayer Perceptrons with rectified linear units – also known as RELU – are both good choices for classification. For time series analysis, it’s best to use a Recurrent Net.


RBM - and how they overcame the vanishing gradient problem.


An RBM is a shallow, two-layer net; the first layer is known as the visible layer and the second is called the hidden layer. Each node in the visible layer is connected to every node in the hidden layer. An RBM is considered “restricted” because no two nodes in the same layer share a connection. 

An RBM is the mathematical equivalent of a two-way translator – in the forward pass, an RBM takes the inputs and translates them into a set of numbers that encode the inputs. In the backward pass, it takes this set of numbers and translates them back to form the re-constructed inputs. A well-trained net will be able to perform the backwards translation with a high degree of accuracy. In both steps, the weights and biases have a very important role. They allow the RBM to decipher the interrelationships among the input features, and they also help the RBM decide which input features are the most important when detecting patterns. 

Through several forward and backward passes, an RBM is trained to reconstruct the input data. Three steps are repeated over and over through the training process: 

a) With a forward pass, every input is combined with an individual weight and one overall bias, and the result is passed to the hidden layer which may or may not activate. 

b) Next, in a backward pass, each activation is combined with an individual weight and an overall bias, and the result is passed to the visible layer for reconstruction. 

c) At the visible layer, the reconstruction is compared against the original input to determine the quality of the result. 

RBMs use a measure called KL Divergence for step c); 

steps a) thru c) are repeated with varying weights and biases until the input and the re-construction are as close as possible.
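
A minimal sketch of training an RBM with scikit-learn's BernoulliRBM (a stand-in: scikit-learn trains with a persistent contrastive divergence procedure rather than the exact KL-divergence steps described above, and the toy binary data is an assumption for illustration):

import numpy as np
from sklearn.neural_network import BernoulliRBM

# toy binary visible units (rows = samples, columns = visible nodes)
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 0]])
rbm = BernoulliRBM(n_components=2, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)                  # repeated forward/backward passes adjust the weights and biases
print(rbm.transform(X))     # hidden-layer activations that encode the inputs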


DBN -  

A deep belief network can be viewed as a stack of RBMs, where the hidden layer of one RBM is the visible layer of the one "above" it.

Training DBN - 
a) The first RBM is trained to re-construct its input as accurately as possible 

b) The hidden layer of the first RBM is treated as the visible layer for the second and the second RBM is trained using the outputs from the first RBM 

c) This process is repeated until every layer in the network is trained

An important note about a DBN is that each RBM layer learns the entire input. In other kinds of models – like convolutional nets – early layers detect simple patterns and later layers recombine them


CNN - The process of filtering through the image for a specific pattern. 

used Supervised learning methods. 

A CNN layer has a "flashlight" structure: each neuron is only connected to the input neurons it "shines" upon.

The neurons in a given filter share the same weight and bias parameters. This means that, anywhere on the filter, a given neuron is connected to the same number of input neurons and has the same weights and biases. 

This is what allows the filter to look for the same pattern in different sections of the image.

The next two layers that follow are RELU and pooling, both of which help to build up the simple patterns discovered by the convolutional layer. Each node in the convolutional layer is connected to a node that fires like in other nets. The activation used is called RELU, or rectified linear unit. CNNs are trained using backpropagation, so the vanishing gradient is once again a potential issue.

The gradient is held more or less constant at every layer of the net. So the RELU activation allows the net to be properly trained, without harmful slowdowns in the crucial early layers. 

The pooling layer is used for dimensionality reduction.

Together, these three layers can discover a host of complex patterns, but the net will have no understanding of what these patterns mean. 

So a fully connected layer is attached to the end of the net in order to equip the net with the ability to classify data samples.

A typical deep CNN has three sets of layers – a convolutional layer, RELU, and pooling layers – all of which are repeated several times. These layers are followed by a few fully connected layers in order to support classification
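
A minimal Keras sketch of this conv / RELU / pooling pattern followed by fully connected layers (the input shape and the number of classes are assumptions for illustration):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolution + RELU
    layers.MaxPooling2D((2, 2)),                                            # pooling (dimensionality reduction)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),       # fully connected layer
    layers.Dense(10, activation="softmax"),    # classification head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()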

Since CNNs are such deep nets, they most likely need to be trained using server resources with GPUs. Despite the power of CNNs, these nets have one drawback. Since they are a supervised learning method, they require a large set of labelled data for training, which can be challenging to obtain in a real-world application. 


RNN - pattern in data change over time - use RNN 


This deep learning model has a simple structure with a built-in feedback loop, allowing it to act as a forecasting engine

All the nets we’ve seen up to this point have been feedforward neural networks. In a feedforward neural network, signals flow in only one direction from input to output, one layer at a time. In a recurrent net, the output of a layer is added to the next input and fed back into the same layer, which is typically the only layer in the entire network.

Unlike feedforward nets, a recurrent net can receive a sequence of values as input, and it can also produce a sequence of values as output.

RNNs can be stacked to form more capable networks for complex outputs.

RNN is an extremely difficult net to train. Since these nets use backpropagation, we once again run into the problem of the vanishing gradient. 

Unfortunately, the vanishing gradient is exponentially worse for an RNN. The reason for this is that each time step is the equivalent of an entire layer in a feedforward network. So training an RNN for 100 time steps is like training a 100-layer feedforward net – this leads to exponentially small gradients and a decay of information through time. 

There are several ways to address this problem - the most popular of which is gating. Gating is a technique that helps the net decide when to forget the current input, and when to remember it for future time steps. The most popular gating types today are GRU and LSTM. Besides gating, there are also a few other techniques like gradient clipping, steeper gates, and better optimizers.
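
A minimal Keras sketch of a gated recurrent net (an LSTM) for sequence data (the sequence length and feature count are assumptions for illustration):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.LSTM(64, input_shape=(100, 8)),   # 100 time steps, 8 features per step; the LSTM gates decide what to remember
    layers.Dense(1),                         # e.g. predict the next value in the series
])
model.compile(optimizer="adam", loss="mse")
model.summary()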


AutoEncoder -  understand features in data act as feature extraction


an autoencoder is a neural net that takes a set of typically unlabelled inputs, and after encoding them, tries to reconstruct them as accurately as possible. As a result of this, the net must decide which of the data features are the most important, essentially acting as a feature extraction engine.

Autoencoders are typically very shallow, and are usually comprised of an input layer, an output layer and a hidden layer. An RBM is an example of an autoencoder with only two layers. Here is a forward pass that ends with a reconstruction of the input. There are two steps - the encoding and the decoding. Typically, the same weights that are used to encode a feature in the hidden layer are used to reconstruct an image in the output layer.

Autoencoders are trained with backpropagation, using a metric called “loss”.

loss measures the amount of information that was lost when the net tried to reconstruct the input. A net with a small loss value will produce reconstructions that look very similar to the originals.


Autoencoders can be deep. Deep autoencoders perform better at dimensionality reduction than 
their predecessor, principal component analysis, or PCA
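
A minimal Keras sketch of a shallow autoencoder (the input size and code size are assumptions for illustration):

from tensorflow import keras
from tensorflow.keras import layers

inputs  = keras.Input(shape=(784,))                          # e.g. a flattened 28x28 image
encoded = layers.Dense(32, activation="relu")(inputs)        # the hidden "code" layer
decoded = layers.Dense(784, activation="sigmoid")(encoded)   # the reconstruction
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")            # the loss measures reconstruction error
# autoencoder.fit(x_train, x_train, ...)                     # the input is also the training target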


RNTN - Recursive Neural Tensor Network, designed for sentiment analysis and NLP.


The purpose of these nets was to analyze data that had a hierarchical structure.

Structure of an RNTN - an RNTN has three basic components (a root and two children, forming a binary tree):
 – a parent group, which we’ll call the root; the root group uses a classifier to fire out a class and a score,

and the child groups, which we’ll call the leaves; they receive the input and pass it to the root group.

Each group is simply a collection of neurons, where the number of neurons depends on the complexity of the input data. The root is connected to both leaves, but the leaves are not connected to each other.

Technically speaking, the three components form what’s called a binary tree. In general, the leaf groups receive input, and the root group uses a classifier to fire out a class and a score.

The score represents the quality of the current parse, and the class represents an encoding of a structure in the current parse. 

This goes into recursion until all inputs are used up and the net has a parse tree with all the input words. 

Use cases -
Image classification, object recognition, video recognition (driverless cars), speech recognition.
In digital advertising, deep nets are used to segment users by purchase history in order to offer relevant and personalized ads in real time. Based on historical ad price data and other factors, deep nets can learn to optimally bid for ad space on a given web page.

Platform - no coding required, but you are bound by the offering; helps with quick deployment, but there is more cost associated with it.

example - H2o.ai, graphlab

Library - not bound to a vendor's offering, but requires coding; lower cost.

A library is a premade set of functions and modules that you can call through your own programs. You'll need to code every aspect of a net, like the model, the layers, the activation, the training method, and any special methods for preventing overfitting.

Commercial-grade libraries include deeplearning4j, Torch, and Caffe; scientific projects include Theano and deepmat.

Theano - python library - I am not sure if Hadoop support is present at the time of writing this.
Caffe - C++, with interfaces to Python and Matlab; good for machine vision and forecasting applications.
TensorFlow - Python; based on a computational graph (same as Theano); Hadoop support, model parallelism, OpenCL (GPU) support, TensorBoard.


Glossary

MLP - Multi Layer perceptron  
RBM  - restricted Boltzmann machine
CNN  - Convolutional Neural Net
RNN - Recurrent Net
DBN - Deep Belief Net
Autoencoders
RNTN - Recursive Neural Tensor Network
