Perceptron Classifiers and Neural Networks
I've written a previous post about the possibility and implications of 'The $1,000 Genome' and how programming is going to become increasingly important in research. I think big data analytics and machine learning, which are redefining our understanding of the problems we are able to solve, are going to be pivotal in answering the most important and elusive biological and epidemiological questions of our time. With that in mind, I'd like to give a brief introduction to machine learning and neural network models. Working with these technologies was the major reason I became interested in programming in the first place, and I'm currently researching a neural network project I want to begin in the next few weeks. I'm looking forward to this adventure and suspect that this topic will become a theme for my posts over the next several months, so stay tuned...
Most fundamentally, machine learning is the application of statistical techniques to extract patterns from data and make predictions based on those patterns. The methods used to generate these models can be broadly split into supervised and unsupervised learning styles. In supervised learning, models make predictions and are corrected according to the known outcomes of a labeled training dataset. This process continues until the model achieves a predetermined level of accuracy on the training inputs. Meanwhile, unsupervised learning uses unlabeled training data without expected outputs. In this scenario, models are prepared by identifying structures in the input data to clarify general rules, reduce redundancy or organize data by similar characteristics.
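To make the supervised "predict, compare, correct" loop concrete, here is a minimal sketch in Python. The toy dataset, the learning rate, the accuracy target and all of the function names are assumptions I made up for illustration. The model is the simple perceptron described later in this post, and the correction step is the classic perceptron update rule, which is only one of many ways a model can be corrected.

```python
# Toy labeled dataset (hypothetical): one feature per example, label 0 or 1.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]

def predict(w1, w0, x):
    """Classify x as 1 if the weighted sum is positive, else 0."""
    return 1 if w1 * x + w0 > 0 else 0

def accuracy(w1, w0, data):
    """Fraction of examples whose prediction matches the known label."""
    return sum(predict(w1, w0, x) == y for x, y in data) / len(data)

def train(data, target_accuracy=1.0, learning_rate=0.1, max_epochs=1000):
    """Predict, compare with the label, correct; stop at the target accuracy."""
    w1, w0 = 0.0, 0.0
    for _ in range(max_epochs):
        if accuracy(w1, w0, data) >= target_accuracy:
            break
        for x, y in data:
            error = y - predict(w1, w0, x)   # 0 when correct, +1 or -1 when wrong
            w1 += learning_rate * error * x  # nudge the weights toward the label
            w0 += learning_rate * error
    return w1, w0

w1, w0 = train(TRAIN)
print(accuracy(w1, w0, TRAIN))
```

The loop only ever changes the weights when a prediction disagrees with its label, which is what "corrected according to the known outcomes" means in practice. An unsupervised method would have no labels to compare against, so it would need a different objective entirely.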
Many kinds of algorithms are used in machine learning, including but not limited to regression, instance-based, regularization, decision tree, clustering, association-rule learning, neural network, deep learning, and ensemble algorithms. A proper explanation of each of these categories would deserve its own blog post (or textbook, really), but for now I'm going to focus on neural networks.
Neural networks are extremely flexible model functions built up from a cascade of several layers, each of which is a collection of perceptron functions known as 'nodes.' A perceptron is a binary linear classifier: it partitions feature space into two parts using a linear function of the features. The decision boundary for a perceptron classifier with n features as inputs is an (n-1)-dimensional hyperplane (this is easiest to visualize for 2 features, when the decision boundary is just a line).
Perceptron functions associate a weight parameter (W) with each feature to calculate the weighted sum of inputs. This weighted sum is passed into a threshold function (T) that binarizes the output. As an illustration (below): In the case of a single feature X, the threshold function classifies inputs by dividing a 1D feature space into 2 parts, negative and positive, with a single transition point. The decision boundary for X is the single value where T(W1*X + W0) transitions between the two possible classifications, which is the point where the weighted sum W1*X + W0 equals zero, X = -W0/W1. In the case of a logistic threshold function, the output moves between 0 and 1 and crosses 0.5 at that point.
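Here is a small Python sketch of a single-feature perceptron, assuming the threshold sits where the weighted sum equals zero. The weights are hypothetical values I picked so that the boundary lands somewhere easy to read, and the function names are my own.

```python
import math

def step(z):
    """Hard threshold: 1 for a positive weighted sum, 0 otherwise."""
    return 1 if z > 0 else 0

def logistic(z):
    """Smooth alternative that rises from 0 toward 1 and equals 0.5 at z = 0."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w1, w0, threshold=step):
    """Compute T(W1*X + W0) for a single feature X."""
    return threshold(w1 * x + w0)

# Hypothetical weights: the weighted sum is zero at X = -W0 / W1.
W1, W0 = 2.0, -3.0
boundary = -W0 / W1

print(boundary)                                # the single transition point
print(perceptron(1.0, W1, W0))                 # an input left of the boundary
print(perceptron(2.0, W1, W0))                 # an input right of the boundary
print(perceptron(boundary, W1, W0, logistic))  # logistic output at the boundary
```

Changing W1 and W0 slides the transition point along the X axis, and flipping the sign of both swaps which side is classified as positive.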
If we want to design a more complex classifier with a more complex decision boundary, we can augment the original feature set with new transformed features. For example, switching from a linear classifier to a quadratic polynomial one adds an X^2 feature. The decision boundary for such a classifier is wherever T(W1*X^2 + W2*X + W0) transitions between positive and negative, which can happen at up to two points.
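As a rough sketch of the polynomial case, the Python below treats X^2 as just another input feature and finds where the classification flips. The weights are hypothetical and chosen so the quadratic factors neatly; the helper names are mine, and the boundary helper assumes W1 is not zero.

```python
import math

def step(z):
    """Hard threshold: 1 for a positive weighted sum, 0 otherwise."""
    return 1 if z > 0 else 0

def quadratic_classifier(x, w1, w2, w0):
    """Compute T(W1*X^2 + W2*X + W0): still linear in the features X^2 and X."""
    return step(w1 * x**2 + w2 * x + w0)

def decision_boundaries(w1, w2, w0):
    """Points where the weighted sum changes sign (assumes w1 != 0)."""
    disc = w2**2 - 4 * w1 * w0
    if disc <= 0:
        # No real roots, or a double root that touches zero without changing sign.
        return []
    root = math.sqrt(disc)
    return sorted([(-w2 - root) / (2 * w1), (-w2 + root) / (2 * w1)])

# Hypothetical weights giving X^2 - 4X + 3 = (X - 1)(X - 3).
W1, W2, W0 = 1.0, -4.0, 3.0

print(decision_boundaries(W1, W2, W0))
for x in [0.0, 2.0, 4.0]:
    print(x, quadratic_classifier(x, W1, W2, W0))
```

With these weights the classifier is positive on both sides and negative in the middle, a pattern no single linear perceptron on X alone could produce.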
Multi-layer Perceptron Models
Instead of adding polynomial features, we can create more complex classifiers using multiple binary-valued features obtained by thresholding our original feature at different points. By combining these features into a linear function we can get the same shapes as those obtained using polynomial features. These step functions are perceptrons themselves with new features as their output, and by using them as input values to another perceptron we've created a stack of two linear classifiers whose output can be more complex than a single standard perceptron. In this system, the layers of perceptrons that generate the features used as inputs farther down the cascade are known as 'hidden layers', and the perceptrons themselves are 'hidden nodes.' Likewise, the layer that generates the final classification result is the 'output layer', and those perceptrons are 'output nodes'.
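A minimal Python sketch of such a stack might look like the following. The thresholds (1 and 3) and the output weights are hypothetical values chosen by hand rather than learned; they produce a "positive, negative, positive" pattern, the same shape a quadratic decision function such as (X - 1)(X - 3) would give.

```python
def step(z):
    """Hard threshold: 1 for a positive weighted sum, 0 otherwise."""
    return 1 if z > 0 else 0

def hidden_layer(x):
    """Two hidden nodes, each thresholding the original feature at a different point."""
    h1 = step(x - 1.0)  # fires when X > 1
    h2 = step(x - 3.0)  # fires when X > 3
    return h1, h2

def output_node(h1, h2):
    """A perceptron over the hidden features: negative only when h1 fires alone."""
    return step(-1.0 * h1 + 1.0 * h2 + 0.5)

def two_layer_classifier(x):
    """Feed the hidden layer's binary features into the output node."""
    return output_node(*hidden_layer(x))

for x in [0.0, 2.0, 4.0]:
    print(x, hidden_layer(x), two_layer_classifier(x))
```

Each individual node is still an ordinary linear classifier. The extra expressive power comes entirely from letting the output node work with the hidden nodes' outputs instead of the raw feature.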
The ideal architecture of a neural network is dictated by characteristics of the input data and the kind of patterns the model is aiming to detect. A given neural network may contain many hidden layers with varying numbers of hidden nodes. Given enough hidden nodes, even a two-layer network can approximate a complex function arbitrarily closely. To achieve this, each hidden node acts as a step-function 'detector', and a particular region of space is selected by the difference between hidden node values. Neural networks can therefore mimic complicated operations by using specific weight parameters to discretize the function into small regions that can be approximated by a combination of step functions.
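The step-function idea can be sketched in Python as a hand-built "staircase" network. Everything here is an assumption for illustration: the target function (sine on 0 to pi), the evenly spaced thresholds, the helper names, and the choice of a plain weighted sum (no final threshold) at the output so that the network produces a value rather than a class. The weights are set directly from the target function instead of being learned.

```python
import math

def step(z):
    """Hard threshold used by every hidden node."""
    return 1.0 if z > 0 else 0.0

def build_step_network(f, lo, hi, n_cells):
    """Split [lo, hi] into n_cells regions and return (thresholds, weights, bias)."""
    width = (hi - lo) / n_cells
    mids = [lo + (k + 0.5) * width for k in range(n_cells)]
    # One hidden node per interior cell edge; it fires once x passes that edge.
    thresholds = [lo + k * width for k in range(1, n_cells)]
    # Each output weight is the jump needed to reach the next cell's value.
    weights = [f(mids[k]) - f(mids[k - 1]) for k in range(1, n_cells)]
    bias = f(mids[0])
    return thresholds, weights, bias

def evaluate(network, x):
    """Linear output node: bias plus the weighted sum of hidden step nodes."""
    thresholds, weights, bias = network
    return bias + sum(w * step(x - t) for t, w in zip(thresholds, weights))

def max_error(f, network, lo, hi, samples=1000):
    """Largest gap between f and the network on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - evaluate(network, x)) for x in xs)

for n_cells in [4, 16, 64]:
    net = build_step_network(math.sin, 0.0, math.pi, n_cells)
    print(n_cells, max_error(math.sin, net, 0.0, math.pi))
```

The printed error shrinks as the number of cells grows, which is the "given enough hidden nodes" part of the argument: more step detectors mean smaller regions and a closer fit.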
Multi-layer perceptrons function as feed-forward neural networks, meaning that information flows in one direction from the inputs through the hidden layers to the output layer. Recurrent neural networks, which are used in some deep learning models, instead allow layers to feed back into themselves or into previous layers. While this feature may allow a more sophisticated interpretation of data, it also creates complex systems with many self-dependencies that are harder to analyze and train than their feed-forward counterparts.
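To show the structural difference, here is a rough Python sketch of one feed-forward pass next to a very simple recurrent loop. The layer sizes, the weights and the function names are hypothetical, and the recurrent version is a bare-bones illustration of feedback (the hidden layer sees its own previous output), not any particular published architecture.

```python
import math

def layer(inputs, weights, biases):
    """One layer of logistic nodes, each taking a weighted sum of all inputs."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(node_w, inputs)) + b)))
        for node_w, b in zip(weights, biases)
    ]

def feed_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Information flows one way: inputs -> hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

def recurrent(sequence, hidden_w, hidden_b, out_w, out_b, n_hidden):
    """The hidden layer also receives its own previous values as inputs."""
    state = [0.0] * n_hidden
    outputs = []
    for x in sequence:
        state = layer(x + state, hidden_w, hidden_b)  # feedback: old state is an input
        outputs.append(layer(state, out_w, out_b))
    return outputs

# Hypothetical weights: 1 input, 2 hidden nodes, 1 output node.
FF_HIDDEN_W = [[1.0], [-1.0]]
RNN_HIDDEN_W = [[1.0, 0.5, -0.5], [-1.0, 0.5, 0.5]]  # input weight + 2 feedback weights
HIDDEN_B = [0.0, 0.0]
OUT_W, OUT_B = [[1.0, 1.0]], [-1.0]

same_input = [1.0]
print(feed_forward(same_input, FF_HIDDEN_W, HIDDEN_B, OUT_W, OUT_B))
print(recurrent([same_input, same_input], RNN_HIDDEN_W, HIDDEN_B, OUT_W, OUT_B, 2))
```

The feed-forward network gives the same answer every time it sees the same input, while the recurrent one gives different answers for the two identical inputs because its state carries over. That dependence on history is the source of both the extra power and the extra difficulty in training.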