Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano. In this post you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step. The dataset we will use in this tutorial is the Sonar dataset. This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles.
It is a binary classification problem that requires a model to differentiate rocks from metal cylinders. You can download the dataset for free and place it in your working directory with the filename sonar. It is a well-understood dataset. All of the variables are continuous and generally in the range of 0 to 1.
A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Next, we can initialize the random number generator to ensure that we always get the same results when executing this code. This will help if we are debugging. Now we can load the dataset using pandas and split the columns into 60 input variables X and 1 output variable Y.
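The seeding and loading steps described above can be sketched as follows. This is a minimal sketch under stated assumptions: the file is a header-less CSV with 60 numeric columns followed by one label column, the helper name `load_sonar` and the seed value are hypothetical, and a small synthetic in-memory table stands in for the real file so the example runs anywhere. Seeding NumPy alone may not make Keras training fully repeatable, since the backend keeps its own random state.

```python
import io

import numpy as np
import pandas as pd

SEED = 7  # arbitrary value chosen for illustration
np.random.seed(SEED)  # fix NumPy's global random state


def load_sonar(source):
    """Read a header-less CSV and split it into 60 inputs (X) and 1 output (Y)."""
    dataframe = pd.read_csv(source, header=None)
    values = dataframe.values
    X = values[:, 0:60].astype(float)  # the 60 input variables
    Y = values[:, 60]                  # the single output variable (class label)
    return X, Y


# Synthetic stand-in for the real file: 4 rows of 60 random values plus a label.
rows = []
for label in ["R", "M", "R", "M"]:
    features = ",".join(f"{value:.4f}" for value in np.random.rand(60))
    rows.append(f"{features},{label}")

X, Y = load_sonar(io.StringIO("\n".join(rows)))
print(X.shape, Y.shape)
```

With the real dataset, the path to the downloaded file would be passed to `load_sonar` in place of the in-memory buffer.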
The output variable must be encoded as integer values. The LabelEncoder class from scikit-learn will model the encoding required using the entire dataset via the fit function, then apply the encoding to create a new output variable using the transform function. We are going to use scikit-learn to evaluate the model using stratified k-fold cross validation.
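The encoding step can be illustrated with a short sketch, assuming the class in question is scikit-learn's `LabelEncoder` and that the labels are the strings "R" (rock) and "M" (metal cylinder). The five labels below are made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real values come from the dataset's last column.
labels = ["R", "M", "M", "R", "M"]

encoder = LabelEncoder()
encoder.fit(labels)                    # learn the mapping from label to integer
encoded_Y = encoder.transform(labels)  # apply the mapping

print(encoder.classes_.tolist())  # ['M', 'R']
print(encoded_Y.tolist())         # [1, 0, 0, 1, 0]
```

The encoder sorts the class names, so "M" maps to 0 and "R" maps to 1.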
This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts, training the model on all parts except one which is held out as a test set to evaluate the performance of the model. This process is repeated k-times and the average score across all constructed models is used as a robust estimate of performance.
It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data. To use Keras models with scikit-learn, we must use the KerasClassifier wrapper. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit such as the number of epochs and the batch size.
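To see what stratification does in practice, here is a small sketch using scikit-learn's `StratifiedKFold` on synthetic labels. The 60/40 class split, the seed, and the choice of 10 folds with shuffling are assumptions made for illustration and are not taken from the Sonar data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels: 60 instances of class 0 and 40 of class 1.
y = np.array([0] * 60 + [1] * 40)
X = np.zeros((len(y), 1))  # the features play no part in how folds are built

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

fold_counts = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X, y), start=1):
    counts = np.bincount(y[test_idx], minlength=2).tolist()
    fold_counts.append(counts)
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)} "
          f"class counts in test fold={counts}")
```

Each held-out fold should contain the two classes in the same 60/40 proportion as the full set of labels, which is the balance that stratification aims for.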
Our model will have a single fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks. The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.
The model also uses the efficient Adam optimization algorithm for gradient descent and accuracy metrics will be collected when the model is trained. Now it is time to evaluate this model using stratified cross validation in the scikit-learn framework.
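The model described above might be written along these lines. This is a sketch under assumptions, not the original listing: it uses the `tensorflow.keras` import path (a standalone Keras install would import from `keras` instead), the function name `create_baseline` is illustrative, `"random_normal"` is used for the small Gaussian initialization, and binary cross-entropy is assumed as the loss because it is the conventional pairing with a single sigmoid output.

```python
from tensorflow import keras
from tensorflow.keras import layers


def create_baseline(n_inputs=60):
    """Build and compile a network with one hidden layer as wide as the input."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        # Hidden layer: same number of neurons as input variables, rectifier activation.
        layers.Dense(n_inputs, kernel_initializer="random_normal", activation="relu"),
        # Output layer: one neuron with a sigmoid, giving a probability in [0, 1].
        layers.Dense(1, kernel_initializer="random_normal", activation="sigmoid"),
    ])
    # Assumed loss; Adam optimizer and accuracy metric as described in the text.
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


model = create_baseline()
model.summary()
```

A function of this shape, one that creates and returns the compiled model, is what the KerasClassifier wrapper expects to be given.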
We pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off given that the model will be created 10 times for the 10-fold cross validation being performed.
Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data. Neural network models are especially suitable to having consistent input values, both in scale and distribution. An effective data preparation scheme for tabular data when building neural network models is standardization.
This is where the data is rescaled such that the mean value for each attribute is 0 and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions whilst normalizing the central tendencies for each attribute.
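As a worked illustration of the rescaling, the sketch below standardizes a small synthetic matrix with NumPy by subtracting each column's mean and dividing by its standard deviation. The data and the column scales are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic data: 8 rows, 3 attributes on very different scales.
X = rng.random((8, 3)) * np.array([1.0, 10.0, 100.0])

mean = X.mean(axis=0)  # per-attribute mean
std = X.std(axis=0)    # per-attribute standard deviation
X_standardized = (X - mean) / std

print(X_standardized.mean(axis=0).round(6))  # approximately 0 for every attribute
print(X_standardized.std(axis=0).round(6))   # approximately 1 for every attribute
```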
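We can use scikit-learn to perform the standardization of our Sonar dataset using the StandardScaler class. Rather than standardizing the entire dataset up front, it is good practice to fit the standardization on the training folds within each pass of cross validation and then apply it to the held-out fold. We can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure.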
Here, we can define a pipeline with the StandardScaler followed by our neural network model. Running this example provides the results below. We do see a small but very nice lift in the mean accuracy.
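The pipeline pattern can be sketched as below. To keep the example runnable without a deep learning backend, a `LogisticRegression` is swapped in as a stand-in for the neural network, and the data is synthetic (generated with `make_classification`), so the printed accuracy says nothing about the Sonar problem. In the tutorial's setting, a scikit-learn-compatible Keras wrapper would take the place of the stand-in classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data with 60 input variables.
X, y = make_classification(n_samples=200, n_features=60, random_state=7)

pipeline = Pipeline([
    ("standardize", StandardScaler()),                  # fitted on the training folds only
    ("classifier", LogisticRegression(max_iter=1000)),  # stand-in for the Keras model
])

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(pipeline, X, y, cv=kfold)
print(f"Accuracy: {scores.mean() * 100:.2f}% ({scores.std() * 100:.2f}%)")
```

Because the scaler sits inside the pipeline, it is refitted on the training portion of every fold, so no statistics from the held-out fold leak into the preparation step.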
There are many things to tune on a neural network, such as the weight initialization, activation functions, optimization procedure and so on. One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, we take a look at two experiments on the structure of the network: making it smaller and making it larger. The data describes the same signal from different angles.
Perhaps some of those angles are more relevant than others. We can force a type of feature extraction by the network by restricting the representational space in the first hidden layer. In this experiment, we take our baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will put pressure on the network during training to pick out the most important structure in the input data to model. We will also standardize the data as in the previous experiment with data preparation and try to take advantage of the small lift in performance.
Running this example provides the following result. We can see that we have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model. This is a great result because we are doing slightly better with a network half the size, which in turn takes half the time to train.
A neural network topology with more layers offers more opportunity for the network to extract key features and recombine them in useful nonlinear ways.
We can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, we add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.
The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like we did in the experiment above with the smaller network.
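To make the topologies in these experiments concrete, the sketch below counts the trainable parameters (weights plus biases) of each fully connected stack. The layer sizes follow the text: 60 inputs, a 60-neuron baseline hidden layer, a hidden layer halved to 30 in the smaller network, and 60 followed by 30 in the larger one. The helper name is illustrative.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


topologies = {
    "baseline (60-60-1)": [60, 60, 1],
    "smaller (60-30-1)": [60, 30, 1],
    "larger (60-60-30-1)": [60, 60, 30, 1],
}

parameter_counts = {name: dense_parameter_count(sizes)
                    for name, sizes in topologies.items()}

for name, count in parameter_counts.items():
    print(f"{name}: {count} trainable parameters")
# baseline (60-60-1): 3721 trainable parameters
# smaller (60-30-1): 1861 trainable parameters
# larger (60-60-30-1): 5521 trainable parameters
```

The smaller network has roughly half the parameters of the baseline, while the larger network adds about half again on top of it.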
Running this example produces the results below. We can see that we do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.
What is the best score that you can achieve on this dataset? You learned how you can work through a binary classification problem step-by-step with Keras.
Do you have any questions about Deep Learning with Keras or about this post? Ask your questions in the comments and I will do my best to answer.
There is an example of evaluating a neural network on a manual verification dataset while the model is being fit here: You can use the model. You can learn more about test options for evaluating machine learning algorithms here:
However, when I print back the predicted Ys they are scaled. Is there a way to use StandardScaler and then get your predictions back to binary?
Hi Paul, I would advise you to scale your data beforehand and keep the coefficients used to scale, then reuse them later to reverse the scaling of predictions.
I was wondering, how would one print the progress of the model training the way Keras usually does, in this example in particular?
Progress is turned off here because we are using k-fold cross validation, which results in so many more models being created and, in turn, very noisy output.
Hello Jason, excellent tutorial. Consider this situation: suppose the dataset you loaded is the training set and the test set is given to you separately. I created the model as you described, but now I want to predict the outcomes for the test data and check the prediction score for the test data. How can I do that?
You can use model. This post provides an example of what you want:
Thanks for this excellent tutorial. May I ask, regarding this network model: to which deep learning models does it belong?
Note that the DBN and autoencoders are generally no longer mainstream for classification problems like this example.
Thanks Jason for your reply. I have another question regarding this example. How can I know the reduced features after making the network smaller as in section 4.
The features are weighted, but the weighting is complex, because of the multiple layers. It would not be accurate to take just the input weights and use that to determine feature importance or which features are required. The hidden layer neurons are not the same as the input features, I hope that is clear.
Perhaps I misunderstand your question and you can elaborate on what you mean?
My case is as follows: I have something similar to your example. I have a deep neural network with 11 features. I used a hidden layer to reduce the 11 features to 7 and then fed it to a binary classifier to classify the values as class A or class B.
The first thing I need to know is which 7 of the 11 features were chosen. In more detail: when feature 1 has an average value of 0.