Posts Tagged ‘Computer vision’

Facial Keypoints detection–Using R, deep learning and H2O

March 22, 2015

Computer vision is a hotly pursued subject. While the web cameras on our computers are good at taking pictures, they have no intelligence at all for recognizing whether two images are the same. Computer vision pushes machine learning to its limits. Deep neural networks composed of many hidden layers are run on powerful machines to detect, for example, whether two images of faces belong to the same person. Facebook, with its DeepFace technology, has achieved near-human capability in recognizing faces.

Kaggle is hosting a competition in recognizing faces. The methodology is this: discover the locations of key points on faces, such as the x-coordinate of the left-eye centre, the y-coordinate of the left-eye centre and so on. There are 30 key points. When the locations of these key points match in two images of faces, the two images are presumed to be of the same face.

In this exercise, I have used the deep learning algorithm of H2O. H2O is very easy to install, either on a stand-alone basis or within R, and its deep learning algorithm can be invoked from within R. A limitation of H2O deep learning is that there can be only one response variable, but here we have 30. So we build models and make predictions one response at a time instead of all 30 in one go. You can loop the modelling-and-prediction sequence over a number of location points, depending upon the capacity of your machine. My machine has 16GB of RAM; I made predictions for 5 location points in one for-loop sequence, then for another 5, and so on until all 30 were done. I was able to achieve a Kaggle score of 3.55854. Even if you have an 8GB machine, do not be disappointed; you can still get a respectable score.

Installing H2O in R on CentOS

Installation in R is easy. First use yum to install R-devel (yum install R-devel) and then libcurl-devel.x86_64 (yum install libcurl-devel.x86_64). Next, install the dependencies mentioned here. My version of R is 3.1.1. Many dependencies such as ‘methods’, ‘tools’, ‘utils’ and ‘stats’ come bundled with R and are pre-installed. Then download H2O, unzip it and install its R package from within R as described on this page. The whole installation process should not take more than 20 minutes.
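In outline, the steps look like the commands below. Treat the package list and file names as placeholders; the exact H2O dependencies and zip/tarball name are version specific and are listed on the H2O download page.

```shell
# Build tools needed to compile R packages from source (run as root)
yum install -y R-devel
yum install -y libcurl-devel.x86_64

# From within R: install CRAN dependencies, then the H2O R package
# shipped inside the downloaded zip (names/paths are placeholders)
R -e 'install.packages(c("RCurl", "rjson", "statmod"))'
R CMD INSTALL h2o-R-package.tar.gz
```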

Data structure and transformation

You must have downloaded the training and test files from Kaggle. The training file has 7049 images of faces; the test file has 1783 images. The first two lines of training.csv are as below:

bash-4.1$ head -2 training.csv 
left_eye_center_x,left_eye_center_y,right_eye_center_x,right_eye_center_y,left_eye_inner_corner_x,left_eye_inner_corner_y,left_eye_outer_corner_x,left_eye_outer_corner_y,right_eye_inner_corner_x,right_eye_inner_corner_y,right_eye_outer_corner_x,right_eye_outer_corner_y,left_eyebrow_inner_end_x,left_eyebrow_inner_end_y,left_eyebrow_outer_end_x,left_eyebrow_outer_end_y,right_eyebrow_inner_end_x,right_eyebrow_inner_end_y,right_eyebrow_outer_end_x,right_eyebrow_outer_end_y,nose_tip_x,nose_tip_y,mouth_left_corner_x,mouth_left_corner_y,mouth_right_corner_x,mouth_right_corner_y,mouth_center_top_lip_x,mouth_center_top_lip_y,mouth_center_bottom_lip_x,mouth_center_bottom_lip_y,Image
66.0335639098,39.0022736842,30.2270075188,36.4216781955,59.582075188,39.6474225564,73.1303458647,39.9699969925,36.3565714286,37.3894015038,23.4528721805,37.3894015038,56.9532631579,29.0336481203,80.2271278195,32.2281383459,40.2276090226,29.0023218045,16.3563789474,29.6474706767,44.4205714286,57.0668030075,61.1953082707,79.9701654135,28.6144962406,77.3889924812,43.3126015038,72.9354586466,43.1307067669,84.4857744361,238 236 237 238 240 240 239 241 241 243 240 239 231 212 190 173 148 122 104 92 79 73 74 73 73 74 81 74 60 64 75 86 93 102 100 105 109 114 121 127 132 134 137 137 140 139 138 137 137 140 141 143 144 147 148 149 147 147 148 145 147 144 146 147 147 143 134 130 130 128 116 104 98 90 82 78 85 88 86 80 77 87 108 111 115 128 133 188 242 252 250 248 251 250 250 250 235 238 236 238 238 237 238 242 241 239 237 233 215 195 187 156 119 103 93 78 68 73 75 75 72 75 70 61 66 77 91 96 106 108 113 120 125 131 134 138 135 138 139 145 144 144 142 140 141 141 148 147 150 149 152 151 149 150 147 148 144 148 144 146 146 143 139 128 132 135 128 112 104 97 87 78 79 83 85 83 75 75 89 109 111 117 117 130 194 243 251 249 250 249 250 251 237 236 237 238 237 238 241 238 238 238 241 221 195 187 163 124 106 95 81 68 70 73 73 72 73 69 65 74 82 94 103 110 111 119 127 135 140 139 

Each line has 30 facial key-point locations followed by 96 x 96 = 9216 pixel-intensity values (we have truncated the trailing pixels in the second row above for brevity). The first line is the header line with the names of the fields; there are 31 names: 30 location-point names and one name for the ‘Image’ field. Thus, all pixel-intensity values are in one field, separated by spaces. The data structure when read into R is as follows: all location points are numeric, and ‘Image’, with all the pixel values, is one huge character field.
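To make the ‘Image’ field concrete, here is a tiny sketch with a made-up four-pixel string in place of the real 9216 values; the same idiom appears in the transformation code further below:

```r
# A toy 'Image' field: four space-separated pixel intensities
im <- "238 236 237 238"
# strsplit() returns a list; unlist() flattens it; as.integer() converts
pixels <- as.integer(unlist(strsplit(im, " ")))
pixels         # 238 236 237 238
length(pixels) # 4 here; 96*96 = 9216 in the real data
```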


setwd("/home/ganesh/facial_key_points/")
# Read training file. Note that all fields are comma separated
#   but each Image field has 96*96 = 9216 values, space separated.
#    To avoid Image being treated as a factor, set stringsAsFactors=FALSE
#      (default is TRUE)
df.train<-read.csv("training.csv", stringsAsFactors=FALSE, header=TRUE)
> str(df.train)
'data.frame':	7049 obs. of  31 variables:
 $ left_eye_center_x        : num  66 64.3 65.1 65.2 66.7 ...
 $ left_eye_center_y        : num  39 35 34.9 37.3 39.6 ...
 $ right_eye_center_x       : num  30.2 29.9 30.9 32 32.2 ...
 $ right_eye_center_y       : num  36.4 33.4 34.9 37.3 38 ...
 $ left_eye_inner_corner_x  : num  59.6 58.9 59.4 60 58.6 ...
 $ left_eye_inner_corner_y  : num  39.6 35.3 36.3 39.1 39.6 ...
 $ left_eye_outer_corner_x  : num  73.1 70.7 71 72.3 72.5 ...
 $ left_eye_outer_corner_y  : num  40 36.2 36.3 38.4 39.9 ...
 $ right_eye_inner_corner_x : num  36.4 36 37.7 37.6 37 ...
 $ right_eye_inner_corner_y : num  37.4 34.4 36.3 38.8 39.1 ...
 $ right_eye_outer_corner_x : num  23.5 24.5 25 25.3 22.5 ...
 $ right_eye_outer_corner_y : num  37.4 33.1 36.6 38 38.3 ...
 $ left_eyebrow_inner_end_x : num  57 54 55.7 56.4 57.2 ...
 $ left_eyebrow_inner_end_y : num  29 28.3 27.6 30.9 30.7 ...
 $ left_eyebrow_outer_end_x : num  80.2 78.6 78.9 77.9 77.8 ...
 $ left_eyebrow_outer_end_y : num  32.2 30.4 32.7 31.7 31.7 ...
 $ right_eyebrow_inner_end_x: num  40.2 42.7 42.2 41.7 38 ...
 $ right_eyebrow_inner_end_y: num  29 26.1 28.1 31 30.9 ...
 $ right_eyebrow_outer_end_x: num  16.4 16.9 16.8 20.5 15.9 ...
 $ right_eyebrow_outer_end_y: num  29.6 27.1 32.1 29.9 30.7 ...
 $ nose_tip_x               : num  44.4 48.2 47.6 51.9 43.3 ...
 $ nose_tip_y               : num  57.1 55.7 53.5 54.2 64.9 ...
 $ mouth_left_corner_x      : num  61.2 56.4 60.8 65.6 60.7 ...
 $ mouth_left_corner_y      : num  80 76.4 73 72.7 77.5 ...
 $ mouth_right_corner_x     : num  28.6 35.1 33.7 37.2 31.2 ...
 $ mouth_right_corner_y     : num  77.4 76 72.7 74.2 77 ...
 $ mouth_center_top_lip_x   : num  43.3 46.7 47.3 50.3 45 ...
 $ mouth_center_top_lip_y   : num  72.9 70.3 70.2 70.1 73.7 ...
 $ mouth_center_bottom_lip_x: num  43.1 45.5 47.3 51.6 44.2 ...
 $ mouth_center_bottom_lip_y: num  84.5 85.5 78.7 78.3 86.9 ...
 $ Image                    : chr  "238 236 237 238 240 240 239 241 241 243 240 239 231 212 190 173 148 122 104 92 79 73 74 73 73 74 81 74 60 64 75 86 93 102 100 1"| __truncated__ "219 215 204 196 204 211 212 200 180 168 178 196 194 196 203 209 199 192 197 201 207 215 199 190 182 180 183 190 190 176 175 175"| __truncated__ "144 142 159 180 188 188 184 180 167 132 84 59 54 57 62 61 55 54 56 50 60 78 85 86 88 89 90 90 88 89 91 94 95 98 99 101 104 107 "| __truncated__ "193 192 193 194 194 194 193 192 168 111 50 12 1 1 1 1 1 1 1 1 1 1 6 16 19 17 13 13 16 22 25 31 34 27 15 19 16 19 17 13 9 6 3 1 "| __truncated__ ...

It so happens that in some images some location points are missing. A summary of the data brings this out; it shows the number of missing values in every location column.

> summary(df.train)
 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
 Min.   :22.76     Min.   : 1.617    Min.   : 0.6866    Min.   : 4.091    
 1st Qu.:65.08     1st Qu.:35.900    1st Qu.:28.7833    1st Qu.:36.328    
 Median :66.50     Median :37.528    Median :30.2514    Median :37.813    
 Mean   :66.36     Mean   :37.651    Mean   :30.3061    Mean   :37.977    
 3rd Qu.:68.02     3rd Qu.:39.258    3rd Qu.:31.7683    3rd Qu.:39.567    
 Max.   :94.69     Max.   :80.503    Max.   :85.0394    Max.   :81.271    
 NA's   :10        NA's   :10        NA's   :13         NA's   :13  
      
 left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x
 Min.   :19.07           Min.   :27.19           Min.   :27.57          
 1st Qu.:58.04           1st Qu.:36.63           1st Qu.:71.72          
 Median :59.30           Median :37.88           Median :73.25          
 Mean   :59.16           Mean   :37.95           Mean   :73.33          
 3rd Qu.:60.52           3rd Qu.:39.26           3rd Qu.:75.02          
 Max.   :84.44           Max.   :66.56           Max.   :95.26          
 NA's   :4778            NA's   :4778            NA's   :4782 
          
 left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y
 Min.   :26.25           Min.   : 5.751           Min.   :26.25           
 1st Qu.:36.09           1st Qu.:35.506           1st Qu.:36.77           
 Median :37.64           Median :36.652           Median :37.94           
 Mean   :37.71           Mean   :36.653           Mean   :37.99           
 3rd Qu.:39.37           3rd Qu.:37.754           3rd Qu.:39.19           
 Max.   :64.62           Max.   :70.715           Max.   :69.81           
 NA's   :4782            NA's   :4781             NA's   :4781       
     
 right_eye_outer_corner_x right_eye_outer_corner_y left_eyebrow_inner_end_x
 Min.   : 3.98            Min.   :25.12            Min.   :17.89           
 1st Qu.:20.59            1st Qu.:36.53            1st Qu.:54.52           
 Median :22.54            Median :37.87            Median :56.24           
 Mean   :22.39            Mean   :38.03            Mean   :56.07           
 3rd Qu.:24.24            3rd Qu.:39.41            3rd Qu.:57.95           
 Max.   :61.43            Max.   :70.75            Max.   :79.79           
 NA's   :4781             NA's   :4781             NA's   :4779         
   
 left_eyebrow_inner_end_y left_eyebrow_outer_end_x left_eyebrow_outer_end_y
 Min.   :15.86            Min.   :32.21            Min.   :10.52           
 1st Qu.:27.62            1st Qu.:77.67            1st Qu.:27.67           
 Median :29.53            Median :79.78            Median :29.77           
 Mean   :29.33            Mean   :79.48            Mean   :29.73           
 3rd Qu.:31.16            3rd Qu.:81.59            3rd Qu.:31.84           
 Max.   :60.88            Max.   :94.27            Max.   :60.50           
 NA's   :4779             NA's   :4824             NA's   :4824          
  
 right_eyebrow_inner_end_x right_eyebrow_inner_end_y right_eyebrow_outer_end_x
 Min.   : 6.921            Min.   :16.48             Min.   : 3.826           
 1st Qu.:37.552            1st Qu.:27.79             1st Qu.:13.562           
 Median :39.299            Median :29.57             Median :15.786           
 Mean   :39.322            Mean   :29.50             Mean   :15.871           
 3rd Qu.:40.917            3rd Qu.:31.25             3rd Qu.:17.999           
 Max.   :76.582            Max.   :62.08             Max.   :58.418           
 NA's   :4779              NA's   :4779              NA's   :4813     
        
 right_eyebrow_outer_end_y   nose_tip_x      nose_tip_y    mouth_left_corner_x
 Min.   :13.22             Min.   :12.94   Min.   :17.93   Min.   :22.92      
 1st Qu.:28.21             1st Qu.:46.60   1st Qu.:59.29   1st Qu.:61.26      
 Median :30.32             Median :48.42   Median :63.45   Median :63.18      
 Mean   :30.43             Mean   :48.37   Mean   :62.72   Mean   :63.29      
 3rd Qu.:32.66             3rd Qu.:50.33   3rd Qu.:66.49   3rd Qu.:65.38      
 Max.   :66.75             Max.   :89.44   Max.   :95.94   Max.   :84.77      
 NA's   :4813                                              NA's   :4780     
  
 mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y
 Min.   :57.02       Min.   : 2.246       Min.   :56.69       
 1st Qu.:72.88       1st Qu.:30.798       1st Qu.:73.26       
 Median :75.78       Median :32.982       Median :76.00       
 Mean   :75.97       Mean   :32.900       Mean   :76.18       
 3rd Qu.:78.88       3rd Qu.:35.101       3rd Qu.:78.96       
 Max.   :94.67       Max.   :74.018       Max.   :95.51       
 NA's   :4780        NA's   :4779         NA's   :4779        

 mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x
 Min.   :12.61          Min.   :56.72          Min.   :12.54            
 1st Qu.:46.49          1st Qu.:69.40          1st Qu.:46.57            
 Median :47.91          Median :72.61          Median :48.59            
 Mean   :47.98          Mean   :72.92          Mean   :48.57            
 3rd Qu.:49.30          3rd Qu.:76.22          3rd Qu.:50.68            
 Max.   :83.99          Max.   :94.55          Max.   :89.44            
 NA's   :4774           NA's   :4774           NA's   :33            
   
 mouth_center_bottom_lip_y    Image          
 Min.   :25.85             Length:7049       
 1st Qu.:75.55             Class :character  
 Median :78.70             Mode  :character  
 Mean   :78.97                               
 3rd Qu.:82.23                               
 Max.   :95.81                               
 NA's   :33                                  

For the following six location points, the number of NA’s is 33 or fewer:

1.left_eye_center_x  2.left_eye_center_y  3.right_eye_center_x  4.right_eye_center_y
5.mouth_center_bottom_lip_x   6.mouth_center_bottom_lip_y 

For the following two location points there are no missing values:

1.nose_tip_x      2.nose_tip_y 

For all the other 22 location points, the number of missing values is between 4774 and 4824. The list is below:

1.left_eye_inner_corner_x    2.left_eye_inner_corner_y    3.left_eye_outer_corner_x
4.left_eye_outer_corner_y    5.right_eye_inner_corner_x   6.right_eye_inner_corner_y
7.right_eye_outer_corner_x   8.right_eye_outer_corner_y   9.left_eyebrow_inner_end_x
10.left_eyebrow_inner_end_y  11.left_eyebrow_outer_end_x  12.left_eyebrow_outer_end_y
13.right_eyebrow_inner_end_x 14.right_eyebrow_inner_end_y 15.right_eyebrow_outer_end_x
16.right_eyebrow_outer_end_y 17.mouth_left_corner_x       18.mouth_left_corner_y 
19.mouth_right_corner_x      20.mouth_right_corner_y      21.mouth_center_top_lip_x 
22.mouth_center_top_lip_y
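The per-column NA counts reported by summary() can also be obtained in one line with colSums(is.na(...)). A sketch on a toy data frame; on the real data the call would be colSums(is.na(df.train[, 1:30])):

```r
# Toy data frame mimicking two key-point columns with missing values
df.toy <- data.frame(nose_tip_x          = c(44.4, 48.2, NA),
                     mouth_left_corner_x = c(61.2, NA,   NA))
# Number of NAs in every column
colSums(is.na(df.toy))
#          nose_tip_x mouth_left_corner_x
#                   1                   2
```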

It is instructive to calculate the standard deviation of this data, for it may show how easy (or tough) our prediction task will be. A large variation implies a tougher prediction job.

> library(plyr)
  # Omit NA values and calculate sd, column-wise, for the 30 columns
> colwise(sd)(na.omit(df.train[,1:30]))
  left_eye_center_x  left_eye_center_y  right_eye_center_x  right_eye_center_y
1          2.087683          2.294027           2.051575           2.234334

  left_eye_inner_corner_x   left_eye_inner_corner_y      left_eye_outer_corner_x
1                2.005631                  2.0345                  2.701639

  left_eye_outer_corner_y   right_eye_inner_corner_x     right_eye_inner_corner_y
1                2.684162                 1.822784                 2.009505

  right_eye_outer_corner_x  right_eye_outer_corner_y     left_eyebrow_inner_end_x
1                 2.768804                 2.654903                2.819914

  left_eyebrow_inner_end_y  left_eyebrow_outer_end_x     left_eyebrow_outer_end_y
1                 2.867131                 3.312647                3.627187

  right_eyebrow_inner_end_x  right_eyebrow_inner_end_y   right_eyebrow_outer_end_x
1                 2.609648                  2.842219               3.337901

  right_eyebrow_outer_end_y  nose_tip_x     nose_tip_y   mouth_left_corner_x
1                 3.644342   3.276053      4.528635                3.650131

  mouth_left_corner_y        mouth_right_corner_x        mouth_right_corner_y
1            4.438565                 3.595103                      4.259514

  mouth_center_top_lip_x     mouth_center_top_lip_y      mouth_center_bottom_lip_x
1               2.723274               5.108675                     3.032389

  mouth_center_bottom_lip_y
1               4.813557

Thus, while in some cases the standard deviation is small, in other cases it is quite large. Where the variation is small (1.8, for example), even the column mean may serve as a good prediction.
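As a crude baseline (my own aside, not part of the competition code), using the column mean as a constant prediction gives an error close to the column's standard deviation, which is why low-variance columns are "easy":

```r
# Toy response column with one missing value
y <- c(66.0, 64.3, 65.1, NA, 66.7)
# Column mean, ignoring NAs, used as a constant prediction
pred <- mean(y, na.rm = TRUE)
# RMSE of this constant prediction over the observed values
obs  <- y[!is.na(y)]
rmse <- sqrt(mean((obs - pred)^2))
pred
rmse
```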

In modelling, the larger the number of observations, the better the model. Further, since we take only one location point (response variable) at a time for prediction, the size of the training data will vary from location point to location point, depending upon how many missing values that column has. For example, for predicting nose_tip_x and nose_tip_y we can use all 7049 observations, but for mouth_center_top_lip_x and mouth_center_top_lip_y we will have just 7049 - 4774 = 2275 observations.

I, however, took a short-cut and for training considered only those rows where complete data was available. I would not advise others to do the same. The following R code filters the complete cases into a file on the hard disk. The code is well commented.

# R code to generate complete cases from training.csv

# Library for parallel operations
library(doMC)
registerDoMC(cores=4)
# your working directory
setwd("/home/ashokharnal/facial_key_points/")

# Read training file. Note that all fields are comma separated
#   but each Image field has 96*96 = 9216 values, space separated.
#    To avoid Image being treated as a factor, set stringsAsFactors=FALSE
#      (default is TRUE)
df.train<-read.csv("training.csv", stringsAsFactors=FALSE, header=TRUE)

# Omit rows with incomplete classification data
ok<-complete.cases(df.train)
filtered.df.train<-df.train[ok,]
# So how many rows are left?
dim(filtered.df.train)

# Get all image data (field) into another variable
im.train<-filtered.df.train$Image
# Remove image data from filtered data
filtered.df.train$Image<-NULL
# But, introduce an ID column for later merger
filtered.df.train$id<-1:dim(filtered.df.train)[1]

# Split image data on space.
#   Split row by row (foreach)
#    and combine (.combine) each rows data by rbind
#     Do it in parallel (%dopar%)
im.train <- foreach(im = im.train, .combine=rbind) %dopar% {
    as.integer(unlist(strsplit(im, " ")))
}

# Check resulting data structure and its class
str(im.train)
class(im.train)
# Convert it to a data frame
df.im.train<-data.frame(im.train)
# Remove row names
row.names(df.im.train)<-NULL
dim(df.im.train)

# Add an ID to this image data
df.im.train$id<-1:dim(df.im.train)[1]
# Just check what default column names this data has
colnames(df.im.train)

# Merge now complete cases filtered data (30 columns) 
#   with corresponding image data 
#     Merger is on ID column
#       Then remove ID column and save the data frame to hard disk
df <- data.frame(merge(filtered.df.train,df.im.train,by="id"))
df$id<-NULL
dim(df)
write.csv(df,"complete.csv",row.names=F,quote=F)

# Recheck names of first 30 col names
colnames(df[,1:30])
# Check names of next 30 col names
colnames(df[,31:61])

The ‘test.csv’ file has just two columns: the ImageId column and the ‘Image’ column. As in the training file, the pixel values in the Image field are separated by spaces. We need to remove the ImageId column from ‘test.csv’, introduce commas between the pixel-intensity values and save the resulting data frame to disk. The following R code does this; being the same as the code above, it is not commented. Note that the facial key-point columns are absent from ‘test.csv’; predicting them is our job.

# R code to convert test data Image field to csv format
library(doMC)
registerDoMC(cores=4)
setwd("/home/ashokharnal/Documents/facial_key_points/")
df.test<-read.csv("test.csv", stringsAsFactors=FALSE, header=TRUE)
im.test<-df.test$Image
im.test <- foreach(im = im.test, .combine=rbind) %dopar% {
    as.integer(unlist(strsplit(im, " ")))
}
df<-data.frame(im.test)
row.names(df)<-NULL
write.csv(df,"t.csv",row.names=F,quote=F)

Building Model and making predictions

Once we have the data files ready, all we have to do is use R to model the data with the deep learning algorithm. The code is given below; explanations of the model building follow it. I have commented the code for easy understanding.

library(h2o)

# Start h2o from within R
#  Also decide min and max memory sizes
#   as also the number of cpu cores to be used (-1 means all cores)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE,
                     max_mem_size = '12g', min_mem_size = '4g',
                     nthreads = -1)

setwd("/home/ganesh/Documents/")

# Read train and test files
complete<-read.csv("complete.csv",header=T)
test<-read.csv("t.csv",header=T)

# Process begins now
Sys.time()
# Convert test data frame to h2o format
test.hex<-as.h2o(localH2O, test, key = 'test.hex')

# Initialise a data frame with as many rows as are in 'test' to store results
#  Predicted response columns will be appended to this data frame
result<-data.frame(1:dim(test)[1])

# Make predictions for columns from 'start' to 'end' one by one
# (Total columns 30)
start<-1	# Start from attribute 1
end<-5          # End at attribute 5

for ( i in start:end )
	{
	# Out of 30 columns, remove from 'complete' dataFrame
        #   all columns but the response column
	#     ie column to make predictions for will stay
        #       along with columns of pixel intensity values
	col<-1:30
	col<-col[-i]
	# Filter columns from training set accordingly
	part<-complete[,-col]

	# Convert the training data frame to h2o format
	print("Convert part of csv data to h2o format")
	part.hex<-as.h2o(localH2O, part, key = 'part.hex')
	# Print the column number immediately (flush.console)
	print(i)
	flush.console()

	# Start modeling process
	c_name<-paste("Modeling for ",names(part)[1],sep="")
	# Print column name being modeled
	print(c_name)
	flush.console()

	# epoch is a learning cycle or one pass.
	# Training your network on each obs of the set once is an epoch. 

	model <- h2o.deeplearning(x = 2:9217, y = 1, data = part.hex,
	                nfolds = 10, l1 = 1e-5,
	                activation = "RectifierWithDropout",
	                input_dropout_ratio = 0.2,
	                hidden_dropout_ratios = c(0.5,0.5,0.5,0.5,0.5,0.5),
	                hidden = c(200,200,100,100,50,50),
	                classification = FALSE, epochs = 40)

	print("Modeling completed")
	flush.console()

	## Predictions
	# In test data frame, make predictions for this column
	test_predict.hex <- h2o.predict(model, test.hex)
        # Transform it to dataframe format
	test_predict <- as.data.frame(test_predict.hex)
	# Change column name of test_predict to that of response column
	colnames(test_predict)=names(part)[1]

	# Append predicted response column to result dataframe
	result[i-start+1]<-test_predict

	# Write every result to file (sample file name is: first5.csv)
	result_file<-paste("first",end,".csv",sep="")
	write.csv(result, file = result_file , row.names=FALSE, quote=FALSE)

	# Remove garbage & release memory to OS. 
	gc()
	}

# Analysis Ending time
Sys.time()

# Before you exit R, shutdown h2o
h2o.shutdown(localH2O, prompt=FALSE)

A good example of deep learning from the h2o documentation, which explains the parameters in detail, is here. The parameters that a deep-learning model may use are given below:


h2o.deeplearning(x, y, data, key = "",override_with_best_model, classification = TRUE,
nfolds = 0, validation, holdout_fraction = 0, checkpoint = "", autoencoder,
use_all_factor_levels, activation, hidden, epochs, train_samples_per_iteration,
seed, adaptive_rate, rho, epsilon, rate, rate_annealing, rate_decay,
momentum_start, momentum_ramp, momentum_stable, nesterov_accelerated_gradient,
input_dropout_ratio, hidden_dropout_ratios, l1, l2, max_w2,
initial_weight_distribution, initial_weight_scale, loss,
score_interval, score_training_samples, score_validation_samples,
score_duty_cycle, classification_stop, regression_stop, quiet_mode,
max_confusion_matrix_size, max_hit_ratio_k, balance_classes, class_sampling_factors,
max_after_balance_size, score_validation_sampling, diagnostics,
variable_importances, fast_mode, ignore_const_cols, force_load_balance,
replicate_training_data, single_node_mode, shuffle_training_data,
sparse, col_major, max_categorical_features, reproducible)

We are using 6 hidden layers with 200, 200, 100, 100, 50 and 50 neurons respectively. When the response variable is continuous, classification is FALSE; the task is regression. In n-fold cross-validation, the data is partitioned into n parts: one part is reserved for validation and a model is built on the other (n-1) parts; each fold is reserved in turn, and the n results are averaged to produce an accuracy estimate. nfolds in our case is 10. H2O deep learning offers a number of choices for the activation function; among them are sigmoid, tanh and rectifier. The rectifier is quite accurate and fast; the hyperbolic tangent is computationally expensive. Dropouts are used to switch off certain neurons so as to avoid overfitting: input_dropout_ratio drops neurons from the input layer, and hidden_dropout_ratios drops neurons from the hidden layers. When dropout ratios are specified, the activation function is ‘RectifierWithDropout’. The parameter l1 also guards against overfitting by shrinking small weights to zero, so that only strong weights survive. You can download the h2o package documentation from here.
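For intuition only (this is not H2O's internal code), the rectifier activation and dropout can be sketched in a few lines of R:

```r
# Rectifier (ReLU): pass positive inputs through, clamp negatives to zero
relu <- function(x) pmax(0, x)
relu(c(-2, -0.5, 0, 1.5))   # 0.0 0.0 0.0 1.5

# Dropout: during training each neuron's output is zeroed with
# probability p, which discourages co-adaptation and overfitting
dropout <- function(x, p = 0.5) x * rbinom(length(x), 1, 1 - p)
set.seed(1)
dropout(c(1, 2, 3, 4))
```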

After you have run the above R code, exit R and restart it. Change the ‘start‘ and ‘end‘ values to 6 and 10 and run the R code once again. Why this? Why not go from 1 to 10, or 1 to 30, in one go? Because I observed that memory is not released to the operating system after every for-loop iteration. After 6 loops or so, the process becomes very slow. I have to exit from R, start R again and resume the for-loop from where it terminated last time. I therefore run the R code six times (1-5, 6-10, 11-15, 16-20, 21-25 and 26-30). This results in the predicted values for all 30 columns being written to six files: first5.csv, first10.csv, first15.csv, first20.csv, first25.csv and first30.csv; each contains five columns of predicted data for the relevant columns.
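One way to avoid editing ‘start‘ and ‘end‘ by hand between runs (my own sketch, not part of the original code) is to read the batch start from the command line with commandArgs() and launch every batch as a fresh R process:

```r
# At the top of the modelling script: take 'start' from the command
# line; default to 1 when the script is run interactively
args  <- commandArgs(trailingOnly = TRUE)
start <- if (length(args) >= 1) as.integer(args[1]) else 1
end   <- start + 4   # five columns per batch
```

Each batch then becomes, say, Rscript model.R 1, then Rscript model.R 6, and so on (‘model.R’ is a hypothetical name for the modelling script above); since every batch runs in a fresh R process, the memory-release problem does not accumulate.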

Once we have all the predicted columns, the following R code compiles and prepares the data for submission to Kaggle.

library(reshape2)

setwd("/home/ganesh/Documents/")
# Read the truncated training file that we wrote above
#  Read it just to know column names of facial key-points
complete<-read.csv("complete.csv",header=T)
# Read modified test file. 
#  I want to re-verify number of rows/images
test<-read.csv("t.csv",header=T)

# Read prediction files one by one
first5<-read.csv("first5.csv",header=T)        # Cols 1 to 5
first10<-read.csv("first10.csv",header=T)      # Cols 6 to 10
first15<-read.csv("first15.csv",header=T)      # Cols 11 to 15
first20<-read.csv("first20.csv",header=T)      # Cols 16 to 20
first25<-read.csv("first25.csv",header=T)      # Cols 21 to 25
first30<-read.csv("first30.csv",header=T)      # Cols 26 to 30

# Just check if all of them have same number of rows
dim(first5)
dim(first10)
dim(first15)
dim(first20)
dim(first25)
dim(first30)

# Start merging all in 'first' data frame
first<-first5
first[,6:10]<-first10
first[,11:15]<-first15
first[,16:20]<-first20
first[,21:25]<-first25
first[,26:30]<-first30

# Assign col names to 'first' for 30 columns
colnames(first)<-names(complete[,1:30])

# Give a unique name to file that will save to disk all prediction results
#   (as you may be repeating this expt many times)
#     add date+time to its name
dt<-Sys.time()
datetime<-format(dt, format="%d%m%Y%H%M")
result_filename<-paste("first",datetime,".csv",sep="")
# This file is not to be submitted but contains column wise data
write.csv(first, file = result_filename , row.names=FALSE, quote=FALSE)

# Create a data frame with as many rows as in test.
#   ImageId column contains seq number 
predictions <- data.frame(ImageId = 1:nrow(test))
predictions[2:31]<-first         # Add other 30 columns to it
head(predictions)                # Check

# Restack predictions, ImageId wise
submission <- melt(predictions, id.vars="ImageId", variable.name="FeatureName", value.name="Location")
head(submission)
# Read IdLookupTable.csv file downloaded from Kaggle 
Id.lookup <- read.csv("IdLookupTable.csv",header=T)
Idlookup_colnames <- names(Id.lookup)
Idlookup_colnames
Id.lookup$Location <- NULL

# Row wise merger. A row in 'Id.lookup' is merged with same row in 'submission'.
#   At least one column name should be same.
# When all.x=TRUE, an extra row will be added to the output for each case in Id.lookup
#   that has no matching cases in submission.
#  Cases that do not have values from submission will be labeled as missing.
#    See: https://kb.iu.edu/d/azux    http://www.cookbook-r.com/Manipulating_data/Merging_data_frames/
msub <- merge(Id.lookup, submission, all.x=T, sort=F)
# Adds columns (RowId) not in msub
nsub <- msub[, Idlookup_colnames]

# Give a unique name to submission file
#  Add date+time to its name
submit_file<-paste("submit",datetime,".csv",sep="")
# Write to disk file for submission to Kaggle
write.csv(nsub[,c(1,4)], file=submit_file, quote=F, row.names=F)
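The melt() step above reshapes the wide predictions (one row per image, 30 key-point columns) into the long format Kaggle wants (one row per ImageId/FeatureName pair). A toy sketch:

```r
library(reshape2)

# Two images, two predicted key-point columns
predictions <- data.frame(ImageId    = 1:2,
                          nose_tip_x = c(48.4, 47.6),
                          nose_tip_y = c(63.5, 53.5))
melt(predictions, id.vars = "ImageId",
     variable.name = "FeatureName", value.name = "Location")
#   ImageId FeatureName Location
# 1       1  nose_tip_x     48.4
# 2       2  nose_tip_x     47.6
# 3       1  nose_tip_y     63.5
# 4       2  nose_tip_y     53.5
```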

This finishes our experiment. As mentioned above, if you avoid the short-cut of training the model for all columns on the minimum set of observations (as in ‘complete.csv‘), you can achieve much better accuracy while still working from an ordinary machine. 8GB of RAM should give you a very respectable score; on an 8GB machine, reduce the number of hidden layers, and maybe use a for-loop sequence of 1 to 3 columns in model building. An image of my Kaggle score page is below. Good luck! (Edited: Avoiding the Shortcut: some additions have been made since I wrote the above. Please read on below!)

[Image: Kaggle score page]

**EDITED** Avoiding Shortcut

If you want to avoid the short-cut and build the model taking into account all available data, column by column, then the following two R codes will do the job. The first one creates a file ‘complete.csv’ by converting the image data to csv format and concatenating it with the 30 key-point columns.

# R code to transform training.csv file
#  Spaces within pixel intensity values are replaced by commas and image data is transformed
#    into data frame

library(doMC)
registerDoMC(cores=4)

setwd("/home/ganesh/Documents/data_analysis/facial_key_points/")

# Read training file. To avoid Image being treated as a factor,
#   set stringsAsFactors=FALSE (default is TRUE)
df.train<-read.csv("training.csv", stringsAsFactors=FALSE, header=TRUE)

# So how many rows are there?
dim(df.train)

# Get all image data into another variable
im.train<-df.train$Image
df.train$Image<-NULL

# Introduce an ID column in data
df.train$id<-1:dim(df.train)[1]

# Split image data on spaces into integer pixel values
im.train <- foreach(im = im.train, .combine=rbind) %dopar% {
    as.integer(unlist(strsplit(im, " ")))
}

# Convert it to a data frame
df.im.train<-data.frame(im.train)
# Remove row names
row.names(df.im.train)<-NULL

# Add an ID to this image data
df.im.train$id<-1:dim(df.im.train)[1]
# Just check what default column names data has
colnames(df.im.train)


# Merge now train data (30 columns) with corresponding image data 
#  Merger is on ID column
#   Then remove ID column and save the data frame to hard disk
df <- data.frame(merge(df.train,df.im.train,by="id"))
df$id<-NULL
dim(df)
write.csv(df,"complete.csv",row.names=F,quote=F)

# Recheck names of first 30 col names
colnames(df[,1:30])
# Check names of next 30 col names
colnames(df[,31:61])

Once you have the file ‘complete.csv’ on the hard disk, the following R code uses h2o to train a model for each of the 30 columns using all the training data actually available for that particular column. Thus all the data is used for training. But if your machine does not have sufficient RAM, the model building may be time-consuming and test your patience.


# R file to create deep learning model 
# Takes into account NAs per column of data rather
#  than for complete data set.
#    This model is improvement over the earlier model but consumes a lot of time
#      complete.csv is full data set including comma separated image pixel values

library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE,
                     max_mem_size = '12g', min_mem_size = '4g',
                     nthreads = -1)

# Set working directory
setwd("/home/ganesh/Documents/")

complete<-read.csv("complete.csv",header=T)
test<-read.csv("t.csv",header=T)

# Analysis begin time
Sys.time()
# Convert test.csv to h2o format
test.hex<-as.h2o(localH2O, test, key = 'test.hex')
result<-data.frame(1:dim(test)[1])
# Build models for five columns at a time
start<-1           # We begin from column 1	
end<-5             # Last one is column 5

for ( i in start:end )
	{
	col<-1:30
	col<-col[-i]
	part<-complete[,-col]
	ok<-complete.cases(part)
	part<-part[ok,]
	print(paste("Records in data set are: ",nrow(part)))
	flush.console()	
	print("Convert part of csv data to h2o format")
	part.hex<-as.h2o(localH2O, part, key = 'part.hex')
	print(i)
	flush.console()
	c_name<-paste("Modeling for ",names(part)[1],sep="")
	# Print column name being modeled
	print(c_name)
	flush.console()

	model <- h2o.deeplearning(x = 2:9217, y = 1, data = part.hex,
	                nfolds = 10, l1 = 1e-5,
	                activation = "RectifierWithDropout",
	                input_dropout_ratio = 0.2,
	                hidden_dropout_ratios = c(0.5,0.5,0.5,0.5,0.5,0.5),
	                hidden = c(200,200,100,100,50,50),
	                classification = FALSE, epochs = 30)

	print("Modeling completed")
	flush.console()

	## Predictions
	# Make predictions for this column
	test_predict.hex <- h2o.predict(model, test.hex)
	test_predict <- as.data.frame(test_predict.hex)
	# Change column name of test_predict to that of response column
	colnames(test_predict)=names(part)[1]

	# Append predicted response column to result dataframe
	result[i-start+1]<-test_predict

	# Write every result to file
	result_file<-paste("first",end,".csv",sep="")
	write.csv(result, file = result_file , row.names=FALSE, quote=FALSE)

	# Remove garbage & release memory to OS. 
	gc()
	}

# Analysis Ending time
Sys.time()

# Before you exit R, shutdown h2o
h2o.shutdown(localH2O, prompt=FALSE)