CNNs - Deep Learning - Puppy Post
I have been looking for a fun machine learning computer vision use case so I can play around with a CNN implementation. I haven't worked on many deep learning problems, and the technology has evolved quickly. My wife and I don't regularly organize and back up our iPhone pictures, and as I was going through thousands of them, I noticed a common theme:
The above is my beloved puppy, and she has been the object of much of our photography (we're a bit obsessed). Unfortunately, each of these images is labeled in some non-descriptive format, e.g. 000224444.jpg.
This could be a fun use case for a convolutional neural network (CNN), as they are pretty good at computer vision, right?
I want to build a model that can accurately find my dog and/or identify a person, then run it over all the photos on my hard drive to flag which ones contain my pup. I can then move those into a relevant folder and inspect the remainder.
The first thing I'd need is access to a large number of images to train on. Luckily, I found two such datasets on Udacity:
I have chosen to use PyTorch, as I've heard good things about its Python-first approach: pytorch
ls -ltr ./dogImages
total 0
drwxr-xr-x@ 135 Shalu staff 4320 Jun 7 13:18 train/
drwxr-xr-x@ 135 Shalu staff 4320 Jun 7 13:18 valid/
drwxr-xr-x@ 136 Shalu staff 4352 Jun 7 13:18 test/
Now I have the directories above split into 'train', 'valid', and 'test'. Looking at the documentation around data loaders, it seems ImageFolder is a good fit for custom datasets.
import os
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision import datasets
from PIL import ImageFile
# allow PIL to load images with truncated data instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True
# training on GPUs makes things a lot faster due to parallel processing
# (I personally did this on Amazon AWS)
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('CUDA is not available. Training on CPU ...')
else:
    print('CUDA is available! Training on GPU ...')
CUDA is available! Training on GPU ...
ls -ltr /data/
total 216
drwxr-xr-x 5 root root 4096 May 14 2018 dog_images/
drwxr-xr-x 5751 root root 208896 May 14 2018 lfw/
drwxr-xr-x 2 root root 4096 May 14 2018 bottleneck_features/
Data Loading
Now I know how to load my data, but I also need to ensure that all pictures have the same size, etc. Therefore, a few transforms are needed, in addition to converting the images into tensors. I chose a size of 224 since that seemed to be a common choice.
# number of subprocesses to use for data loading
num_workers = 0
# samples per batch to load
batch_size = 20
# percentage to use as validation
valid_size = 0.2
data_dir = '/data/dog_images'
train_transforms = transforms.Compose([transforms.RandomRotation(30),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5),
(0.5, 0.5, 0.5))])
test_transforms = transforms.Compose([transforms.Resize(255),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5),
(0.5, 0.5, 0.5))])
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
# validation and test images use the deterministic transforms (no random augmentation)
valid_data = datasets.ImageFolder(data_dir + '/valid', transform=test_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
num_workers=num_workers, shuffle=False)
loaders = {
    'train': train_loader,
    'valid': valid_loader,
    'test': test_loader
}
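Before going further, a quick sanity check of my own (not strictly necessary) to confirm the loaders produce the shapes I expect:

# grab one batch from the training loader and inspect its shape
images, labels = next(iter(loaders['train']))
print(images.shape)   # expecting torch.Size([20, 3, 224, 224])
print(labels.shape)   # expecting torch.Size([20])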
ls -l -t dogImages/train | head -10
total 0
drwxr-xr-x@ 66 Shalu staff 2112 Mar 27 2017 001.Affenpinscher/
drwxr-xr-x@ 60 Shalu staff 1920 Mar 27 2017 002.Afghan_hound/
drwxr-xr-x@ 54 Shalu staff 1728 Mar 27 2017 003.Airedale_terrier/
drwxr-xr-x@ 65 Shalu staff 2080 Mar 27 2017 004.Akita/
drwxr-xr-x@ 79 Shalu staff 2528 Mar 27 2017 005.Alaskan_malamute/
drwxr-xr-x@ 66 Shalu staff 2112 Mar 27 2017 006.American_eskimo_dog/
drwxr-xr-x@ 52 Shalu staff 1664 Mar 27 2017 007.American_foxhound/
drwxr-xr-x@ 68 Shalu staff 2176 Mar 27 2017 008.American_staffordshire_terrier/
drwxr-xr-x@ 36 Shalu staff 1152 Mar 27 2017 009.American_water_spaniel/
After looking at my dog data, I need to find out how many distinct breeds I have, in order to train my model to identify the correct breed.
unique_dogs = len(np.unique(train_data.classes))
print(unique_dogs)
133
If I were to randomly choose a breed, the probability of being correct would be 1/133, or about 0.75%. Hopefully I can achieve an accuracy higher than that.
After googling around, I looked at a few CNN architectures, and it seemed that 3-5 convolutional layers would be a good start. I initially had one fully connected layer, but to be honest the model performed poorly, so I added two more fully connected layers and adjusted my learning rate down.
Architecture
kernel = 3      # convolution kernel size
pad = 1         # padding of 1 preserves spatial dimensions with a 3x3 kernel
fc_output = 5000
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 32, kernel, padding=pad)
        self.conv2 = nn.Conv2d(32, 64, kernel, padding=pad)
        self.conv3 = nn.Conv2d(64, 128, kernel, padding=pad)
        self.conv4 = nn.Conv2d(128, 256, kernel, padding=pad)
        self.pool = nn.MaxPool2d(2, 2)
        # four 2x2 poolings shrink a 224x224 input down to 14x14 (224 / 2**4)
        self.fc1 = nn.Linear(256 * 14 * 14, fc_output)
        self.fc2 = nn.Linear(fc_output, 2000)
        self.fc3 = nn.Linear(2000, unique_dogs)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        # flatten things out
        x = x.view(-1, 256 * 14 * 14)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
model = Net()
print(model)
# move tensors to GPU if available
if train_on_gpu:
    model.cuda()
Net(
(conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc1): Linear(in_features=50176, out_features=5000, bias=True)
(fc2): Linear(in_features=5000, out_features=2000, bias=True)
(fc3): Linear(in_features=2000, out_features=133, bias=True)
)
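Out of curiosity (a quick aside of my own), I counted how many trainable parameters this network has; the vast majority sit in that first fully connected layer (50176 × 5000 weights alone):

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'{n_params:,} trainable parameters')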
I now need to specify a loss function and an optimizer for my neural network. I chose cross-entropy loss, which the documentation describes as follows:
It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.
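One detail worth a toy example of my own: nn.CrossEntropyLoss combines LogSoftmax and NLLLoss, so it expects raw, unnormalized scores (logits) plus integer class indices; this is why the forward pass above returns self.fc3(x) without a softmax.

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 133)              # a batch of 4 raw score vectors, one entry per breed
targets = torch.tensor([3, 77, 0, 132])   # ground-truth class indices
print(loss_fn(logits, targets))           # a single scalar loss value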
Now I have to choose an optimizer to update the weights from the computed gradients. Stochastic Gradient Descent (SGD) and Adam are two popular methods; I settled on Adam after a little reading, although I was cautioned that it sometimes has a tendency to overfit on these types of classification problems.
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)
# alias used by the training and test helpers below
use_cuda = train_on_gpu
train_on_gpu
True
Time to train the model, tracking the loss on my validation set after every epoch and saving the best-performing weights.
import time

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    valid_loss_min = np.Inf
    for epoch in range(1, n_epochs+1):
        start = time.time()
        train_loss = 0.0
        valid_loss = 0.0

        # train the model #
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            # running average of the training loss
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))

        # validate #
        model.eval()
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                ## update the average validation loss
                output = model(data)
                loss = criterion(output, target)
                valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))

        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, train_loss, valid_loss))

        ## save the model if validation loss has decreased
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'
                  .format(valid_loss_min, valid_loss))
            valid_loss_min = valid_loss
        print(f"Total Time: {(time.time() - start)/60:.3f} minutes")
    # return trained model
    return model
n_epochs = 10
model = train(n_epochs, loaders, model, optimizer,
criterion, use_cuda, 'model_fc3.pt')
Epoch: 1 Training Loss: 4.809653 Validation Loss: 4.693537
Validation loss decreased (inf --> 4.693537). Saving model ...
Total Time: 3.238 minutes
Epoch: 2 Training Loss: 4.553798 Validation Loss: 4.436768
Validation loss decreased (4.693537 --> 4.436768). Saving model ...
Total Time: 3.248 minutes
Epoch: 3 Training Loss: 4.414593 Validation Loss: 4.335376
Validation loss decreased (4.436768 --> 4.335376). Saving model ...
Total Time: 3.248 minutes
Epoch: 4 Training Loss: 4.302392 Validation Loss: 4.215953
Validation loss decreased (4.335376 --> 4.215953). Saving model ...
Total Time: 3.269 minutes
Epoch: 5 Training Loss: 4.214605 Validation Loss: 4.099763
Validation loss decreased (4.215953 --> 4.099763). Saving model ...
Total Time: 3.255 minutes
Epoch: 6 Training Loss: 4.107606 Validation Loss: 4.036846
Validation loss decreased (4.099763 --> 4.036846). Saving model ...
Total Time: 3.256 minutes
Epoch: 7 Training Loss: 4.033701 Validation Loss: 3.941356
Validation loss decreased (4.036846 --> 3.941356). Saving model ...
Total Time: 3.262 minutes
Epoch: 8 Training Loss: 3.953173 Validation Loss: 3.851250
Validation loss decreased (3.941356 --> 3.851250). Saving model ...
Total Time: 3.258 minutes
Epoch: 9 Training Loss: 3.881432 Validation Loss: 3.778061
Validation loss decreased (3.851250 --> 3.778061). Saving model ...
Total Time: 3.246 minutes
Epoch: 10 Training Loss: 3.785154 Validation Loss: 3.679010
Validation loss decreased (3.778061 --> 3.679010). Saving model ...
Total Time: 3.256 minutes
# load the weights that achieved the lowest validation loss
model.load_state_dict(torch.load('model_fc3.pt'))
Test It
def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.
    model.eval()
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update average test loss
            test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
            # convert output probabilities to predicted class
            pred = output.data.max(1, keepdim=True)[1]
            # compare predictions to true label
            correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
            total += data.size(0)
    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
test(loaders, model, criterion, use_cuda)
Test Loss: 3.713136
Test Accuracy: 13% (116/836)
Not so good
I rapidly gained a lot of empathy for deep learning practitioners: after all that, my model's accuracy was a poor 13%. I realized just how complex creating a CNN from scratch can be. Enter transfer learning.
Transfer Learning
Transfer learning is the reuse of a pre-trained model on a new problem. It is currently very popular in deep learning because it enables you to train deep neural networks with comparatively little data; essentially, I can apply a model pre-trained on a similar problem to my own.
Luckily, TorchVision has some models that have been [pretrained](https://pytorch.org/docs/master/torchvision/models.html)…phewwwww
I'll choose the VGG-16 model from the Visual Geometry Group at the University of Oxford. It has been trained on ImageNet, and here are all 1000 categories in its output dictionary; you can see that keys 151-268 are dog breeds.
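As an aside, that index range suggests a quick way to check whether an image contains any dog at all: run it through the raw pretrained VGG-16 and see whether the top ImageNet class falls between 151 and 268. A minimal sketch of my own (not part of the pipeline below), using the standard ImageNet preprocessing values:

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# standard ImageNet preprocessing for VGG-16
imagenet_tf = transforms.Compose([transforms.Resize(256),
                                  transforms.CenterCrop(224),
                                  transforms.ToTensor(),
                                  transforms.Normalize((0.485, 0.456, 0.406),
                                                       (0.229, 0.224, 0.225))])
vgg16 = models.vgg16(pretrained=True).eval()

def looks_like_a_dog(img_path):
    # True when the top ImageNet prediction is one of the dog classes (151-268)
    img = imagenet_tf(Image.open(img_path).convert('RGB')).unsqueeze(0)
    with torch.no_grad():
        idx = vgg16(img).argmax(dim=1).item()
    return 151 <= idx <= 268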
# let's use the same preprocessing and data loading as before, with one notable difference:
# I'm using a different normalization, as required by the VGG-16 model
num_workers = 0
batch_size = 20
valid_size = 0.2
data_dir = '/data/dog_images'
train_transforms = transforms.Compose([transforms.RandomRotation(30),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406),
(0.229, 0.224, 0.225))])
test_transforms = transforms.Compose([transforms.Resize(255),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406),
(0.229, 0.224, 0.225))])
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
# again, no random augmentation on the validation and test sets
valid_data = datasets.ImageFolder(data_dir + '/valid', transform=test_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
num_workers=num_workers, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
num_workers=num_workers, shuffle=False)
loaders = {
    'train': train_loader,
    'valid': valid_loader,
    'test': test_loader
}
Here is the pretrained model's architecture:
import torchvision.models as models
import torch.nn as nn

# load VGG-16 with its pretrained ImageNet weights
model_transfer = models.vgg16(pretrained=True)
print(model_transfer)
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace)
(5): Dropout(p=0.5)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
# freeze the pretrained feature-extractor weights so only the classifier is trained
for param in model_transfer.features.parameters():
    param.requires_grad = False

n_inputs = model_transfer.classifier[6].in_features
unique_dogs = len(np.unique(train_data.classes))
print(unique_dogs)
133
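A quick check of my own that the freeze worked, before swapping in a new classifier head: only the classifier's parameters should still require gradients.

trainable = sum(p.numel() for p in model_transfer.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model_transfer.parameters() if not p.requires_grad)
print(f'trainable: {trainable:,}  frozen: {frozen:,}')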
Adjust the last fully-connected layer
The previous output had 1000 classes, of which 151-268 are dog breeds. But I have only 133 classes of dog breeds, and I would like the output layer to match.
last_layer = nn.Linear(n_inputs, unique_dogs)
print(last_layer)
Linear(in_features=4096, out_features=133, bias=True)
model_transfer.classifier[6] = last_layer
model_transfer
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.5)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace)
(5): Dropout(p=0.5)
(6): Linear(in_features=4096, out_features=133, bias=True)
)
)
if use_cuda:
    model_transfer = model_transfer.cuda()
import torch.optim as optim

# same reasoning as before, since the classification problem is the same
criterion_transfer = nn.CrossEntropyLoss()
# I only want to adjust the weights of the classifier, and I'm using SGD here instead of Adam
optimizer_transfer = optim.SGD(model_transfer.classifier.parameters(), lr=0.001)
The train function is the same one I defined earlier, so I reuse it as-is:
n_epochs = 10
# train the transfer model with its own optimizer and criterion
model_transfer = train(n_epochs, loaders, model_transfer, optimizer_transfer,
                       criterion_transfer, use_cuda, 'model_vgg.pt')
Epoch: 1 Training Loss: 3.181618 Validation Loss: 2.487838
Validation loss decreased (inf --> 2.487838). Saving model ...
Total Time: 2.524 minutes
Epoch: 2 Training Loss: 2.241434 Validation Loss: 1.812700
Validation loss decreased (2.487838 --> 1.812700). Saving model ...
Total Time: 2.543 minutes
Epoch: 3 Training Loss: 1.763387 Validation Loss: 1.449733
Validation loss decreased (1.812700 --> 1.449733). Saving model ...
Total Time: 2.515 minutes
Epoch: 4 Training Loss: 1.548704 Validation Loss: 1.284194
Validation loss decreased (1.449733 --> 1.284194). Saving model ...
Total Time: 2.528 minutes
Epoch: 5 Training Loss: 1.393220 Validation Loss: 1.159660
Validation loss decreased (1.284194 --> 1.159660). Saving model ...
Total Time: 2.541 minutes
Epoch: 6 Training Loss: 1.291400 Validation Loss: 1.172395
Total Time: 2.502 minutes
Epoch: 7 Training Loss: 1.248277 Validation Loss: 1.088839
Validation loss decreased (1.159660 --> 1.088839). Saving model ...
Total Time: 2.535 minutes
Epoch: 8 Training Loss: 1.180103 Validation Loss: 1.078598
Validation loss decreased (1.088839 --> 1.078598). Saving model ...
Total Time: 2.534 minutes
Epoch: 9 Training Loss: 1.139643 Validation Loss: 1.062964
Validation loss decreased (1.078598 --> 1.062964). Saving model ...
Total Time: 2.521 minutes
Epoch: 10 Training Loss: 1.093010 Validation Loss: 1.041088
Validation loss decreased (1.062964 --> 1.041088). Saving model ...
Total Time: 2.541 minutes
# load the weights that achieved the lowest validation loss
model_transfer.load_state_dict(torch.load('model_vgg.pt'))
The test function is likewise unchanged from before, so I can evaluate the transfer model directly:
test(loaders, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.488900
Test Accuracy: 84% (706/836)
WOOHOO….Pretty Neat
The accuracy has bumped up to 84%. Not too shabby, and it could probably be improved with more training data and by tweaking a few more parameters. Now that I have a classifier over 133 breeds, let's try it out by writing a small prediction function and testing it.
from PIL import Image

class_names = [item[4:].replace("_", " ") for item in train_data.classes]

def predict_my_breed(img_path):
    # load the image and return the predicted breed
    image = Image.open(img_path).convert('RGB')
    # deterministic preprocessing for inference (no random cropping)
    in_transform = transforms.Compose([
        transforms.Resize(255),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])
    image = in_transform(image)[:3, :, :].unsqueeze(0)
    if use_cuda:
        image = image.cuda()
    model_transfer.eval()
    with torch.no_grad():
        prediction = model_transfer(image)
    prediction = prediction.data.cpu().numpy().argmax()
    return class_names[prediction]
file = '/data/dog_images/train/103.Mastiff/Mastiff_06826.jpg'
print('Dog Breed:', predict_my_breed(file))
Dog Breed: Mastiff
Well, that seems pretty accurate. Let's try it out on my pooch:

lass = 'post_images/Lass.jpg'
predict_my_breed(lass)
'Cavalier King Charles Spaniel'
Not too bad actually
She's a poodle mix, but she definitely has those Cavalier ears! I don't see these types of mixes as prevalent in the training dataset, so one way to improve this would be to put more poodle mixes in there.
I've definitely gone off on a tangent here, as I originally just wanted to identify my dog. The added complexity of breeds would be helpful if I had other dogs in my photos and wanted to identify only mine; that wasn't the case, but it made for a more interesting problem to solve.
Now that we have a dog breed classifier at 84%, let's grab a face detector that can separate humans from dogs; Udacity has a pretty cool example of using pre-trained face detection. OpenCV has an implementation of Haar feature-based cascade classifiers for detecting human faces in images, and it provides many pre-trained face detectors, stored as XML files on GitHub. If you download one of the detectors into your working directory, you can play around as well. I downloaded the haarcascades.
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')
people_file = './post_images/mixed_images/Cameron_Diaz_0003.jpg'
img = cv2.imread(people_file)
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# find faces in image
faces = face_cascade.detectMultiScale(gray)
# print number of faces detected in the image
print('Number of faces detected:', len(faces))
# get bounding box for each detected face
for (x, y, w, h) in faces:
    # add bounding box to color image
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()
Number of faces detected: 1

def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0
In the case of Cameron Diaz it seems to work, so the detector should return True every time it finds a face.
face_detector(people_file)
True
But does it work for dogs?
dog_file = './post_images/mixed_images/Affenpinscher_00003.jpg'
face_detector(dog_file)
False
Now I can write an algorithm: if the detector finds a face, I'll flag the photo as containing a human; otherwise, I'll try to figure out what breed of dog it contains. Something like the below is what I'll likely end up using.
def run_app(img_path):
    ## handle the human and dog cases
    ## (with only a face detector, anything without a face is treated as a potential dog)
    if face_detector(img_path):
        print('Human Detected')
    else:
        print('Dog Detected:', predict_my_breed(img_path))
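To close the loop on my original goal, here is a rough sketch of how I might sweep my unsorted photo dump with this logic; the directory paths below are placeholders, and anything without a detectable human face gets filed into a dog folder for manual review:

import os
import shutil

src_dir = '/path/to/unsorted_photos'   # placeholder paths, not my real directories
dog_dir = '/path/to/sorted/dogs'
os.makedirs(dog_dir, exist_ok=True)

for name in os.listdir(src_dir):
    # only consider common image extensions
    if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
        continue
    path = os.path.join(src_dir, name)
    if not face_detector(path):
        # no human face found; assume it may be the pup and file it for review
        print(name, '->', predict_my_breed(path))
        shutil.move(path, os.path.join(dog_dir, name))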