Build an Image Recognition App with CoreML in Swift

PLEASE NOTE: This tutorial has been written using Xcode 10 and Swift 5

In this tutorial, we’ll build a simple but still great application that recognizes the content of a picture and displays its name on a label using CoreML. CoreML is the machine learning framework introduced by Apple at the WWDC 2017 conference; it’s the same technology that powers intelligent features such as Siri and the Camera app, and it’s a great framework to get you started with machine learning.

Start by creating a new Single View Application with Xcode, name it ImageRecognition, set its language as Swift and save it on your Desktop.

Enter the Main.storyboard file and drag a Label, an ImageView and 2 Buttons into the ViewController. The Label will show the name of the dominant object in a picture captured with the camera or picked from the Photo Library. The ImageView will display the chosen image, and the Buttons will open the device’s Camera and the Photo Library.

Arrange the dragged Views in the Controller as you like; the final result should look something like this:

The App’s design in the Storyboard

Split the Editor area by opening the Assistant panel and connect your Views and IBActions in the ViewController.swift file (if you don’t know how to do that, read this helpful article and jump to the Declare IBOutlets and IBActions paragraph). Declare the Label outlet as myLabel and the ImageView outlet as myImg, then name the camera and library buttons’ actions “cameraButt” and “libraryButt”.
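Once connected, the declarations at the top of ViewController.swift should look roughly like this (a minimal sketch; the @IBOutlet and @IBAction attributes are created by the Storyboard connections, and the bodies of the two actions will be filled in shortly):

@IBOutlet weak var myLabel: UILabel!
@IBOutlet weak var myImg: UIImageView!

@IBAction func cameraButt(_ sender: Any) {
    // code to open the camera goes here (added below)
}

@IBAction func libraryButt(_ sender: Any) {
    // code to open the photo library goes here (added below)
}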

Before getting into code, let’s download a free CoreML model offered by Apple. Go to https://developer.apple.com/machine-learning/models/ and choose MobileNetV2 (you’re free to download the Resnet50 or SqueezeNet models too; personally, I think MobileNetV2 is the best one). Once you’ve downloaded the model file to your Desktop, drag it into the file list in Xcode and make sure the Target checkbox is ticked in the popup window that asks you to confirm the file copy.

All right, now add the UIImagePickerController and UINavigationController delegates to your class, next to the ViewController’s name:

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
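When you added MobileNetV2.mlmodel to the target, Xcode generated a Swift class named MobileNetV2 for it. The prediction code further down refers to a model constant, so declare an instance of that generated class as a property of the ViewController (a minimal sketch; on recent Xcode versions the parameterless initializer is deprecated in favor of init(configuration:), but with Xcode 10 it works as shown):

// Instance of the class Xcode generated from MobileNetV2.mlmodel
let model = MobileNetV2()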

The app will use them to show either the native camera or the photo library. Inside the cameraButt() function, paste this code:

if UIImagePickerController.isSourceTypeAvailable(.camera) {
    let imagePicker = UIImagePickerController()
    imagePicker.delegate = self
    imagePicker.sourceType = .camera
    imagePicker.allowsEditing = false
    present(imagePicker, animated: true, completion: nil)
}
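One note before running on a device: iOS will not open the camera unless the app explains why it needs access, so add the NSCameraUsageDescription key (with a short description string) to the project’s Info.plist; depending on the iOS version you target, NSPhotoLibraryUsageDescription may be needed for the library picker as well.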

In the libraryButt() action, paste this code:

if UIImagePickerController.isSourceTypeAvailable(.photoLibrary) {
    let imagePicker = UIImagePickerController()
    imagePicker.delegate = self
    imagePicker.sourceType = .photoLibrary
    imagePicker.allowsEditing = false
    present(imagePicker, animated: true, completion: nil)
}

It’s time to implement the didFinishPickingMediaWithInfo delegate method of the image picker, so place this code below the closing brace of the libraryButt() function:

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
        if let image = info[UIImagePickerController.InfoKey.originalImage] as? UIImage {
            
            dismiss(animated: true, completion: nil)
            myLabel.text = "Analyzing image..."
            
            UIGraphicsBeginImageContextWithOptions(CGSize(width: 224, height: 224), true, 2.0)
            image.draw(in: CGRect(x: 0, y: 0, width: 224, height: 224))
            let newImage = UIGraphicsGetImageFromCurrentImageContext()!
            UIGraphicsEndImageContext()
            
            let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue, kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
            var pixelBuffer : CVPixelBuffer?
            let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(newImage.size.width), Int(newImage.size.height), kCVPixelFormatType_32ARGB, attrs, &pixelBuffer)
            guard (status == kCVReturnSuccess) else {
                return
            }
            
            CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
            let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer!)
            
            let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
            let context = CGContext(data: pixelData, width: Int(newImage.size.width), height: Int(newImage.size.height), bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer!), space: rgbColorSpace, bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) 
            
            context?.translateBy(x: 0, y: newImage.size.height)
            context?.scaleBy(x: 1.0, y: -1.0)
            
            UIGraphicsPushContext(context!)
            newImage.draw(in: CGRect(x: 0, y: 0, width: newImage.size.width, height: newImage.size.height))
            UIGraphicsPopContext()
            CVPixelBufferUnlockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
            myImg.image = newImage
            
            // Core ML
            guard let prediction = try? model.prediction(image: pixelBuffer!) else {
                return
            }
            
            myLabel.text = "\(prediction.classLabel)."
        }
        
    }

The code above may look hard to understand, but it basically performs the following operations:

  • Check if the picked image exists
  • Create a new UIImage based on the input size of our MobileNetV2 CoreML model (if you select the model from the left menu, you’ll see its Model Evaluation Parameters and notice that the input images are 224x224px)
  • Apply and process pixel buffer to the image
  • Scale the UIImage to fit the model’s input size
  • Run a prediction of the dominant object in the picture and print its name in the Label (yes, an object may be classified with more than one name, e.g. a computer keyboard may generate “keyboard”, “keypad” and “computer keyboard”; see the sketch right after this list to also display each label’s confidence)
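As mentioned in the last point, the model returns several candidate labels. If you also want to show how confident it is, the output of the generated MobileNetV2 class should expose a classLabelProbs dictionary next to classLabel; here is a minimal sketch (assuming the same prediction constant created above) that displays the three most likely labels with their probabilities:

// Keep the three label/probability pairs with the highest confidence
let topThree = prediction.classLabelProbs
    .sorted { $0.value > $1.value }
    .prefix(3)

// Build a string such as "keyboard 92%, space bar 4%, mouse 2%"
myLabel.text = topThree
    .map { "\($0.key) \(Int($0.value * 100))%" }
    .joined(separator: ", ")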

Your app is ready! Just make sure to plug your device into your Mac (you cannot use the Simulator, since it doesn’t support the camera), and hit the Run button in Xcode. Here are some screenshots of how the application works:

The App’s layout
A taken picture
Funny, I’m not a barber, but the app still got the chair 🙂

Conclusion

That’s all for this tutorial: you have learned how to build a small app that uses CoreML to predict and show the names of objects in a photo.

Hope you enjoyed this article; feel free to post comments about it. You can also download the full Xcode project for this tutorial by clicking the link below:

Download the Xcode project

Your support will be highly appreciated 😉