Building a simple application like this taught me a lot about building ML models and using Swift: how to collect data, which tools helped me write this simple application that detects road signs and reads their values with OCR, and, more generally, how to approach an app like this.

The mobile application was meant to be an add-on to the Python application; the goal was also to check how to build such an app, how it would work, and what problems would come up during development. The project involved creating the main application in Python. Initially, the mobile app was supposed to use a model built with TensorFlow, but since converting the TensorFlow model to Core ML (.mlmodel) did not go well, a new model had to be built with Xcode's tools.

Assumptions

The application recognizes 8 classes of road signs
The application uses Optical Character Recognition (OCR) to read the values printed on the signs

Dataset

I collected the data by taking photos and videos on the way to university or while going somewhere else. More than 500 photos were used, and the road signs in them were labeled with IBM Cloud Annotations. After that, I augmented the dataset with Roboflow (adding blur and noise), growing it to over 1,300 photos.

Statistics

Did you know that

The shortcut road sign (turquoise border). The sign indicates a shortcut to the streets of Ratburana

One of the labels used in this dataset is the shortcut road sign. The sign is one of a kind; I have not encountered it anywhere else.

Training the model

With the dataset ready, it was time to train the model. As I described earlier, part of the team focused on training with TensorFlow and later on trying to convert the result to .mlmodel, the format that can be used from Swift. When that didn't bring the expected results, I started training the model with Xcode's own tool – Create ML.

The first model showed that some changes were needed. If one road-sign class dominates the set, it can significantly disturb the recognition of the others – for example, the speed limit versus the height limit, two signs that look very similar. To avoid this, it is necessary to balance the number of pictures per class; by that I mean adding more data to the smaller classes, not deleting data from the larger ones. Another problem is the recognition of the object itself: successful recognition may depend on the light intensity or the angle at which the object appears. You can mitigate this with augmentation (e.g. with Roboflow) or simply by enlarging the dataset (photos from different angles, from different distances, at different times of the day).

Confusion matrix on the test set

The darknet-yolo model was trained for 11k iterations. You can download the model here. I do not know how well this model performs on road signs other than Thai ones.

iOS App

Swift makes things quite easy when it comes to ready-made solutions. The assumption was simple: detect an object, crop the road sign, and scan it with OCR to read the text. How to implement object detection in a photo can easily be learned from a Polish YouTuber (in English) or from the documentation.
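
For context, here is a minimal sketch of how that detection step can look with Vision and Core ML. `RoadSignDetector` is a placeholder for the class Xcode generates from the .mlmodel file you add to the project, and error handling is reduced to returning an empty result:

    import CoreML
    import UIKit
    import Vision

    // A minimal sketch of the detection step. RoadSignDetector is a
    // placeholder name; Xcode generates the real Swift class from the
    // .mlmodel file added to the project.
    func detectSigns(in image: UIImage,
                     completion: @escaping ([VNRecognizedObjectObservation]) -> Void) {
        guard let cgImage = image.cgImage,
              let coreMLModel = try? RoadSignDetector(configuration: MLModelConfiguration()).model,
              let visionModel = try? VNCoreMLModel(for: coreMLModel) else {
            completion([])
            return
        }

        let request = VNCoreMLRequest(model: visionModel) { request, _ in
            // Object-detection models return VNRecognizedObjectObservation,
            // which carries the class label and a normalized bounding box.
            completion(request.results as? [VNRecognizedObjectObservation] ?? [])
        }

        // perform(_:) is synchronous, so run this off the main thread in a real app.
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try? handler.perform([request])
    }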

After detecting an object, the method returns its position in the photo and its size – let's call these the parameters. The parameters of the detected object then let us crop out the area (the detected road sign) that we want to scan with OCR. One simple method and it's ready!

    // Crops the given pixel-space rect out of the image.
    // Returns nil instead of force-unwrapping, in case the image has no
    // CGImage backing or the rect falls outside its bounds.
    func cropImage(image: UIImage, rect: CGRect) -> CGImage? {
        return image.cgImage?.cropping(to: rect)
    }
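
One detail worth noting: Vision reports bounding boxes normalized to [0, 1] with the origin in the bottom-left corner, while `CGImage.cropping(to:)` expects pixel coordinates with a top-left origin. A small helper for that conversion might look like this (a sketch, assuming the observation comes from a Vision detection request like the one above):

    import Vision

    // Convert a normalized Vision bounding box (origin at the bottom-left)
    // into the top-left-origin pixel rect that CGImage.cropping(to:) expects.
    func pixelRect(for observation: VNRecognizedObjectObservation,
                   in cgImage: CGImage) -> CGRect {
        var rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                cgImage.width, cgImage.height)
        rect.origin.y = CGFloat(cgImage.height) - rect.maxY  // flip the y-axis
        return rect
    }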

The newly cropped photo will then be processed: we will search for characters (numbers or letters) on the sign, and this is where OCR comes in.
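
A minimal sketch of that OCR step with Vision's `VNRecognizeTextRequest` (completion-based, with error handling trimmed):

    import Vision

    // Run Vision's text recognizer on the cropped sign and hand back
    // the best candidate string for every text region it finds.
    func readText(from croppedSign: CGImage,
                  completion: @escaping ([String]) -> Void) {
        let request = VNRecognizeTextRequest { request, _ in
            let observations = request.results as? [VNRecognizedTextObservation] ?? []
            completion(observations.compactMap { $0.topCandidates(1).first?.string })
        }
        request.recognitionLevel = .accurate  // favor accuracy over speed

        let handler = VNImageRequestHandler(cgImage: croppedSign, options: [:])
        try? handler.perform([request])
    }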

The Vision framework does not support Thai yet; however, reading Latin characters or Arabic numerals is not a problem. You could fix this by training your own model. There are also ready-made libraries for the Thai language, but they were written for old Swift versions and porting them to the current one is not worth it. The Vision framework will support the Thai language soon.
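
If you want to verify what the recognizer supports on a given OS version, Vision can list its languages; this sketch uses the `supportedRecognitionLanguages()` instance method available from iOS 15:

    import Vision

    // Prints the language codes the recognizer supports on this OS version;
    // "th" was not among them at the time of writing.
    let request = VNRecognizeTextRequest()
    if let languages = try? request.supportedRecognitionLanguages() {
        print(languages)
    }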

As a side note, the UI of the app could also be improved :).

Screenshot of the app

I think that such small projects are quite a good way to learn many fields of technology, and you can pick up a lot along the way (even parallel programming, to improve application performance).

You can download the application here.

Testing the model

I would be happy if you decided to support my blog. If anyone would like to learn how to make such an application step by step, feel free to comment and give me feedback.