Project – Machine Learning and Edge Computing

I recently completed a project that I have been working on for about 6 months now. It has come to a point where everything works and has been integrated into a working demo. So let me explain the idea of the project, and then I will walk you through the tools I learned to use while building it.

This is how I usually start the pitch: “Do you know how Cricbuzz works?” The answer is unequivocally “Yes”. Then I continue: “This is basically F1Buzz.” You get live updates on a website for a race which is going on, and these live updates are generated automatically at the racetrack. Sensors deployed at the finish line detect when a car has crossed it, and a camera captures an image of the car as it crosses. A Raspberry Pi controls these sensors and runs the machine learning models: one detects the position of the number on the nose of the car, and the next identifies that number. The inferences (here I am working with the car number, the lap end time and the race start time) are then sent to a central server, and all the users access this data on the central server for live updates from the track. Since all the data is processed at the deployment location, a high-bandwidth connection to the central server is not required; the system at the endpoint acts as a sensor which returns the race track updates. If this were deployed in a live environment, the captured images would be of very high resolution, and sending them over a link to be processed at the server would introduce network bandwidth requirements and the associated latency.
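To make the “only the inference travels” point concrete, here is a minimal sketch of what the Pi might send after each lap. The endpoint URL and field names are placeholders I made up for illustration, not the actual API of my server.

```python
# A minimal sketch of the lap update the Pi sends to the central server.
# The endpoint URL and field names are placeholders, not my server's actual API.
import time
import requests

SERVER_URL = "http://central-server.local/lap_update.php"  # hypothetical endpoint

def send_lap_update(car_number, lap_end_time, race_start_time):
    """Ship only the inference results (a few bytes), never the captured image."""
    payload = {
        "car_number": car_number,
        "lap_end_time": lap_end_time,
        "race_start_time": race_start_time,
    }
    requests.post(SERVER_URL, data=payload, timeout=2)

# Example: car 44 crosses the line 83.2 seconds after the race started.
race_start_time = time.time() - 83.2
send_lap_update(44, time.time(), race_start_time)
```

The whole payload is a few dozen bytes, which is the point of doing the processing at the edge: the link to the central server can be slow without holding up the updates.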

Here is a link to an article which explains the advantages of an edge computing model very well.

My setup at a project competition. Left to right: Raspberry Pi with camera, the F1 car, and the live updates viewed wirelessly in a browser.

Now I want to point out some inadequacies of the project:

  • The camera I am using is the Raspberry Pi camera v1.2, which has a maximum frame rate of 90 frames per second at a resolution of 640 x 480 pixels. I am using the camera at this frame rate, and it works if the car is perfectly still, but even a small amount of motion blur renders the captured image useless. A camera with a very high shutter speed and a wide aperture will be required; cameras that can easily handle this requirement do exist.
  • I am running the ML models on the Raspberry Pi itself, and it takes about 15 seconds to process each captured frame (detecting and then identifying the car). I am currently using a YOLOv3 model. I also have a trained model that is much smaller, following the YOLOv3-tiny network; if I can get it to run on TensorFlow, the computation time could drop by up to 50%. Currently YOLOv3-tiny is not supported by the TensorFlow implementation I have been using, i.e. Darkflow (a sketch of the capture-and-detect loop on the Pi follows this list). Another part of this problem is that the Raspberry Pi is not built for intense computation. Looking at the available options, I could use Intel’s Movidius Neural Compute Stick, or I could go the Nvidia route with their Jetson systems, which are their crack at an embedded system containing CUDA cores.
  • Another concern is the latency between when the IR sensor signals the system to capture the image and the moment the image is actually captured and the time is logged. With a latency of 300 microseconds for the IR sensor and a margin of about another 300 microseconds for any further delays, an F1 car travelling at around 100 m/s (roughly 360 km/h) could have moved up to 6 centimetres from its initial position. To get the exact moment, a speed sensor combined with the IR sensor’s timestamp could pin down precisely when the car crossed the finish line.
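To show how these pieces currently fit together on the Pi, here is a minimal sketch of the trigger, capture and detect loop, assuming Darkflow is used for inference as in my current setup. The GPIO pin, file names and the cfg/weights paths are placeholders, not my exact configuration.

```python
# A minimal sketch of the trigger -> capture -> detect loop on the Pi.
# The GPIO pin, file names and cfg/weights paths are placeholders, not my exact setup.
import time
import cv2
import RPi.GPIO as GPIO
from picamera import PiCamera
from darkflow.net.build import TFNet

IR_PIN = 17  # hypothetical GPIO pin the IR sensor output is wired to

GPIO.setmode(GPIO.BCM)
GPIO.setup(IR_PIN, GPIO.IN)

# 640 x 480 at 90 fps is the limit of the Pi camera v1.2 used here.
camera = PiCamera(resolution=(640, 480), framerate=90)

# Darkflow loads the cfg/weights pair once; each return_predict call is the slow part
# (around 15 seconds per frame on the Pi in the current setup).
tfnet = TFNet({"model": "cfg/nose-number.cfg",      # placeholder cfg name
               "load": "bin/nose-number.weights",   # placeholder weights name
               "threshold": 0.4})

while True:
    GPIO.wait_for_edge(IR_PIN, GPIO.RISING)   # car breaks the IR beam at the finish line
    lap_end_time = time.time()                # log the timestamp immediately
    camera.capture("frame.jpg", use_video_port=True)  # video port capture is faster
    detections = tfnet.return_predict(cv2.imread("frame.jpg"))
    # Each detection is a dict with 'label', 'confidence', 'topleft' and 'bottomright'.
    print(lap_end_time, detections)
```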

I want to list out the technologies I considered and learnt during the course of this project.

  • For car detection, I first considered using Google’s Inception network and then looking up the car number in a database to display the results. I discarded this because each team has two cars on the track with identical designs; they are distinguished only by the onboard camera mount, which is black on one car and fluorescent on the other. I still trained an Inception network to tell Renault and Ferrari cars apart (a rough transfer-learning sketch follows this list).
  • Next, my guide suggested using a YOLO model. I was quite intimidated by it initially, because I had to build the Darknet implementation of YOLO on Windows, which required me to install and understand Visual Studio. After two days of troubleshooting what turned out to be a silly mistake involving OpenCV v3.4, I got Darknet running and trained a YOLOv3 model to detect the number on the nose of F1 cars. After that initial step, Darknet is very easy to use; but that can be said of a lot of things.
  • Next, I had to identify the number whose position the YOLO model detects. I tried using Tesseract, but I found out that Tesseract is not very good at reading this kind of text. I was still in the early stages of exploring machine learning at the time, so I did not consider applying transfer learning to the Tesseract model. (I have a different project in which I use Tesseract to read numbers directly from an image, and with some pre-processing it works beautifully; a small example follows this list.)
  • To identify the numbers, I was going to train a network on the MNIST dataset and then process the image to split it if there were more than two digits and identify each digit. But I found an implementation of number identification on a different dataset, the SVHN dataset. In that implementation, the author trained the model in a brute-force way: the model returned five digit outputs (the maximum number of digits in the SVHN dataset) for each image. I decided to try this approach before building an OCR pipeline for the digits. I took all the images from the SVHN dataset with one or two digits, around 130k images out of a total of roughly 700k, and trained my convolutional neural network on this subset. After training, the model achieved an accuracy of about 92% on the SVHN dataset. When I tried it on my F1 car numbers, I got an accuracy of about 60% (for one- and two-digit numbers). After applying transfer learning, the final model reached around 85% accuracy on F1 car number images (a stripped-down sketch of the model follows this list).
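For the Renault-vs-Ferrari experiment, a transfer-learning setup along these lines captures the idea. This is a rough Keras sketch rather than the exact script I used; the directory name, layer sizes and training settings are illustrative.

```python
# A rough Keras sketch of the Renault-vs-Ferrari transfer-learning experiment.
# Directory name, layer sizes and training settings are illustrative, not the exact ones I used.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # keep the ImageNet features, train only the new head

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # Renault vs Ferrari
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hypothetical folder with renault/ and ferrari/ subfolders of training images.
train_data = tf.keras.preprocessing.image_dataset_from_directory(
    "data/cars", image_size=(299, 299), batch_size=16)
train_data = train_data.map(lambda x, y: (preprocess_input(x), y))

model.fit(train_data, epochs=5)
```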
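And here is a sketch of the Tesseract-plus-pre-processing approach I mentioned from the other project, using pytesseract and OpenCV. The file name and the specific pre-processing steps are examples, not the exact pipeline.

```python
# A sketch of the Tesseract approach: upscale, binarise, then OCR digits only.
# The file name and threshold settings are examples, not the exact pipeline from that project.
import cv2
import pytesseract

img = cv2.imread("nose_crop.jpg")  # hypothetical crop around the detected number
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)  # enlarge small text
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# psm 7 treats the image as a single text line; the whitelist restricts Tesseract to digits.
text = pytesseract.image_to_string(
    binary, config="--psm 7 -c tessedit_char_whitelist=0123456789")
print(text.strip())
```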
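Finally, here is a stripped-down sketch of the “fixed number of digit heads” idea I borrowed for the SVHN subset: one softmax head per digit position, with an extra class meaning “no digit”. The layer sizes and the 64 x 64 input are illustrative; my actual model and training loop were different.

```python
# A stripped-down sketch of the "fixed number of digit heads" idea used on the SVHN subset.
# Layer sizes and the 64 x 64 input are illustrative; my actual model and training differ.
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)

# One softmax head per digit position. Class 10 means "no digit here",
# which is how a one-digit number still fits the two-head output.
digit1 = layers.Dense(11, activation="softmax", name="digit1")(x)
digit2 = layers.Dense(11, activation="softmax", name="digit2")(x)

model = Model(inputs, [digit1, digit2])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```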

I combined all this and sprinkled in some HTML, PHP and MySQL. All of this was a topping on my “Raspberry Pie” to finally get the demo ready. I will be making more posts about this project with detailed explanations of each part.

Okay Bye! See you soon, I don’t know how soon though.

(I will add an image of the result from the Pi Camera tomorrow in an edit)