The first version of Commute Guardian uses existing machine-learning components as tools to solve the problem. On top of these components, algorithms run to assess your risk and choose a course of action (flash, honk, or both). In internal testing, this system has proven effective at detecting and responding to risk.
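To make that "algorithms on top" idea concrete, here's a minimal sketch of what such a rule layer can look like. The thresholds and helper names below are hypothetical, purely for illustration - not the product's actual tuned values:

```python
# Hypothetical thresholds - not the product's real tuned values.
DISTANCE_ALERT_M = 15.0        # react when an approaching car is within ~15 m
CLOSING_SPEED_ALERT_MPS = 8.0  # and closing at more than ~8 m/s

def choose_action(distance_m: float, closing_speed_mps: float) -> str:
    """The 'algorithm-on-AI' layer: ML components estimate distance and
    closing speed from the camera, then fixed rules pick the response."""
    if distance_m < DISTANCE_ALERT_M and closing_speed_mps > CLOSING_SPEED_ALERT_MPS:
        return "flash+honk"
    if distance_m < DISTANCE_ALERT_M:
        return "flash"
    return "none"

print(choose_action(distance_m=12.0, closing_speed_mps=10.0))  # -> "flash+honk"
```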
However, it is akin to some of the initial versions of object detection, which ran an algorithm on top of image classification.
The final version will go one step further on the machine learning: it will eliminate the algorithm-on-AI setup and train the machine-learning/AI system directly on users' experience.
First, some background:
The algorithm-on-AI approach is typical of the machine-learning/AI industry when new applications are implemented. The AI gives you a capability you never had before, and the human brain goes 'oh, if I just wrap this logic around it, I can do this!' And off you go. A great example of this is object detection itself. In short, it used to be done with this 'algorithm-on-AI' approach - running an image classifier algorithmically across the image - which netted about 1 frame every 6 seconds. A complete AI solution netted 60 FPS. For full details, see one of the talks on the first implementations that did away with the 'algorithm-on-AI' approach and netted a 100x improvement, with no real loss in accuracy, here.
A brief on image classification vs. object detection:
- Classification just tells you what's in an image
- Object Detection tells you what's in an image, and where it is
So the first object detectors simply took an image classifier and wrapped an algorithm around it: break the image down into little tiles, pass each tile through the classifier, and use the results to figure out where the specific object is in the original image. To get a good position, you had to pass a TON of sub-images through the classifier, which made it REALLY slow.
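As a rough sketch of that tiling approach (the tile size, stride, and the stubbed-out classifier are made up for illustration):

```python
import torch

def classify_tile(tile):
    # Stand-in for a pretrained image classifier's forward pass;
    # returns (class_id, confidence). Stubbed out here for illustration.
    return 0, 0.0

def detect_by_tiling(image, tile_size=64, stride=32):
    # image: (channels, height, width) tensor.
    # The 'algorithm-on-AI' detector: slide a window across the frame and run
    # the classifier on every tile. One frame becomes thousands of classifier
    # calls, which is why this approach was so slow.
    detections = []
    _, height, width = image.shape
    for top in range(0, height - tile_size + 1, stride):
        for left in range(0, width - tile_size + 1, stride):
            tile = image[:, top:top + tile_size, left:left + tile_size]
            class_id, confidence = classify_tile(tile)
            if confidence > 0.9:
                detections.append((class_id, (left, top, tile_size, tile_size)))
    return detections

print(detect_by_tiling(torch.rand(3, 480, 640)))
```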
The full-AI object detectors don't do this: they use AI to figure out where the object is in the image AND what that object is. And they're not only hundreds of times faster (1/6 FPS to 60 FPS), they're also better at it.
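By contrast, a modern detector returns boxes, labels, and scores from a single forward pass. Here's a minimal sketch, assuming a recent version of torchvision and using its pretrained single-shot SSD model as an illustrative stand-in:

```python
import torch
import torchvision

# Pretrained single-shot detector as an illustrative stand-in for the
# 'full-AI' approach (downloads weights on first run).
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)  # one RGB frame, values in [0, 1]

with torch.no_grad():
    prediction = model([frame])[0]

# Boxes, class labels, and confidence scores come out of a single forward
# pass - no tiling loop - which is why these models reach real-time rates.
for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score > 0.5:
        print(label.item(), [round(v) for v in box.tolist()], round(score.item(), 2))
```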
So the first version of the bike product is kind of akin to the old-school (circa 2016) way of doing object detection. It actually works really well, but a full-AI approach will work even better, and here's why:
The algorithm on top is only as smart as, well, we are. So we do a bunch of testing on the thresholds to use for distance, speed, etc. to determine when to honk, when to strobe, and how long to do both. All of this works well for helping to prevent a crash - for knowing when you're at risk - and it will be core to the first version of the product. It will help save lives. However, the full-AI approach allows things to be discovered from data (the beauty of modern machine learning) that we can't think of ourselves. So imagine this example (and there are probably others):
- A user uploads and tags/labels a video showing a texting driver, who in this case poses no risk.
Our base algorithm only works on whether you're at risk, so it doesn't do anything with this video. With the full-AI approach, instead of using algorithms (which leverage data produced by ML/AI) to assess risk, the whole system is trained on input data (video data, including depth), user labels (e.g. "the driver of this car was texting"), and recommended actions (e.g. “meep meep” to get the texting driver's attention).
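As a very rough sketch of what that end-to-end setup could look like (the action vocabulary, feature dimensions, and model shape here are all hypothetical), assuming PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical action vocabulary, for illustration only.
ACTIONS = ["none", "flash", "honk", "meep_meep"]

class ClipToActionModel(nn.Module):
    """End-to-end sketch: a video clip (here a stack of per-frame features,
    which in the real system would include depth) maps directly to an action."""
    def __init__(self, feature_dim=512, hidden_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, len(ACTIONS))

    def forward(self, clip_features):          # (batch, frames, feature_dim)
        _, final_state = self.encoder(clip_features)
        return self.head(final_state[-1])      # (batch, num_actions) logits

model = ClipToActionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 8 labelled clips, 30 frames each, with the community-supplied
# recommended action as the training target.
clips = torch.randn(8, 30, 512)
target_actions = torch.randint(0, len(ACTIONS), (8,))

optimizer.zero_grad()
loss = loss_fn(model(clips), target_actions)
loss.backward()
optimizer.step()
```

In practice the training data would be the community-uploaded, labelled videos rather than random tensors, but the shape of the problem is the same: raw clip in, recommended action out.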
In this way, when the AI is trained end-to-end (input video to output action), it can start to pick up on things we may not have been able to think of - to look for things we couldn't think to tell it to look for. This happens all the time in AI, and it is the reason AI is transforming every known industry and why MIT Technology Review famously wrote ‘No industry can afford to ignore Artificial Intelligence’.
So it enables the system to respond (customizable by the user, of course) to things that don't even put you at risk. For example, it can learn to pick up subtle hints that a driver is drunk, or texting, or agitated, and so on.
And then, as a community, we can train the AI on what to do in response to these conditions (e.g. alert the rider that there's an agitated driver close by, or meep-meep at a driver who's texting, etc.).