Abstract: With the rise of social media and internet technology, drivers are becoming increasingly careless and distracted behind the wheel, with severe consequences for their own safety and that of their passengers. To provide an effective solution, this paper puts forward a machine learning model based on Convolutional Neural Networks that not only detects a distracted driver but also identifies the cause of the distraction by analyzing images obtained from a camera module installed inside the vehicle. Convolutional Neural Networks learn spatial features from images, which can then be examined by fully connected layers. The experimental results show a 99% average accuracy in distraction recognition and hence strongly support that our Convolutional Neural Network model can be used to identify distraction among drivers.
Conclusion: Deep learning with Convolutional Neural Networks has become a very active area of machine learning research and has been used extensively in image classification, speech recognition, and related tasks. In this paper, we use deep convolutional networks, specifically the VGG16 and VGG19 models, to detect distracted drivers and to identify the cause of their distraction. The results suggest that the methods discussed in this work can be used to build a system that detects driver distraction. The proposed model automatically recognizes any of the ten mentioned classes of distraction, identifying not only that the driver is distracted but also why. With an accuracy of more than 99%, the system was shown to be efficient and practical. It could form part of a Driver State Monitoring System that continuously monitors the driver while driving. Driver state monitoring has become increasingly popular, and many automobile manufacturers have started adopting such systems to help prevent accidents. Installed inside vehicles, these systems raise warnings whenever the driver becomes distracted, helping to prevent distraction-related accidents. This work also demonstrates a significant reduction in training time: when pre-trained ImageNet weights were not used, training time increased by around 50 times for both VGG16 and VGG19. A graphical representation of the elapsed time is depicted in Fig. 15. This drastic reduction in training time was achieved without diminishing the accuracy of our classification models. As an extension of this work, more categories of distraction can be brought in. Specific scenarios not targeted in the present work, such as detecting drowsiness among drivers, may also provide an opportunity to widen the scope of the work and build a more capable system.
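The transfer-learning setup this conclusion describes, fine-tuning VGG16 initialized with ImageNet weights on ten distraction classes, can be sketched as follows. This is a minimal illustration in PyTorch/torchvision, not the authors' exact pipeline; the frozen layers, optimizer, and learning rate are assumed choices.

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # the ten distraction classes discussed above

# Start from ImageNet weights; this initialization is what yields the
# roughly 50x reduction in training time reported above.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor (an assumed choice here).
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final fully connected layer with a 10-way classification head.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

The same pattern applies to VGG19 by swapping in models.vgg19.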
Abstract: The standard architecture of synthetic aperture radar (SAR) automatic target recognition (ATR) consists of three stages: detection, discrimination, and classification. In recent years, convolutional neural networks (CNNs) for SAR ATR have been proposed, but most of them classify target classes from a target chip extracted from SAR imagery, i.e., they address only the third stage of SAR ATR. In this report, we propose a novel CNN for end-to-end ATR from SAR imagery. The CNN, named Verification Support Network (VersNet), performs all three stages of SAR ATR end-to-end. VersNet takes as input a SAR image of arbitrary size, with multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target. This report describes the evaluation results of VersNet, which was trained to output scores for all 12 classes (10 target classes, a target-front class, and a background class) for each pixel, using the Moving and Stationary Target Acquisition and Recognition (MSTAR) public dataset.
Conclusion: Applying CNNs to the third-stage classification in the standard SAR ATR architecture has improved its performance. To improve the overall performance of SAR ATR, however, it is important to improve not only the third-stage classification but also the first-stage detection and the second-stage discrimination. In this report, we proposed a CNN based on a new SAR ATR architecture that consists of a single stage, i.e., it is end-to-end, rather than the standard three-stage architecture. Unlike conventional CNNs for target classification, the CNN named VersNet takes as input a SAR image of arbitrary size, with multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target. We trained VersNet to output scores that include the ten target classes of the MSTAR dataset and evaluated its performance. The average IoU over all pixels of the test set (2,420 target chips) is over 0.9, and the classification accuracy is about 99.5% if we select the majority among the per-pixel maximum-probability classes as the predicted class.
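The per-pixel evaluation protocol described above (average IoU over all pixels, plus chip-level classification by majority vote over the per-pixel maximum-probability classes) can be sketched as below. This assumes the network emits a (C, H, W) score map with C = 12 (ten targets, target front, background); the function and variable names are illustrative, not from the report.

import numpy as np

def predict_chip_class(scores: np.ndarray, target_classes=range(10)) -> int:
    # scores: (C, H, W) per-pixel class scores for one target chip.
    per_pixel = scores.argmax(axis=0)  # maximum-probability class per pixel
    counts = np.bincount(per_pixel.ravel(), minlength=scores.shape[0])
    # Majority vote restricted to the ten target classes.
    return int(max(target_classes, key=lambda c: counts[c]))

def iou(pred: np.ndarray, truth: np.ndarray, cls: int) -> float:
    # Intersection over union for one class over a pixel map.
    p, t = (pred == cls), (truth == cls)
    union = np.logical_or(p, t).sum()
    return float(np.logical_and(p, t).sum() / union) if union else 1.0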
ABSTRACT In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks, respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
CONCLUSION In this work we evaluated very deep convolutional networks (up to 19 weight layers) for large-scale image classification. It was demonstrated that the representation depth is beneficial for the classification accuracy, and that state-of-the-art performance on the ImageNet challenge dataset can be achieved using a conventional ConvNet architecture (LeCun et al., 1989; Krizhevsky et al., 2012) with substantially increased depth. In the appendix, we also show that our models generalise well to a wide range of tasks and datasets, matching or outperforming more complex recognition pipelines built around less deep image representations. Our results yet again confirm the importance of depth in visual representations.
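The architectural idea the paper evaluates, growing depth by stacking very small 3 × 3 convolutions between occasional max-pooling layers, can be sketched as below. The 13-convolution configuration shown corresponds to the paper's 16-weight-layer model (13 convolutional plus 3 fully connected layers); the builder itself is a simplified illustration in PyTorch, not the released model.

import torch.nn as nn

def vgg_layers(cfg, in_channels=3):
    # Translate a configuration list into stacked 3x3 conv + ReLU layers,
    # with 'M' marking a 2x2 max-pooling step.
    layers = []
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

# 16-weight-layer configuration: 13 conv layers here + 3 fully connected.
cfg_16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"]
features = vgg_layers(cfg_16)

Two stacked 3 × 3 layers cover the receptive field of a single 5 × 5 layer with fewer parameters and an extra non-linearity, which is why pushing depth this way pays off.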