Author: Praneeth Vepakomma, Data Scientist, Public Engines
The scientific problem of making predictions has interested researchers for centuries, and solutions to it have found applications across fields of societal, applied, and theoretical interest. Over this period, the complexity of predictive models has gradually increased, and with it their accuracy and generalizability. From the early work of Gauss in the 1800s to Andrew Ng's present-day experiments at Stanford with autonomous helicopters that learn to fly, the problem of prediction has moved from fitting a straight line to learning complex relationships from data.
R&D at PublicEngines
At PublicEngines we have a committed R&D team focused on developing advanced mathematical models for predicting crime that also ‘learn’ from data, and we are pleased to announce our newest product, CommandCentral Predictive, the culmination of these dedicated research efforts.
Mathematical Modeling of Crime
The mathematical model that I have developed in-house with my team exploits specific behaviors inherent to crime-incident data: near-repeat victimization, long-term patterns, transient (short-lived) patterns, and interactions among incidents. Our model measures these dynamic, crime-specific behaviors while separating signal from noise, which raises the statistical confidence of our predictions. It does so not by treating each characteristic in isolation, but by accounting for the dependencies between them, mathematically capturing the predictive structure of crime patterns. We were also cognizant of inconsistencies that crop up in some of the existing academic literature on similar problems, and our patent-pending system improves upon these as well.
Focus on Models that Learn
The automated learning component of our model brings generalizability to our predictive engine, guaranteeing a level of robustness over new, unseen data when the models are deployed in practice. Building domain-specific models that also learn from data is the province of the field called statistical machine learning. The area gained early momentum in the late 1940s, when scientists began building mathematical models that mimic neural networks in the human brain, and an early learning-system breakthrough came in the late 1980s with handwriting recognition. The field has grown by leaps and bounds since, adding analytical intelligence to many use cases: today's e-mail spam classifiers, voice recognition, computer vision, facial recognition, and document classification systems are just a few of machine learning's successes. The main reason to focus on a learning component is that a model that works well on the data used to build and train it will not always perform as well on new data in real-life deployment. This happens primarily because the model can learn non-generalizable intricacies of the training dataset that do not translate into practical results on the future data the model faces after deployment. This is why we rigorously train our patent-pending crime-specific statistical model and include a learning component comprising an ensemble of multiple models to keep it robust in real-life deployment. This allows our model to separate the predictive characteristics of future crimes down to a tactical, actionable level of granularity.
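The overfitting problem described above can be made concrete with a minimal, self-contained sketch. Here a model that memorizes its training data achieves zero training error but generalizes worse on held-out data than a simple least-squares line; the synthetic data and both models are illustrative stand-ins, not our crime model:

```python
import random

random.seed(0)

# Synthetic stand-in data: a linear signal plus Gaussian noise.
data = [(float(x), 2.0 * x + random.gauss(0, 1.0)) for x in range(100)]
random.shuffle(data)
train, test = data[:70], data[70:]  # hold out data the models never see

def fit_linear(points):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cov = sum((x - mx) * (y - my) for x, y in points)
    var = sum((x - mx) ** 2 for x, _ in points)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(points):
    """A model that memorizes training data: predicts the y of the
    nearest training x. Zero training error, poor generalization."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

linear = fit_linear(train)
memorizer = fit_memorizer(train)

print("memorizer train MSE:", mse(memorizer, train))  # exactly 0.0
print("memorizer test MSE: ", mse(memorizer, test))
print("linear test MSE:    ", mse(linear, test))
```

The memorizer's perfect training score is exactly the trap the paragraph above warns about, and evaluating on held-out data (or, more robustly, an ensemble of models) is what exposes it.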
Field-testing & Evaluation Metrics
We have rigorously tested our models against traditional methodologies like ‘hotspotting’, in which heat maps are generated from historic crime-incident data. For a scientific evaluation, we quantify the performance of existing hotspotting methodologies by overlaying a grid on the hotspots and measuring the success rate of our predictions against the hotspot’s. Reaching a statistically significant comparison requires roughly 80 to 120 days of evaluation, depending on the dataset. We conclusively observe that our patent-pending model performs about 2.7 times better than the widely used hotspot.
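The grid-overlay idea can be sketched as follows: discretize the map into cells, flag a set of cells as predictions, and measure the fraction of future incidents that land in flagged cells. The cell size, coordinates, and naive "any cell with past incidents" hotspot rule below are illustrative assumptions, not our actual evaluation protocol:

```python
# Grid-overlay hit-rate sketch (illustrative data, not real crime data)
def to_cell(x, y, cell_size=0.5):
    """Map a point to the grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))

def hit_rate(flagged_cells, future_incidents, cell_size=0.5):
    """Fraction of future incidents that fall inside flagged cells."""
    hits = sum(to_cell(x, y, cell_size) in flagged_cells
               for x, y in future_incidents)
    return hits / len(future_incidents)

history = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (3.0, 3.0)]
# Naive "hotspot" baseline: flag every cell that saw a past incident.
hotspot_cells = {to_cell(x, y) for x, y in history}

future = [(0.3, 0.2), (0.15, 0.4), (5.0, 5.0), (3.1, 3.2)]
print(hit_rate(hotspot_cells, future))  # 0.75
```

Comparing two methods then amounts to comparing their hit rates over the same future window, which is why a sufficiently long evaluation period is needed before the difference becomes statistically significant.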
Positive Impact of Actionable Patrol
This level of focused tactical information reinforces directed, actionable patrol plans and increases resource efficiency, so that agencies can positively impact their communities through predictive policing. The technology unlocks tactical intelligence from your RMS and closes the analyst/officer gap by improving operational decision-making. From the technology industry’s perspective as well, this is fully in line with our goal of continuing to equip officers with tactical, directly actionable information that helps them do their best work.