Although the principle of machine learning is not new, it is currently enjoying a surge in popularity within State and Local Government (SLG). There are three main reasons for this. First, the large quantities of data needed for applications and training (“big data”) are now available. Second, the huge computing power required is now within reach, especially in the cloud. Third, a range of open-source projects has made the algorithms accessible to more or less everyone.
Supervised vs. Unsupervised Machine Learning
In machine learning, there are two types of learning: supervised and unsupervised. Supervised learning is learning in which the data has labels. A label can be thought of as the end result for a data point. For example, if we had an opioid dataset, with each data point being one Narcan deployment, the label could be whether the deployment was successful or unsuccessful.
In unsupervised learning, we would not have those labels to rely on. Supervised learning can take the form of regression or classification. In regression, you try to predict a numeric output value for an input, while in classification, you try to predict the class that an input belongs to. Unsupervised learning mostly comprises clustering algorithms: you plot the data and try to group different parts of it into clusters to find relationships and patterns within the data. So how does it work? It’s really just a computer doing lots of computations, plugging numbers into many different equations, and the result is a set of recommended approaches to solving a particular problem.
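To make the distinction concrete, here is a minimal sketch in plain Python. It is an illustration only: the two numeric features and their values are invented, carry no medical meaning, and the function names are hypothetical. The supervised part uses a nearest-centroid classifier and the unsupervised part uses a bare-bones k-means; real projects would normally rely on an established library rather than hand-written versions like these.

```python
from math import dist

# Invented toy data: two made-up numeric features per data point.
points = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 8.0), (4.2, 7.5), (3.8, 8.3)]
# Labels exist only in the supervised setting.
labels = ["successful"] * 3 + ["unsuccessful"] * 3

def centroid(pts):
    """Average position of a list of points."""
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(len(pts[0])))

def fit_nearest_centroid(points, labels):
    """Supervised: use the labels to compute one centroid per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: centroid(pts) for y, pts in groups.items()}

def predict(model, point):
    """Classify a new point by the closest class centroid."""
    return min(model, key=lambda y: dist(model[y], point))

def kmeans(points, k, iterations=10):
    """Unsupervised: group points into k clusters without using any labels."""
    centers = list(points[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

model = fit_nearest_centroid(points, labels)
print(predict(model, (1.1, 2.1)))   # classification of an unseen point
print(kmeans(points, k=2))          # groups found without labels
```

The classifier can only be trained because every point carries a label; k-means receives the same points with no labels and has to discover the grouping itself.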
Artificial Intelligence has the ability to solve many real-world problems. For example, New York Presbyterian, a major academic health system, is working on machine learning analytics to combat the opioid crisis.
Machine learning offers deep pattern recognition, which gives lawmakers and agency heads the data they need to request an appropriate amount of funding. Machine learning algorithms can predict problematic usage and potentially even identify those citizens at the highest risk of overdose.
Where to Get Started with Machine Learning
So where does one start? In day-to-day machine learning, and in the quest to deploy the knowledge gained, we typically encounter three main problems, though they are not the only ones.
Data Quality – Data from multiple sources across multiple time frames can be difficult to collate into clean and coherent datasets that will yield the maximum benefit from machine learning. Typical issues include missing data, inconsistent data values, autocorrelation and so forth.
These issues matter because they affect the statistical properties of the datasets and violate the assumptions that algorithms make when run against dirty data. This reduces the effectiveness of the models and thus the potential return on investment.
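The data quality issues named above can be checked with a few lines of code before any model is trained. The following is a rough sketch under stated assumptions: the helper names are hypothetical, missing values are assumed to be represented as None, and the autocorrelation measure is the simple lag-1 sample estimate.

```python
from statistics import mean

def missing_count(values):
    """Count entries that are missing (represented here as None)."""
    return sum(1 for v in values if v is None)

def normalise_category(value):
    """Collapse inconsistent spellings such as ' ny ' and 'NY' into one form."""
    return value.strip().upper()

def lag1_autocorrelation(series):
    """Correlation of a series with itself shifted by one step.

    Values near +1 or -1 suggest that consecutive observations are not
    independent, which many algorithms silently assume.
    """
    m = mean(series)
    numerator = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator if denominator else 0.0

print(missing_count([10, None, 12, None]))
print({normalise_category(v) for v in ["NY", " ny ", "Ny"]})
print(lag1_autocorrelation([1, 2, 3, 4, 5]))
```

Checks like these do not fix the data by themselves, but they show where cleaning effort is needed before the statistical assumptions of a model can be trusted.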
Business Relevance – While a lot of the technology underpinning the machine learning revolution has been progressing more rapidly than ever, a lot of the application today occurs without much thought given to business value. For example, a customer churn model built with deep learning techniques might provide fantastic prediction accuracy but at the expense of interpretability and understanding how the model derived the answer.
The business may have originally wanted a high-accuracy model as well as an understanding of why customers churn. The original objective may have been to gain behavioral insight and improve interactions with the customer rather than to make critical decisions based on trust in a black box of code.
Operationalizing Models – Once models have gone through the build and tuning cycle, it is critical to deploy the results of the machine learning process into the wider business. This is a difficult bridge to cross, as predictive modelers are typically not IT solution experts and vice versa.
Closing this gap between disparate skill sets is necessary so that the results of machine learning can be reused by business applications downstream. For example, think of an inbound customer feedback system routing complaints and feedback to the correct channel and consultant with the explicit aim of churn prevention. This could be achieved through real-time interaction with an NLP + neural network pipeline, all wrapped into a neat REST API.
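The shape of such a service can be sketched in a few lines. This is only an assumption-laden illustration: a simple keyword lookup stands in for the NLP + neural network pipeline, the channel names and trigger words are hypothetical, and handle_request merely mimics what a REST endpoint would do (JSON in, JSON out) without starting a web server.

```python
import json
import re

# Hypothetical channels and trigger words; a real system would call a trained model.
ROUTING_RULES = {
    "billing": {"invoice", "charge", "refund"},
    "technical_support": {"error", "outage", "crash"},
}
DEFAULT_CHANNEL = "general"

def route_feedback(text):
    """Stand-in for the model: pick the first channel whose keywords appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for channel, keywords in ROUTING_RULES.items():
        if words & keywords:
            return channel
    return DEFAULT_CHANNEL

def handle_request(body):
    """Mimic a REST endpoint: take a JSON request body, return a JSON response."""
    payload = json.loads(body)
    return json.dumps({"channel": route_feedback(payload["text"])})

print(handle_request('{"text": "I was charged twice and want a refund!"}'))
```

Keeping the model behind a function with a plain JSON contract is one way to let IT teams integrate it without needing to understand how the prediction is made, and to let modelers replace the keyword lookup with a real model without changing the callers.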
The ability to quickly ingest petabytes of data into your data lake and transform that data for distribution to a variety of systems across SLG is imperative. Talend offers State and Local Government the necessary tools in a solution architecture that is both credible and affordable.