A Best-Practice Approach to Machine Learning Model Development

You’re probably familiar with the old adage: sometimes you have to slow down to go fast. This is especially true of the machine learning development process at the core of Enterprise AI implementation. Machine learning model development needs to be done carefully and methodically if it’s going to deliver the returns you should expect from your investment. We see six distinct steps to the process:

  • Determine Business Objective
  • Design Solution
  • Create Model
  • Evaluate Model
  • Deploy Solution
  • Manage Continuous Improvement

These steps occur in order, and then repeat.

1.     Business Objective

Clearly defining business objectives for a machine learning (ML) project may seem like an obvious first step, but you’d be surprised how many projects kick off with the vague goal of “finding insights” in data that’s been accumulating for years. Machine learning models work at a very granular level, so broadly stated objectives such as “reduce manufacturing costs” are not specific enough. They must be broken down into narrower tasks such as “reduce equipment failures” or “improve operator productivity.” It’s also important to quantify the improvement you hope to achieve and translate that goal into an expected ROI, so that informed decisions can be made about prioritization and resource allocation.
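To make the quantification step concrete, here is a minimal Python sketch that turns a targeted improvement into a rough first-year ROI. The function name and every number in it (failure counts, cost per failure, project cost) are hypothetical placeholders, not data from a real project, and the formula assumes avoided failures are the only benefit.

```python
def estimated_roi(baseline_failures_per_year: float,
                  target_reduction: float,
                  cost_per_failure: float,
                  project_cost: float) -> float:
    """First-year ROI as a fraction: (annual savings - project cost) / project cost.

    Assumes the only benefit is avoided equipment failures and ignores
    ongoing maintenance costs; a real business case would be broader.
    """
    avoided_failures = baseline_failures_per_year * target_reduction
    annual_savings = avoided_failures * cost_per_failure
    return (annual_savings - project_cost) / project_cost


# Hypothetical objective: "reduce equipment failures by 20%".
roi = estimated_roi(baseline_failures_per_year=120,
                    target_reduction=0.20,
                    cost_per_failure=15_000,
                    project_cost=250_000)
print(f"Estimated first-year ROI: {roi:.0%}")  # Estimated first-year ROI: 44%
```

Running the same calculation for each candidate objective gives a common yardstick for prioritization.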

 

2.     Solution Design

ML projects are often treated like isolated lab experiments, skipping the important step of designing a complete, integrated business solution and its interface. I recently listened to a conference panelist describe a marathon client meeting that took place after a model was completed. The meeting’s objective was to use the new model to identify the best cities for expansion. Client representatives and the data scientist who developed the model sat around a table all day while the data scientist tested various scenarios and answered questions. The client was satisfied with the outcome, and the project was considered a success.

Now imagine an alternative scenario. The data scientist creates a simple user interface that lets the client experiment with different scenarios and parameters on their own. The client can consult everyone affected by the business decision, with no time constraints, and without needing the model’s developer on site to manipulate the model by hand. The model could be used on an ongoing basis to track progress and monitor changes. Better yet, the outcomes of acting on the model’s recommendations could be fed back into the model to continuously improve its output for future expansion projects.

A complete machine learning solution should include a strategy to integrate the model into existing business processes. It should account for end-user interface and training requirements, model quality requirements, model update frequency, and an outline of the release process itself.
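One way to keep these elements from being forgotten is to record them as a structured checklist next to the model itself. The sketch below is a hypothetical Python dataclass; the class name, field names, and example values are illustrative assumptions, not a standard template.

```python
from dataclasses import dataclass, field


@dataclass
class SolutionDesign:
    """Hypothetical checklist of the solution-design elements listed above."""
    business_process: str          # where the model plugs into existing work
    end_user_interface: str        # how non-developers will use the model
    training_plan: str             # how end users will learn the interface
    required_accuracy: float       # model quality requirement (0 to 1)
    update_frequency_days: int     # how often the model is retrained
    release_steps: list[str] = field(default_factory=list)  # outline of the release process

    def missing_items(self) -> list[str]:
        """Return the names of elements still left blank, so gaps surface early."""
        text_fields = ("business_process", "end_user_interface", "training_plan")
        gaps = [name for name in text_fields if not getattr(self, name).strip()]
        if not self.release_steps:
            gaps.append("release_steps")
        return gaps


design = SolutionDesign(
    business_process="Quarterly expansion planning",
    end_user_interface="Scenario explorer web form",
    training_plan="",
    required_accuracy=0.90,
    update_frequency_days=90,
    release_steps=["pilot with planning team", "full rollout"],
)
print(design.missing_items())  # ['training_plan']
```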

 

3.     Model Creation

Model creation is typically the longest stage of the development process. The goal in this step is to reach the level of model accuracy required by the business objective. Three primary components determine the accuracy of machine learning models: the fit of the algorithm, the completeness of the feature set, and the sufficiency of training data. Modeling proceeds by iteratively improving these three factors until the required accuracy is achieved or progress stalls.

The first step in model creation is to select an appropriate algorithm. The algorithm is the procedure that’s executed on the training data to create – or train – the model. There are literally hundreds of machine learning algorithms available to data scientists, and new ones are developed all the time. Choosing the right algorithm for a given machine learning problem is a prerequisite for a good model that can then become a good business tool.
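The article does not prescribe a selection method. One common approach, sketched here under that assumption, is to train several candidate algorithms on the same data and compare them on held-out validation examples. To keep the example dependency-free, the two candidates (a majority-class baseline and a 1-nearest-neighbour classifier), the helper names, and the toy data are illustrative stand-ins, not recommendations.

```python
from collections import Counter
from typing import Callable

Example = tuple[list[float], int]      # (feature vector, class label)
Model = Callable[[list[float]], int]   # a trained model maps features to a label


def fit_majority(train: list[Example]) -> Model:
    """Baseline candidate: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(train: list[Example]) -> Model:
    """Second candidate: copy the label of the closest training example."""
    def predict(x: list[float]) -> int:
        closest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(x, ex[0])))
        return closest[1]
    return predict


def accuracy(model: Model, data: list[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_algorithm(candidates: dict[str, Callable[[list[Example]], Model]],
                     train: list[Example], valid: list[Example]) -> tuple[str, float]:
    """Train every candidate on `train`; return the best name and its validation accuracy."""
    scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


train = [([1.0, 1.0], 0), ([1.2, 0.8], 0), ([0.9, 1.1], 0), ([5.0, 5.0], 1), ([5.2, 4.9], 1)]
valid = [([1.1, 1.0], 0), ([4.8, 5.1], 1), ([5.1, 5.0], 1)]
candidates = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}
print(select_algorithm(candidates, train, valid))  # ('nearest_neighbor', 1.0)
```

Keeping the validation data separate from the training data is what makes the comparison meaningful; scoring candidates on the data they were trained on would favor whichever one memorizes best.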

The second component, which is crucial for model accuracy, is the completeness of the feature set. Features are the characteristics, or variables, in the data that the model uses to make its predictions. The process of identifying and constructing features is called feature engineering. The feature set tells the model: “These are the things that make the most difference for this problem.” During training, it is the algorithm’s job to learn how to combine these features in the training data to arrive at an accurate, or “correct,” result.

The accuracy of the first model created with the selected algorithm is recorded as the baseline accuracy. With traditional ML models, even when the algorithm is selected correctly, baseline accuracies can be low – in the 70% range. Many algorithms have built-in settings, called hyper-parameters, that are chosen before training rather than learned from the data. Sometimes accuracy can be improved simply by adjusting these values. As the model develops, the optimum hyper-parameter values can shift, so it is important to revisit them periodically during the modeling process. This experimentation usually takes relatively little time.
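Here is a minimal sketch of recording a baseline and then adjusting a hyper-parameter, assuming a k-nearest-neighbours classifier whose hyper-parameter is k, the number of neighbours that vote on each prediction. The algorithm, the toy one-dimensional data, and the candidate values of k are illustrative choices, not taken from the article.

```python
from collections import Counter

# Toy one-dimensional data: (feature value, label). The point (3.5, 1) is a
# deliberately "noisy" example sitting among label-0 points.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.5, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
valid = [(3.4, 0), (3.6, 0), (7.5, 1), (2.5, 0)]


def knn_predict(x: float, k: int) -> int:
    """Return the majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def accuracy(k: int) -> float:
    """Fraction of validation examples classified correctly with this k."""
    return sum(knn_predict(x, k) == y for x, y in valid) / len(valid)


# The first model uses k=1; its score is recorded as the baseline.
baseline = accuracy(1)

# Try a few candidate hyper-parameter values and keep the best (odd k avoids ties).
candidates = (1, 3, 5)
best_k = max(candidates, key=accuracy)

print(f"baseline: {baseline:.0%}, best k={best_k}: {accuracy(best_k):.0%}")
# baseline: 50%, best k=3: 100%
```

With k=1 the noisy point decides nearby predictions on its own; letting three neighbours vote outweighs it. In practice the tuned model should also be checked on a separate test set, because choosing k by validation score can flatter the result.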

Developing the feature set can take quite a bit longer. For example, a model to predict home prices might start with features such as square footage, neighborhood, and market inventory to get a ballpark estimate of perhaps 70% accuracy. Some additional features are fairly obvious and tangible, such as the number of bedrooms, the number of bathrooms, and whether there is a garage. Assume that each of these three features adds another 5% accuracy, resulting in a model with 85% accuracy. To reach higher accuracy, the modeler has to dig a little deeper. A nice view and the age of the structure might each add another 2%, for a total of 89%.

At this point somebody who knows the business better – a realtor in this case – is needed for additional ideas. The realtor might point out that the season of the year and the school district matter to some buyers. Each of these features adds another 1% accuracy, bringing total accuracy to 91%. If the expert runs out of ideas before the required accuracy is reached, peripheral businesses might provide a different perspective. In this case, a mortgage broker or title company officer might be able to suggest features linked to interest rates and city ordinances. After all these ideas have been exhausted, seemingly unrelated data, sometimes referred to as alternative datasets, can get a model over the finish line to the required level of accuracy.
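The arithmetic in the home-price example can be tallied in a few lines of Python. The percentage gains are the article’s illustrative numbers, treated here as simply additive; real feature contributions interact and rarely add up this neatly. The 90% requirement is a hypothetical target added only to show where the iteration would stop.

```python
def accuracy_path(baseline: int, gains: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Cumulative accuracy (in %) after each feature is added, assuming gains simply add up."""
    path, running = [], baseline
    for feature, gain in gains:
        running += gain
        path.append((feature, running))
    return path


# Baseline from square footage, neighborhood, and market inventory.
baseline = 70
gains = [
    ("bedrooms", 5), ("bathrooms", 5), ("garage", 5),   # obvious, tangible features
    ("view", 2), ("age of structure", 2),               # digging a little deeper
    ("season", 1), ("school district", 1),              # the realtor's suggestions
]

path = accuracy_path(baseline, gains)
for feature, total in path:
    print(f"+ {feature:<17} {total}%")

# First feature at which a hypothetical 90% requirement is met.
required = 90
print(next(feature for feature, total in path if total >= required))  # season
```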

The quantity of training data is the third component of model accuracy. The accompanying chart depicts the typical relationship between model accuracy and the amount of training data: accuracy rises as training examples are added, reaches a maximum, then levels off. The trick is to acquire enough training examples to get to or very near that maximum. Training data is commonly scarce, particularly when an algorithm requires labeled data, as the vast majority do. Sufficient training data can be expensive and difficult to obtain, but fortunately many traditional models need only low to moderate amounts to reach the required level of accuracy.
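The “get to or very near maximum accuracy” rule can be applied to measured points on such a curve. This sketch assumes you have retrained the same model on progressively larger subsets of your labeled data and recorded each accuracy; the function name, the 1-point tolerance, and the measurements are hypothetical.

```python
def sufficient_training_size(curve: list[tuple[int, float]], tolerance: float = 0.01) -> int:
    """Smallest training-set size whose accuracy is within `tolerance` of the best observed.

    `curve` holds (number of training examples, accuracy) pairs, e.g. from
    retraining the same model on progressively larger subsets of labeled data.
    """
    best = max(acc for _, acc in curve)
    return min(size for size, acc in curve if acc >= best - tolerance)


# Hypothetical measurements: accuracy rises quickly, then levels off.
curve = [(500, 0.71), (1_000, 0.79), (2_000, 0.84),
         (4_000, 0.865), (8_000, 0.87), (16_000, 0.871)]
print(sufficient_training_size(curve))  # 4000
```

The answer only covers the sizes actually measured: if accuracy is still climbing at the largest size, more data may still help.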


 

4.     Model Evaluation

When all the previously described design and planning stages are done well, the model evaluation step becomes a checkpoint. After weeks or months in the experimentation phases of model development, the team needs to refocus on the business aspects of the plan in order to keep the ultimate business goal in view.

A team that reaches this stage is now tasked with reviewing model outcomes, assessing the impact of any changes, evaluating risks, and making a go/no-go deployment decision. If progress has stalled before reaching the required accuracy, the task is to assess the problem, decide whether to revisit the model design or some aspect of the solution, and possibly even reconsider the original business objective.
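Writing the decision rules down explicitly can help everyone agree on them before the review meeting. The sketch below is a hypothetical Python encoding; the class, its fields, and the exact outcomes are assumptions drawn from the paragraph above, not a formal methodology.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:
    accuracy: float              # measured on held-out data the model never trained on
    required_accuracy: float     # agreed during the business-objective step
    still_improving: bool        # did recent iterations still raise accuracy?
    open_risks: list[str] = field(default_factory=list)  # unresolved risk-review items


def deployment_decision(r: EvaluationResult) -> str:
    """Map an evaluation result to a go/no-go outcome and the next step."""
    if r.accuracy >= r.required_accuracy:
        if r.open_risks:
            return "no-go: resolve open risks first (" + ", ".join(r.open_risks) + ")"
        return "go"
    if r.still_improving:
        return "no-go: keep iterating on algorithm, features, and data"
    return "no-go: progress stalled; revisit the design or the business objective"


print(deployment_decision(EvaluationResult(0.92, 0.90, still_improving=False)))  # go
print(deployment_decision(EvaluationResult(0.86, 0.90, still_improving=False)))
```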
 

5.     Solution Deployment

Deployment will vary greatly by application, but ideally model deployment is a matter of executing the steps outlined in the deployment plan. Remember that even the best machine learning solutions will not have the desired business result without the necessary cultural and behavioral changes. Successful adoption of Enterprise AI is equal parts technology and people.
 

6.     Continuous Improvement

The primary activities in the continuous improvement phase of a machine learning project can be deceptively simple: 

  • Track and report progress against goals
  • Mine insights and learnings
  • Refine and update goals, solution, and/or models

In reality, there is nothing simple about them. Since potential rewards and spinoffs are not fully known up-front, continuous improvement is the exciting and infectious part of the process.
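The first of these activities, tracking and reporting progress against goals, can start as simply as the sketch below. It assumes a hypothetical monitoring feed that scores the deployed model each month on newly labeled production data; the function name, the three-month window, and the numbers are illustrative.

```python
from statistics import mean


def progress_report(monthly_accuracy: list[float], target: float, window: int = 3) -> dict:
    """Summarize recent performance against the goal and against the earlier period.

    `monthly_accuracy` is a chronological list of accuracy scores measured on
    newly labeled production data (a hypothetical monitoring feed).
    """
    recent = mean(monthly_accuracy[-window:])
    earlier = monthly_accuracy[:-window]
    if not earlier:
        trend = "n/a"
    else:
        trend = "improving" if recent > mean(earlier) else "flat or declining"
    return {"recent": round(recent, 3), "meets_goal": recent >= target, "trend": trend}


report = progress_report([0.86, 0.88, 0.87, 0.85, 0.84, 0.83], target=0.85)
print(report)
```

A declining trend like this one is a prompt for the third activity: refining the goals, the solution, or the model.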

ML model development is never really “done.” A model that keeps being retrained on new data keeps “learning” and can become more accurate over time. The best ones even learn well beyond their original scope. These models can be mined for insights that seed ideas for other related business process improvements, or drive improvements to the ML solution itself. A sure sign of a healthy ML project is that it seems to multiply: groups are identified and separated, and each is given a dedicated model of its own, further optimized for that group’s unique characteristics.
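The idea of giving each group its own dedicated model can be illustrated with the simplest possible “model,” a per-group average. The sketch below is hypothetical: the viewer groups, the target (weekly hours watched), and the averaging approach are placeholders chosen only to show why segment-specific models can fit distinct groups more closely, measured on held-out rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (viewer group, weekly hours watched).
train_rows = [("kids", 2.0), ("kids", 3.0), ("drama", 9.0), ("drama", 10.0)]
test_rows = [("kids", 2.4), ("drama", 9.8)]


def fit_global(rows):
    """One model for everyone: predict the overall average."""
    overall = mean(y for _, y in rows)
    return lambda group: overall


def fit_per_group(rows):
    """One dedicated model per group: predict that group's own average."""
    by_group = defaultdict(list)
    for group, y in rows:
        by_group[group].append(y)
    averages = {group: mean(ys) for group, ys in by_group.items()}
    return lambda group: averages[group]


def mean_abs_error(model, rows):
    """Average absolute gap between prediction and actual value."""
    return mean(abs(model(group) - y) for group, y in rows)


global_error = mean_abs_error(fit_global(train_rows), test_rows)
per_group_error = mean_abs_error(fit_per_group(train_rows), test_rows)
print(f"global: {global_error:.2f}, per-group: {per_group_error:.2f}")
```

The trade-off is data: each dedicated model trains only on its own group’s examples, so splitting too finely runs into the training-data limits described under Model Creation.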

The evolution of the recommendation engine developed by Netflix is an excellent illustration of how ML solutions can become increasingly fragmented and specific over time, leading to increasingly accurate and effective results.

The earliest Recommendation Application model at Netflix was based on end-user-reported preferences, such as ratings and the movies users added to their queues. As the business model shifted from DVDs to online streaming, end users were less willing to provide ratings, so Netflix switched to using actual online activity as input to its Recommendation model. Netflix tracked what end users played and searched for, their browsing patterns and behaviors, and the times, dates, and devices used for viewing. Originally the Recommendation model served one account per household, and the algorithms tried to recommend something for everyone. Gradually, Netflix introduced new categories on each user’s home page to separate the recommendations into groups, such as different genres and new releases. Each category had its own dedicated algorithm.

Then the ability to identify individual users within a household led to a whole new level of homepage personalization. As early as 2008, Netflix reported that an impressive 60% of the movies in end-user queues were driven by its own recommendations. That share steadily increased to 75% by 2012, and to an incredible 80% by 2017.

One of the hardest things about the transition to Enterprise AI for many executives is the uncertainty, ambiguity, and unpredictability of early ML model development. It is necessary to hang on through the first few projects, give the unwavering support and patience that’s required to make this transformative leap, and have faith that it’ll be worth it in the end.

Finally, as part of milk+honey's mission to bridge the gap between data science and business, I'm writing weekly about enterprise AI integration. Feel free to send any questions or topics of interest our way and I'll include them in the lineup!

Next Up:  But What Does It Mean!? How Enterprise AI is Going to (Further) Revolutionize Business