This article was written by Data Scientist Pekka Tiusanen of Bilot, with which Louhia merged in June 2018. A follow-up will be published next week.
ABOUT HIT RATE FORECASTING
Challenges in the sales process and sales management are common to many organizations. Improved data availability and machine learning algorithms provide the means to forecast sales opportunity hit rates, which opens up two interesting ML use cases. At the sales management level, keeping track of the sales funnel is essential, and the expected profit from the current sales pipeline is definitely of interest. Individual salespersons, in turn, have to be aware of critical price points in order to optimize profitability. Machine learning can answer such questions as “How much money do I currently have in the sales pipeline?” and “What price should I offer in this situation?”.
DATA ACQUISITION & MANIPULATION
The forecasting results will be only as good as your model and data. Opportunity hit rate depends, at least to some extent, on price, lead type and margin. Hence, CRM attributes are worth considering in any hit rate engine. However, relevant information may reside in various data sources, such as ERP, IoT and social media. In large organizations, it is often most convenient to import all the forecasting attributes from an existing data warehousing architecture. Once the data has been compiled, one may have to spend some time joining tables and transforming data to form a comprehensive set of relevant features with good data quality.
Figure 1. The Process of Adopting a Hit Rate Engine
After the data has been cleaned, some data manipulation might still be required. For example, depending on the method and analysis platform, categorical features may have to be separately converted to dummy variables. In the modelling stage, each categorical feature forms as many binary variables as there are factor levels.
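As a minimal sketch of the dummy variable conversion described above, the following plain-Python snippet turns one categorical feature into one binary column per factor level. The feature name `lead_type`, its levels and the prices are hypothetical and only serve as an illustration; most analysis platforms offer a built-in equivalent.

```python
def one_hot_encode(rows, feature):
    """Replace a categorical feature with one 0/1 column per factor level."""
    levels = sorted({row[feature] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being converted.
        new_row = {key: value for key, value in row.items() if key != feature}
        for level in levels:
            new_row[f"{feature}_{level}"] = 1 if row[feature] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical opportunities with a categorical "lead_type" feature.
opportunities = [
    {"price": 1200.0, "lead_type": "marketing"},
    {"price": 800.0, "lead_type": "referral"},
    {"price": 1500.0, "lead_type": "cold_call"},
]
encoded = one_hot_encode(opportunities, "lead_type")
```

Each row now carries three binary columns, exactly one of which is set to 1.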
One may also want to combine classes within some of the features to achieve less specific categorizations. For instance, all the lead types that relate to marketing could be combined, leaving the rest of the lead types untouched. Such considerations are ad hoc, and they may improve or deteriorate forecasting performance depending on the chosen methodology and data.
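Combining classes can be as simple as a lookup. The sketch below merges marketing-related lead types into a single class and leaves the others untouched; the lead type names are invented for the example.

```python
# Hypothetical lead types that all relate to marketing.
MARKETING_LEAD_TYPES = {"email_campaign", "webinar", "trade_fair"}


def combine_lead_type(lead_type):
    """Map every marketing-related lead type to one class; keep the rest as is."""
    return "marketing" if lead_type in MARKETING_LEAD_TYPES else lead_type


lead_types = ["webinar", "referral", "email_campaign", "cold_call"]
combined = [combine_lead_type(lead_type) for lead_type in lead_types]
```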
Moreover, there is often a logical order to some of the categorical model features. Such ordinal features are converted into numeric form by applying the associated logical order. For example, a variable with values “satisfactory”, “better than satisfactory”, “good”, “very good” and “excellent” would naturally convert to a scale from 1 to 5.
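The ordinal conversion from the example above could be sketched as follows. Mapping unknown values to `None` is an assumption made for this illustration, not something the method prescribes.

```python
# Logical order of the ordinal feature, mapped to a scale from 1 to 5.
RATING_SCALE = {
    "satisfactory": 1,
    "better than satisfactory": 2,
    "good": 3,
    "very good": 4,
    "excellent": 5,
}


def encode_rating(value):
    """Return the numeric rank of a rating, or None for an unknown value."""
    return RATING_SCALE.get(value)


ratings = ["good", "excellent", "satisfactory", "unknown"]
encoded_ratings = [encode_rating(rating) for rating in ratings]
```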
For some features, it is important to find the optimal level of detail, often referred to as granularity in data warehousing jargon. This applies, for instance, to geographical location and organizational data. A good rule of thumb is that every class of each feature should have enough examples to justify inclusion. If only a few sales were recorded at a freshly opened sales office, including this sales office as a class in the training data probably represents an excessive level of detail. Such a situation leaves two options: (1) substitute “sales office” with a less specific feature, such as “sales region”, or (2) replace the values of offices having too few recorded sales with NAs.
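Option (2) can be sketched in a few lines. The office names and the minimum count of three are hypothetical; a sensible threshold depends on the data, and `None` stands in for NA here.

```python
from collections import Counter


def mask_rare_classes(values, min_count):
    """Replace classes with fewer than min_count examples with None (NA)."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else None for value in values]


# Hypothetical sales offices; "Oulu" has only one recorded sale.
offices = ["Helsinki"] * 5 + ["Tampere"] * 4 + ["Oulu"]
masked = mask_rare_classes(offices, min_count=3)
```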
Finally, it is often sensible or even necessary to derive new variables from the set of ML features that were extracted, transformed and loaded (ETL). For example, if margin is not provided, it can be calculated from cost and price. It is also common to use ratios as variables. One should also check whether the chosen statistical model is sensitive to the number of variables or to correlations between them. Depending on the purpose of your analysis and the chosen machine learning algorithm, truncating the number of features at some point should be considered. It is sometimes useful to reduce the dimensionality of the data, for example by applying principal component analysis.
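A derived margin variable might look like the sketch below. It assumes that margin is expressed as a share of price, which is only one possible definition; adjust it to whatever convention your organization uses.

```python
def margin(price, cost):
    """Margin as a share of price (assumed definition); None if price is not positive."""
    if price <= 0:
        return None
    return (price - cost) / price


# An opportunity priced at 100 with a cost of 75 has a margin of 0.25.
example_margin = margin(100.0, 75.0)
```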
The gradient boosting algorithm is based on steepest descent and loss function minimization. The method first fits the training set mean or some other rudimentary model to the target series which, in this case, consists of “won” or “lost”. A decision tree is then fitted to the resulting error terms of this crude approximation. Another tree is fitted to the residuals of the first tree, and iterations continue until the chosen number of trees has been reached. Gradient boosting thus produces a forest of individual trees. If your dependent variable is continuous, it makes sense to use regression-tree-based gradient boosting instead. In mathematical terms, fitting a gradient boosting model minimizes a loss function, for example mean squared error in the regression case or log loss for a binary target.
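The iteration described above can be illustrated with a deliberately simplified sketch. It assumes log loss, one-split trees (stumps) and leaf values equal to the mean residual; production libraries use deeper trees and more refined leaf updates. The data, the number of trees and the learning rate are made up for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    # Fallback: no split at all (every row goes left and receives the mean).
    best = (sum((r - mean) ** 2 for r in residuals), 0, math.inf, mean, mean)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_trees=20, learning_rate=0.3):
    """y must contain both won (1) and lost (0) examples."""
    # Rudimentary starting model: the log-odds of the training set mean.
    mean = sum(y) / len(y)
    base = math.log(mean / (1.0 - mean))
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to the score: y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_hit_rate(model, row):
    base, learning_rate, stumps = model
    score = base + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)


# Hypothetical training data: [price, margin] and outcome (1 = won, 0 = lost).
X = [[100.0, 0.30], [120.0, 0.28], [150.0, 0.25],
     [200.0, 0.20], [220.0, 0.18], [260.0, 0.15]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gradient_boosting(X, y)
low_price_hit_rate = predict_hit_rate(model, [110.0, 0.29])
high_price_hit_rate = predict_hit_rate(model, [240.0, 0.16])
```

Each stump nudges the score of an opportunity up or down, and the sigmoid turns the final score into a hit rate between 0 and 1.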
The resulting statistical model is just an ensemble of trees generated from your training data. The training set can be formed by taking a 70-80% random sample of the total observations. Passing an individual opportunity to the forest-like model yields a hit rate forecast. One opportunity is just a row with the corresponding model feature values. The feature values of this particular opportunity determine the outcome of each tree, and every tree shifts the resulting forecast up or down.
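Forming the training set by random sampling could be sketched like this. The 75% share and the fixed seed are assumptions chosen for reproducibility of the example.

```python
import random


def train_validation_split(rows, train_share=0.75, seed=42):
    """Shuffle the rows and split them into a training and a validation set."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so that the caller's list stays untouched
    rng.shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of 100 opportunities, represented by their ids.
rows = list(range(100))
train, validation = train_validation_split(rows)
```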
The forecasting model is tested against the validation set, which typically comprises 20-30% of total observations. Prediction accuracy can be determined by benchmarking the model with a confusion matrix. The matrix indicates how many lost (0) opportunities were correctly identified as lost and, conversely, how many won (1) opportunities were correctly forecast as won.
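A confusion matrix for a binary won/lost forecast can be counted by hand, as in the sketch below. The 0.5 classification threshold and the example values are assumptions.

```python
def confusion_matrix(actual, predicted_probability, threshold=0.5):
    """Count true/false positives and negatives for a won (1) / lost (0) target."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(actual, predicted_probability):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            counts["tp"] += 1  # won, forecast as won
        elif y == 0 and predicted == 1:
            counts["fp"] += 1  # lost, forecast as won
        elif y == 0 and predicted == 0:
            counts["tn"] += 1  # lost, forecast as lost
        else:
            counts["fn"] += 1  # won, forecast as lost
    return counts


# Hypothetical validation outcomes and forecast hit rates.
actual = [1, 0, 1, 0, 0, 1]
hit_rates = [0.8, 0.3, 0.4, 0.6, 0.1, 0.9]
matrix = confusion_matrix(actual, hit_rates)
accuracy = (matrix["tp"] + matrix["tn"]) / len(actual)
```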
Receiver operating characteristic (ROC) analysis is another way to benchmark classification models. The ROC curve displays the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. The true positive rate is the share of won opportunities correctly identified as such in the validation set. The false positive rate is the proportion of lost opportunities incorrectly predicted as won.
Estimated raw probabilities are used for classification scoring. The ROC curve illustrates how the TPR and FPR values behave as the classification threshold is changed. Lowering the threshold lets the model detect more won opportunities, but at the cost of misclassifying more lost opportunities as won; raising the threshold has the opposite effect.
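The effect of the threshold on TPR and FPR can be traced with a small sketch. The thresholds and example values are hypothetical, and the function assumes that the data contains both won and lost opportunities.

```python
def roc_points(actual, scores, thresholds):
    """Return (threshold, FPR, TPR) for each classification threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for y, s in zip(actual, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(actual, scores) if y == 0 and s >= threshold)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical validation outcomes (1 = won) and raw forecast probabilities.
actual = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.4, 0.6, 0.1, 0.9]
points = roc_points(actual, scores, thresholds=[0.2, 0.5, 0.7])
```

Plotting FPR against TPR over a fine grid of thresholds gives the ROC curve.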
Possible overfitting can be assessed by comparing the ROC curves of the training and validation sets. A model is likely overfitting if the area under the curve is considerably larger for the training set than for the validation set. Overfitting means that the model is fitted too closely to the training data, so its performance on that data overstates its predictive power on new observations. Complex statistical models are prone to overfitting. Hence, one should avoid an excessive number of trees and excessive tree depth when gradient boosting is employed.
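The comparison of areas under the curve can be sketched as follows. The snippet computes AUC as the share of won/lost pairs in which the won opportunity receives the higher score, which is equivalent to the area under the ROC curve. The example scores and the tolerance of 0.1 are assumptions, not recommended values.

```python
def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of won and lost scores."""
    won = [s for y, s in zip(actual, scores) if y == 1]
    lost = [s for y, s in zip(actual, scores) if y == 0]
    wins = 0.0
    for w in won:
        for l in lost:
            if w > l:
                wins += 1.0
            elif w == l:
                wins += 0.5  # ties count as half
    return wins / (len(won) * len(lost))


# Hypothetical forecasts on the training and validation sets.
train_auc = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
validation_auc = auc([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.3])

# Assumed tolerance: flag the model if training AUC exceeds validation AUC by more than 0.1.
possibly_overfitting = (train_auc - validation_auc) > 0.1
```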
Figure 2. Forecast Benchmarking
FROM FLASHCARDS TO TRULY DATA-DRIVEN SALES FUNNEL
Now that the hit rate engine is up and running, it is time to feed it some opportunity data! The model predicts the probability of acceptance for each individual opportunity fed to the algorithm, which allows a better approximation of the sales funnel. The expected aggregate revenue can be estimated by multiplying predicted hit rates by the corresponding opportunity prices and summing up the results. Assuming that the margin for each opportunity is known or derivable, it is possible to produce profitability estimates as well. The forecasting results can be integrated into the current BI reporting solution, which enables enhanced sales management and coordination.
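The aggregation described above amounts to a weighted sum. In the sketch below the pipeline, prices, margins and hit rates are invented, and margin is assumed to be a share of price.

```python
# Hypothetical open opportunities with forecast hit rates.
pipeline = [
    {"price": 10000.0, "margin": 0.20, "hit_rate": 0.50},
    {"price": 4000.0, "margin": 0.25, "hit_rate": 0.25},
    {"price": 20000.0, "margin": 0.10, "hit_rate": 0.75},
]

# Expected revenue: sum of hit rate times price over all opportunities.
expected_revenue = sum(o["hit_rate"] * o["price"] for o in pipeline)

# Expected profit: the same sum, weighted additionally by margin.
expected_profit = sum(o["hit_rate"] * o["price"] * o["margin"] for o in pipeline)
```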
Figure 3. Intelligent Sales Funnel
The gradient boosting method also allows extraction of the most relevant features predicting hit rate, which provides sales management with interesting insights. Hit rate drivers can be considered valuable from the product portfolio viewpoint too. However, some of these attributes are bound to be rather obvious: if price and margin are not listed among the most important hit rate features, your model may be misspecified.
Gradient boosting is a non-linear modeling procedure, so there is no direct, coefficient-like interpretation of each factor's contribution. Usually a combined feature importance metric is used, because model attributes have three importance dimensions. Firstly, the “gain” metric measures how much the splits on a feature contribute to improving the forecast. In contrast, “frequency” conveys how often a feature occurs in tree splits within the model. Finally, the “cover” metric measures the relative number of opportunities associated with each particular feature. If you are not satisfied with the combined metric or its individual components, it is also possible to visualize individual trees to truly understand the model dynamics.
SCENARIO-BASED PROFITABILITY MODELLING
The hit rate engine enables “what-if” analysis for individual opportunities, which allows the model to be deployed as a sales assistant. This ML implementation provides salespeople with a statistically justifiable estimate of how hit rate and profitability evolve as the opportunity price is increased or decreased. More attributes for a salesperson to tinker with can be included as well.
Figure 4. Hit Rate Engine as a Sales Assistant
To fulfil the requirements set by the sales assistant use case, a range of prices has to be fed to the algorithm. The resulting response is a corresponding range of hit rates. If margin is included in the model, it has to be adjusted dynamically as price changes by using a suitable dependency function.
Hit Rate = Model(Price, Margin, Feature 1, ..., Feature N) + ε
Now that we have obtained the curve of hit rates by dynamically changing price and margin, it is time to add profitability to the equation. This is fairly straightforward, as we already have hit rate and margin calculated for each price level. The formula for expected profit includes the probability of winning, the corresponding price and the margin at this price. This is obviously something that corporations want to maximize.
Expected Profit = Hit Rate * Opportunity Price * Margin
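A price sweep for a single opportunity could be sketched as below. The trained model is replaced by a hypothetical logistic stand-in (`hit_rate_model`), the cost of 80 is invented, and the dependency function assumes that margin is a share of price. In practice the hit rate would come from the fitted gradient boosting model.

```python
import math

COST = 80.0  # hypothetical cost of delivering the opportunity


def margin_at(price):
    """Assumed dependency function: margin as a share of price."""
    return (price - COST) / price


def hit_rate_model(price, margin):
    """Hypothetical stand-in for the trained model: hit rate falls as price rises.

    A real model would also use margin and the other features of the opportunity.
    """
    return 1.0 / (1.0 + math.exp(0.05 * (price - 120.0)))


def expected_profit(price):
    """Expected Profit = Hit Rate * Opportunity Price * Margin."""
    margin = margin_at(price)
    return hit_rate_model(price, margin) * price * margin


# Feed a range of prices to the model and pick the most profitable one.
prices = [float(p) for p in range(90, 161, 10)]
curve = [(price, expected_profit(price)) for price in prices]
best_price, best_profit = max(curve, key=lambda pair: pair[1])
```

The list `curve` holds the expected profit at each price level and is what a sales assistant view would plot.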
Forecasting accuracy tends to improve as more of the model input features are entered, assuming that the underlying statistical model has been produced according to best practices. The behaviour of hit rate and profitability with respect to price can be visualized and embedded as a sales assistant feature in a CRM or CPQ system. Of course, this requires proper usability design to ensure that the complexity of the underlying model is hidden from the salesperson. The model does not have to be retrained for each request, so response time is not an issue.