# Price_Optimization related

## Data understanding

Since a small number of features is provided, the first task was to review plots of the features provided.

Main observations:

• All but one of the competitors to our product are seasonal competitors, so their prices should have limited impact on our sales volumes.
• It seems from the data that the Competitor 2 and Competitor 7 are Companies that would ensure not only seasonal supply of their product since the information on their prices is provided for the period which follows the seasonal promotion period. But not hard assumption can be made since this behavior is observed at the end of the supplied time series with prices of this competitors.
• Competitors’ prices are very low in the end of their cycle. Most probably the product has short period of expiration, so the competitors are forced to sell the remaining quantities in the end of the season.
• All promotions lead to boost in the sales. For most promotions the price is raising in the end of the promotion. A continuous effect of the promotions can be observed after the promotion has ended.
• The mean price of our product is higher that the mean price of all of the competitors. Most probably this means that our producer is the leader on the market, and the rest of the competitors have follower’s behaviour.
• A couple of cases are observed where the price is obviously discounted, but this week is not market as a promotion in the promotions calendar. The reason for this may be that the retailers are also organizing promotions, but these cases are not included in the promotions calendar.
• The frequency of the promotions is between 1 and 7 weeks. There are some cases where in the last week of promotion the price is higher than the price in the first week of promotion (discounted price). It may be due to the way data is aggregated – on weekly level, so if the promotion has started on a Friday the data for the sales during this promotion period might not be absolutely correct.
• The strongest effect is observed for observations A, B, D
• The effect of the promotions of the competitors on the sales is estimated as proxy of the cannibalization effect. It is measured as the impact of the price discount of the competitor on the sales.
• For the most promotional cases the effect of the promotions has highest impact in the first week of the campaign and has decreasing in the following weeks.

## Baseline Price Derivation

As a most important step, baseline price has been evaluated for our product and the products of the main competitors. The baseline price is defined from the periods with no promotion for every next period with promotion. Two different approaches have been used for defining the baseline. The first one was based on the mean price calculated for the period before the promotion. This value was used for the period in promotion.

The second more sophisticated and finally used approach included the following steps:

• For the period of promotions, the price before the promotion and the price after the promotion were used for linear interpolation of the base price during promotion periods
• For the periods with no promotions registered in the promotions calendar the real price was used
• In order to smooth the final base price Hodrick Prescott filtering was applied with lambda of 10

## Promotion Discount Derivation

The above baseline price is used in combination with the observed weekly Actual Prices. The Discount is estimated as : (Baseline Price – Actual Price) / Baseline Price. The variable is derived for our company price and the competitor prices

As a next step, additional features have been derived that account for price changes:

• price elasticity
• baseline prices
• price discount
• competitors’ base price
• impact from competitors’ price discount
• price log difference between own and competitors’ price
• number of competitors
• dummies for promotion type and week from start of promotion
• time since last promotion started
• competitor is in promotion or not etc.

Variables based on the maximum realized volume of sales in the last periods before the promotion were derived in order to be captured the relative effect of the current promotion.

For the purposes of the investigation of the cannibalization effect on the market a base level of the Volume of Sales characteristic is created. The Sales are preliminary cleaned form the effect of promotions of the main product in order to be isolated the influence of the discount in the competitors price on the Volume of Sales. For this purpose a model with the dummies for the type of promotion and for the week of the promotions is used. The residuals of this models are smoother with HP filtering (Lambda = 10) and are used as a proxy for the volume of the volume of sales if there were no promotion campaigns.

## Time Series Effects

Auto-correlation function have been applied to the price. As a result of the ARIMA model additional features have been derived for the lagged values for the most significant lags. It is well visible from the applied graph that the first and second lag of the equation are more explainable for the volume of sales.

In addition Spectral analysis were applied to the volume of sales, previously cleaned from the effect of the prices in order to be captured pattern in the frequency of the sales which is independent from the price effect. For this purpose the Fourier frequency analysis was used.

Additional variable to represent two spikes within an year (52 weeks period) was derived to represent the short term increase in sales due to national holidays. Since there was non-national holiday calendar available the weeks for each year were determined through analysis of data. The approach was to divide the data into 52 week segments (incomplete for the last year). A sum for each week of the year for the three years was calculated and dived by the number of years for each data was available for the corresponding week (e.g. 2 or 3). Two spikes in volumes are noticeable through visual analysis of the average sales for each week of the year, and were further confirmed by having similar spikes for the corresponding period for the volumes within each separate year.

## Modeling

### OLS Model

A preferred method for statistical analysis in Marketing is OLS. This is the case, because OLS allows a straightforward interpretation of the sales volume uplift factors. All variables described in section “Feature Engineering” are tested in the model. As the above phase results in around 150 possible explanatory variables a model selection process is required.

The explanatory power of the features and their importance are evaluated by applying a Gradient Boosting Tree algorithm – XGBoost and by building regression models through LASSO analysis. Both analysis rely on initial calibration of the input parameters. This was done by an exhaustive iterative approach, which tries all combinations and selects as optimal the one, which results in highest accuracy. To account for over-fitting, when calibrating the parameters a KFold validation approach was applied, based on 5 sub-samples. The 2 Model Selection approaches described above resulted in a short list of variables with highest importance, which were used as starting point for the OLS model building phase. In the case of LASSO, all variables with non-zero coefficient were added to the short-list, while in the case of XGBoost, the first 20 variables with highest weight in the final prediction were included.

Both forward and backward stepwise selection processes were used when building the final model. The forward selection process was guided by the short-list obtained from the initial phase and relied on expert domain knowledge, statistical evaluation of the significance of the parameters, marginal improvement of the model and analysis on the correlation matrix. The Bakcward selection process started by forcing all variables in the model and evaluating their significance. The second step in the backward-selection process was to remove all insignificant variables one by one, while applying higher thresholds for the variables in the short-list and considering the marginal effect on the accuarcy.

A 5% significance was generally used for evaluating the statistical significance of the coefficients. Only one parameter estimates enters the model with p-value higher then 5% (equal to 7.8%), but as the p-value is still below 10% and the parameter has entered the model with logical sign and level value it was decided to keep the parameter in the model.

Residuals have been tested using Shapiro-Wilk test and they are assessed as normally distributed.

From business point of view it makes sense to have non-static best and worst case scenario intervals. So, for example when the volumes are on a peak then higher variation can be expected. On the other hand, when the sales volumes are low, the intervals between the best and worst case are shrinked, because the product is a market leader and has a stable customer base. In order to incorporate these to cases into the prediction we multiplied the intervals by a penalizing parameter having values between 0 and 1.

The function has been derived in the following steps:

### Best case scenario

• to our predicted volume of sales a correction term is added
• the correction term is a sigmoid function applied to the percentage change in the sales volume since the previous week multiplied by 3 times the standard deviation of the residuals

### Worst case scenario

• from our predicted volume of sales a correction term is substracted
• the correction term is a sigmoid function applied to the percentage change in the sales volume since the previous week multiplied by 3 times the standard deviation of the residuals

From the graph below it can be seen that the best and worst case scenario are affected by sales volumes being in the higher or lower bands.

## Evaluation

The following variables entered the model:

• Baseline Price of Actual Price – This is an estimate of the company product price in situation of no promotion in the long-term. As seen from the model it is expected that and increase in the long-term value of the price results in decrease of volume of sales.
• Discount of company promotions – This estimate shows the percentage discount of the baseline price when in promotion in the current week. It is expected that higher discount value is related to higher sales in the current week.
• Discount from last promotion – This estimate shows the highest discount in the previous promotion. This value is populated with the mean discount value through the development sample for the first few week of the data set. It is expected that a high discount value during the last promotion will result in lower sales in the current as customers often overbuy the product in case of high discounts. As seen in the final model, the coefficient in-front of the current month discount is higher than the one in-front of the last promotion discount. This is indicative that we can control for the previous promotion discount in case of high expectation from the current promotion session.
• Volume of sales from previous week – This variable enters with negative sign in front of its parameter, which shows us that high sales in the previous month are followed by lower sales in the current
• Week of promotion – It is expected that usually highest sales are realised in the first week of the promotion.
• Type A promotion – The model shows that the non-pricing factors of the 5 types of promotion are significant only for promotion A. The current expection is that they result in higher sales.
• Length of last Promotion – This parameter also enters the model with logical coefficient. It is expected that longer previous promotions will result in lower sales in the current period.
• The only competitor, which significantly influence the company sales volumes with their pricing policy is Competitor 3. It is expected that higher percentage difference between the company price and the competitor price results in lower sales for the company.
• Steven added, while there is so many other competitors, we only measure those big competitors and the little competitors its market share is too small to yield an fundmental impact.
• Only 1 competitor in the market – The analysis shows that having only 1 competitor in the market influence positively our company, while the situation of having no competitors is not investigated as there is no such data in the sample
• Holiday/Systematic High Sales – There are 2 periods within the year, which are related with significantly higher volumes of sales. The Statistical analysis shows that this is behavior, which is not explained by any other variables in the model.

The model predictions look quite good. In the graph below you can find the observed vs predicted values:

## Conclusion

Generally elasticity of demand is defined as the percentage change in quantity demanded divided by the percentage change in price. Elasticity for the continuous variables is calculated based on the regression coefficients. It can be used to calculate for each 1% change in the predictors what would be the corresponding change in the volume of sales. The Elasticity is calculated by dividing the mean value for each relevant predictor, multiplied by the regression estimated coefficient and dividing the product the mean value of the dependent variable (Volume of Sales). Or

$\frac{avg(Predictor)*Coefficient}{avg(Volume of Sales)}$ Additionally the regression formula can be used to simulate different results by inputting different values for the dummy variables and analyzing the changes.

The elasticity for the applicable factors contributing to the volume of sales can be seen in the table below:

Variable Elasticity
datafile[‘BASE2_ACTUAL_PRICE_y’] -1.6640498505
datafile[‘DISCOUNT2_ACTUAL_PRICE_y’] 0.2273465563
datafile[‘LAST_DISCOUNT’] -0.3009602194
datafile[‘VOLUME_LAG1’] -0.2775736955
datafile[‘Price_Relative_Diff__3’] 0.1079935245

In other words for each 1% change in the base price we expect the volume of sales to decrease with 1.7%. If there is 1% increase in the promotional discount we expect the volumes of sales to increase with 0.22%. Similarly the effect can be analysed for the other variables.

The analysis showed that the promotional effectiveness is highly price dependent, while the non-pricing factors are significant only in the case of promotion A. The model also shows that the promotion calendar and planning significantly influence the sale volumes as seen by the inclusion of variables like length of last promotion and discount during last promotion.

## Steven Consideration

• for the impact of the same brand price change on the SKUs, a volume price may used to conduct such calculation.
• add the previous weeks price as a new attribute
• add the previous weeks sales as a new attribute
• do the percentage change for the above 2

What is Hodrick-Prescott filtering

why you should never use the Hodrick-Prescott filtering

The Hodrick-Prescott (HP) filter refers to a data-smoothing technique. The HP filter is commonly applied during analysis to remove short-term fluctuations associated with the business cycle. Removal of these short-term fluctuations reveals long-term trends. This can help with economic or other forecasting associated with the business cycle.

KEY TAKEAWAYS

The Hodrick-Prescott filter refers to a data-smoothing technique used primarily in macroeconomics. It is commonly applied during analysis to remove short-term fluctuations associated with the business cycle. In practice, it is used to smooth and detrend the Conference Board’s Help Wanted Index so it can be benchmarked against the Bureau of Labor Statistic’s JOLTS, which measures job vacancies in the U.S.

Understanding the Hodrick-Prescott (HP) Filter

The Hodrick-Prescott (HP) filter is a tool commonly used in macroeconomics. It is named after economists Robert Hodrick and Edward Prescott who first popularized this filter in economics in the 1990s. Hodrick was an economist who specialized in international finance. Prescott won the Nobel Memorial Prize, sharing it with another economist for their research in macroeconomics.

This filter determines the long-term trend of a time series by discounting the importance of short-term price fluctuations. In practice, the filter is used to smooth and detrend the Conference Board’s Help Wanted Index (HWI) so it can be benchmarked against the Bureau of Labor Statistic’s (BLS) JOLTS, an economic data series that may more accurately measure job vacancies in the U.S.

The HP filter is a tool commonly used in macroeconomics.

Special Considerations

The HP filter is one of the most widely used tools in macroeconomic analysis. It tends to have favorable results if the noise is distributed normally, and when the analysis being conducted is historical.

According to a paper published by economist and professor James Hamilton—which appears on the National Bureau of Economic Research website—there are several reasons why the HP filter should not be used. Hamilton first proposes that the filer produces outcomes that have no basis in the process of generating data. He also states that the values that are filtered at the sample’s end are totally different from those in the middle.