Introduction

Time Series Decomposition is a method for separating the overall time series data into multiple components that, when recombined, constitue the original time series. The value of this is that identifying each component often enables us to think about the factors that influence that sub-component directly and, thus, better understand the data and what causes the overall pattern we see. Also, decomposing time series data permits us to forecast how the overall time series will behave into the future based on our understanding of the separate components and what is driving them.

The image below is from the William S. Cleveland book The Elements of Graphing Data, and it is an excellent example of time series decompostion. It identifies and overall trend, a seasonal pattern, and some oscillation that is somewhat regular. There is always some part of the pattern that doesn't follow a regular pattern and we call that the Remainder or, sometimes, Noise.

Cleveland Time Series Decomposition

Example 1: Demand Pattern Decomposition

Experience would suggest that this first example is obviously fictious data because its patterns are significantly more consistent across time than real-world data. Nonetheless, it is a good first exercise, particularly, because the obvious patterns make this first exercise easier than is normally the case.

This data was adapted from the textbook Supply Chain Management: 3rd Edition, by Sunil Chopra and Peter Meindl, Prentice Hall, 2006.

We will be using matplotlib, numpy, and pandas, so let's import those packages now, as well as execute the matplotlib magic command.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Read the data.

In [2]:
df = pd.read_csv('TSDataP1.csv')
df.head()
Out[2]:
Quarter Product1Demand
0 1 152400
1 2 185000
2 3 182880
3 4 161544
4 5 166116

Let's look at the data, first, to begin to understand it.

In [3]:
plt.plot(df.Quarter,df.Product1Demand)
Out[3]:
[<matplotlib.lines.Line2D at 0xadeb748>]

Describe the pattern in this graph? What "component patterns" do you see within the overall graph?

At this point you should observe an overall (positive) trend and seasonality. To perceive the trend, ask what results a linear regression analysis would yield for intercept and slope. Perhaps, you will intuit that the trend in this graph is a linear one rather than a nonlinear one.

You could, and should ask yourself what is causing those patterns. You may know the causes from experience if you are familiar with the context or, alternately, you may need to do research to determine the causes. If you were a consultant, you might well be in the latter position.

Our goal now is to decompose these data. Doing so will enable us to forecast demand into the future because the components are easier to detect and express individually, so that we may extrapolate them into the future. We need to assume a functional form for how the components contribute to the overall pattern in order to guide the math that we do. For our purposes, we will assume this model, which is called an additive model:

  • $D$ = Product Demand
  • $q$ = the index of the quarter
  • $L$ = The 'level' component of the demand pattern which is a constant value
  • $T$ = The (linear) trend of the data. This is the amount that demand increases, on average, from one quarter to the next. T is also a constant.
  • $S \left(q \right)$ = Seasonality component. We need to figure out how many quarters there are before the seaonal pattern repeats so that we can determine the seasonality component for each quarter, $q$.
  • $\epsilon \left( q \right)$ is the portion of demand that we will not be able to fit with $L$, $T \left(q \right)$ and $S \left(q \right)$.

$D\left(q \right) = L + T \left(q \right) + S \left(q \right) + \epsilon \left( q \right)$

We can easily get $L$ and $T$ from a linear regression analysis. We will use a new Python package for this analysis. Many Python packages can do this analysis well.

In [4]:
from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(df.index,df['Product1Demand'])
print('intercept =', intercept, '    slope =', slope, '     p_value = ',p_value)
intercept = 165969.76470588235     slope = 3567.764705882353      p_value =  0.0004266677252921289

Now we need to 'remove' the trend and the level from the original demand pattern to see what pattern remains for use to describe seasonality. We use the DataFrame.apply() method from pandas to compute the value for each quarter from the regression model, which we will put in a column called regress. then we will subtract that part of the "signal" from the original sales trajectory and put the remainder in a column called R1 for 1st remainder.

Notice the last block of code, which formats the columns of the DataFrame if you haven't seen this technique previously.

In [5]:
def create_regress_col(row, intercept, slope):
    return float(intercept) + float(row['Quarter']) * slope
    
df['regress'] = df.apply(create_regress_col,args = (intercept,slope),axis = "columns")
df['R1'] = df['Product1Demand'] - df['regress']
df.style.format({
    'Product1Demand': '{:,.0f}'.format,
    'regress': '{:,.0f}'.format,
    'R1': '{:,.0f}'.format
})
Out[5]:
Quarter Product1Demand regress R1
0 1 152,400 169,538 -17,138
1 2 185,000 173,105 11,895
2 3 182,880 176,673 6,207
3 4 161,544 180,241 -18,697
4 5 166,116 183,809 -17,693
5 6 202,692 187,376 15,316
6 7 198,120 190,944 7,176
7 8 176,784 194,512 -17,728
8 9 182,880 198,080 -15,200
9 10 216,408 201,647 14,761
10 11 210,312 205,215 5,097
11 12 198,120 208,783 -10,663
12 13 190,500 212,351 -21,851
13 14 228,600 215,918 12,682
14 15 219,456 219,486 -30
15 16 211,836 223,054 -11,218

Here is a plot of the R1 data column.

In [6]:
plt.plot(df.index,df.R1)
Out[6]:
[<matplotlib.lines.Line2D at 0xd0dae10>]

This looks like a repeated pattern.

How many quarters pass before the pattern repeats?

When the autocorrelation value for a particular lag is large and positive (close to 1), it indicates a cyclic pattern with the periodicity of that lag.

In [7]:
# Create column with lag of 4
lag = 4
df['lag4'] = np.NaN
for i in range(len(df['lag4']))[lag:]:
    df['lag4'].iloc[i] = df['Product1Demand'].iloc[i-4]
print(df.head(n=10))

# Compute autocorrelations
for i in range(int(len(df.index)/2)):
    print('autocorrelation, lag =',i,':',df.R1.autocorr(lag = i))
    
fig,ax = plt.subplots()
ax.plot(df.Quarter,df.Product1Demand,c='k')
ax.plot(df.Quarter,df.lag4,c='b')
ax.set_xlim([1,19])
ax.text(16.3,210000,'Demand',color='k')
ax.text(16.3,195000,'Lagged\nDemand',color='b')
ax.set_xlabel('Quarter')
   Quarter  Product1Demand        regress            R1      lag4
0        1          152400  169537.529412 -17137.529412       NaN
1        2          185000  173105.294118  11894.705882       NaN
2        3          182880  176673.058824   6206.941176       NaN
3        4          161544  180240.823529 -18696.823529       NaN
4        5          166116  183808.588235 -17692.588235  152400.0
5        6          202692  187376.352941  15315.647059  185000.0
6        7          198120  190944.117647   7175.882353  182880.0
7        8          176784  194511.882353 -17727.882353  161544.0
8        9          182880  198079.647059 -15199.647059  166116.0
9       10          216408  201647.411765  14760.588235  202692.0
autocorrelation, lag = 0 : 1.0
autocorrelation, lag = 1 : -0.11027853023041476
autocorrelation, lag = 2 : -0.8689094084728757
autocorrelation, lag = 3 : 0.022736719932132606
autocorrelation, lag = 4 : 0.9658506974344423
autocorrelation, lag = 5 : -0.08723019865257427
autocorrelation, lag = 6 : -0.8908995827540015
autocorrelation, lag = 7 : 0.0065624272494473725
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:179: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
Out[7]:
Text(0.5, 0, 'Quarter')

This code plots each sequential series of 4 points, where 4 corresponds with the periodicty of the data. Note how the patterns have similar shapes, which is why the autocorrelation with this lag was nearly 1. Let's create a graph that demonstrates this by plotting each successive group of four points.

In [8]:
dfQtr = pd.DataFrame()
cycleLen = 4
for i in range(int(len(df.index)/cycleLen)):
    newData = pd.DataFrame({i:df['R1'].iloc[i*cycleLen:(i+1)*cycleLen]})
    newData.index = range(0,len(newData))
    dfQtr = pd.concat([dfQtr,newData],axis=1)

fig,ax = plt.subplots()
ax.plot(dfQtr)
Out[8]:
[<matplotlib.lines.Line2D at 0xd0629b0>,
 <matplotlib.lines.Line2D at 0xd0f48d0>,
 <matplotlib.lines.Line2D at 0xd0f4a20>,
 <matplotlib.lines.Line2D at 0xd0f4b70>]

If we average the demand for each of the seasonal quarters, so those averages represent all the curves well?

In [9]:
avg = []
for i in range(len(dfQtr.index)):
    avg.append(dfQtr.iloc[i].mean())

dfQtr = pd.concat([dfQtr,pd.DataFrame({'avg':avg})], axis=1)
print(dfQtr)

fig,ax = plt.subplots()
c = 180
for col in dfQtr.columns.values:
    if col == 'avg':
        ax.plot(dfQtr[col], c = 'r')
    else:
        ax.plot(dfQtr[col], c = 'k')
              0             1             2             3           avg
0 -17137.529412 -17692.588235 -15199.647059 -21850.705882 -17970.117647
1  11894.705882  15315.647059  14760.588235  12681.529412  13663.117647
2   6206.941176   7175.882353   5096.823529    -30.235294   4612.352941
3 -18696.823529 -17727.882353 -10662.941176 -11218.000000 -14576.411765

What does the remainder of the demand pattern look like if we use the average seasonality?

The code below assigns a seasonal component of sales to each quarter based on the average that was computed above and puts it in a column named S.

Next, The 2nd remainder is computed, R2, which is the difference between the original sales trajectory and the sum of the regression model and the seasonal component, $L + T \left(q \right) + S \left(q \right)$. That is, if you look at the oriignal mathematical model, the values in R2 are equal to the term $ \epsilon \left( q \right)$ in the model.

A column named Composite is computed, whcih is the sum of the regression model and seasonal effects, and the percentage error is computed in the columne errorPerc.

In [10]:
df['S'] = np.NaN
df['R2'] = np.NaN
df['Composite'] = np.NaN
df['errorPerc'] = np.NaN
S = dfQtr['avg'].tolist()
for i in df.index:
    df.loc[i,'S'] = S[i%cycleLen]
    df.loc[i,'R2'] = df.loc[i,'R1'] - df.loc[i,'S']
    df.loc[i,'Composite'] = df.loc[i,'regress'] + df.loc[i,'S']
    df.loc[i,'errorPerc'] = 100*df.loc[i,'R2'] / df.loc[i,'Product1Demand']
df.style.format({
    'Product1Demand': '{:,.0f}'.format,
    'regress': '{:,.0f}'.format,
    'R1': '{:,.0f}'.format,
    'S': '{:,.0f}'.format,
    'R2': '{:,.0f}'.format,
    'Composite':'{:,.0f}'.format,
    'errorPerc': '{:.2f}%'.format
})
Out[10]:
Quarter Product1Demand regress R1 lag4 S R2 Composite errorPerc
0 1 152,400 169,538 -17,138 nan -17,970 833 151,567 0.55%
1 2 185,000 173,105 11,895 nan 13,663 -1,768 186,768 -0.96%
2 3 182,880 176,673 6,207 nan 4,612 1,595 181,285 0.87%
3 4 161,544 180,241 -18,697 nan -14,576 -4,120 165,664 -2.55%
4 5 166,116 183,809 -17,693 152400 -17,970 278 165,838 0.17%
5 6 202,692 187,376 15,316 185000 13,663 1,653 201,039 0.82%
6 7 198,120 190,944 7,176 182880 4,612 2,564 195,556 1.29%
7 8 176,784 194,512 -17,728 161544 -14,576 -3,151 179,935 -1.78%
8 9 182,880 198,080 -15,200 166116 -17,970 2,770 180,110 1.51%
9 10 216,408 201,647 14,761 202692 13,663 1,097 215,311 0.51%
10 11 210,312 205,215 5,097 198120 4,612 484 209,828 0.23%
11 12 198,120 208,783 -10,663 176784 -14,576 3,913 194,207 1.98%
12 13 190,500 212,351 -21,851 182880 -17,970 -3,881 194,381 -2.04%
13 14 228,600 215,918 12,682 216408 13,663 -982 229,582 -0.43%
14 15 219,456 219,486 -30 210312 4,612 -4,643 224,099 -2.12%
15 16 211,836 223,054 -11,218 198120 -14,576 3,358 208,478 1.59%

Let's visualize how our model of demand fits actual product demand

In [11]:
fig, ax = plt.subplots()
ax.plot(df['Product1Demand'],c='k')
ax.plot(df['Composite'],c='b')
ax.set_xlim([0,18])
ax.text(15.3,212000,'Demand', color='k')
ax.text(15.3,207000,'Model', color='b')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('Quarter')
ax.set_ylabel('Demand/Sales')
Out[11]:
Text(0, 0.5, 'Demand/Sales')

Here is a plot of the remainder, that is, the error of our model.

In [12]:
plt.plot(df.Quarter,df.R2)
Out[12]:
[<matplotlib.lines.Line2D at 0xd243e10>]
In [13]:
plt.plot(df.Quarter,df.errorPerc)
Out[13]:
[<matplotlib.lines.Line2D at 0xd29ef28>]

Example 2: United States Monthly Home Sales Time Series Decomposition

This data was downloaded from the Internet (link to be added) and it portrays monthly home sales in the United States on a monthly basis where the data have not been seasonally adjusted as is the case with many data sources.

We will use a similar, but more general model for the Home Sales data. Specifically, we will use this model:

$H\left(m \right) = T\left( m \right) + C\left( m \right) + \epsilon \left( m \right)$

where

  • $H\left( m \right)$ = Number of homes sold in Month $m$
  • $m$ = the index of the month
  • $T\left( m \right)$ = $T$ represents the trend of the data which, in this case, could be nonlinear. $T\left( m \right)$ is the trend value for Month $m$.
  • $C\left( m \right)$ = Cyclicality demand component which plays an analogous role to seasonality, $S\left( q \right)$, except that the pattern repeats after period whose duration might not be one year.
  • $\epsilon \left( m \right)$ is the portion of demand that we will not be able to fit with $T \left(m \right)$ and $C \left( m \right)$.

Often, the ultimate goal of decomposing data like these is to forecast demand into the future because the components are easier to detect and express individually: once identified, each component's pattern can be extrapolated into the future and combined to create a forecast.

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Load and visualize the data

In [45]:
dfHS = pd.read_csv('homeSales.csv')

fig,ax = plt.subplots()
ax.plot(dfHS['homeSales'],label='Home Sales')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

What type of trend do you see in the data?

The next cell of code computes the moving average of each point for a data window centered on each point. The window size is a variable that can be easily changed, and the average squared error is computed in order to help evaluate which window size is appropriate for the moving average.

In [46]:
def sqErr(row):
    return (row['homeSales'] - row['MovAvg'])**2
    
dfHS['MovAvg'] = np.NaN
dfHS['sqErr'] = np.NaN
# Chaging the DataFrame index to DatetimeIndex data type is required for using one of the functions below
dfHS.index = pd.DatetimeIndex(freq='m', start=pd.Timestamp(year=2007, month=8, day=31), periods = len(dfHS['homeSales']))
#print(len(data),'\n',data)

window = 36
window = window - window % 2
# Compute the moving average in the loop below using a window centered on the data point whose average is eing computed
for i in range(int(window/2),dfHS.shape[0]-int(window/2)):
    dfHS.loc[dfHS.index[i],'MovAvg'] = (0.5*dfHS.iloc[i - int(window/2)]['homeSales'] + dfHS.iloc[i - int(window/2)+1:i + int(window/2)]['homeSales'].sum() + 0.5*dfHS.iloc[i + int(window/2)]['homeSales'])/float(window)

dfHS['sqErr'] = (dfHS['homeSales'] - dfHS['MovAvg'])**2
# The squared error can eb computed also with the dfHA.apply() method below
# Using dfHS.apply() in this case is unecessary complexity, but it is a good function to know about
#dfHS['sqErr'] = dfHS.apply(sqErr,axis='columns')

# The moving average cannot be applied to all rows and we need to delete those rows because we cannot use them in the analysis
dfHS.dropna(how='any',inplace=True)

fig,ax = plt.subplots()
ax.plot(dfHS['MovAvg'],label='Moving Avg.')
ax.plot(dfHS['homeSales'],label='Home Sales')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
print('Average Squared Error per Month: ',sum(dfHS['sqErr'])/len(dfHS))
print(dfHS)
Average Squared Error per Month:  12990.722102901871
           YearMonth  homeSales      MovAvg          sqErr
2009-02-28   2009-02        509  673.916667   27197.506944
2009-03-31   2009-03        782  668.666667   12844.444444
2009-04-30   2009-04        536  663.430556   16238.546489
2009-05-31   2009-05        638  657.611111     384.595679
2009-06-30   2009-06        686  640.375000    2081.640625
2009-07-31   2009-07        896  623.277778   74377.410494
2009-08-31   2009-08        520  617.333333    9473.777778
2009-09-30   2009-09        734  607.416667   16023.340278
2009-10-31   2009-10        673  601.916667    5052.840278
2009-11-30   2009-11        760  598.847222   25970.217785
2009-12-31   2009-12        740  595.638889   20840.130401
2010-01-31   2010-01        471  590.000000   14161.000000
2010-02-28   2010-02        344  583.833333   57520.027778
2010-03-31   2010-03        527  576.916667    2491.673611
2010-04-30   2010-04        556  571.458333     238.960069
2010-05-31   2010-05        565  572.513889      56.458526
2010-06-30   2010-06        551  574.555556     554.864198
2010-07-31   2010-07        818  580.347222   56478.842785
2010-08-31   2010-08        552  583.513889     993.125193
2010-09-30   2010-09        622  582.722222    1542.743827
2010-10-31   2010-10        482  582.750000   10150.562500
2010-11-30   2010-11        415  582.166667   27944.694444
2010-12-31   2010-12        494  582.180556    7775.810378
2011-01-31   2011-01        617  577.291667    1576.751736
2011-02-28   2011-02        704  573.708333   16975.918403
2011-03-31   2011-03        516  574.125000    3378.515625
2011-04-30   2011-04        491  570.444444    6311.419753
2011-05-31   2011-05        466  564.472222    9696.778549
2011-06-30   2011-06        558  558.583333       0.340278
2011-07-31   2011-07        620  555.375000    4176.390625
...              ...        ...         ...            ...
2014-09-30   2014-09        455  517.361111    3888.908179
2014-10-31   2014-10        483  509.916667     724.506944
2014-11-30   2014-11        387  504.000000   13689.000000
2014-12-31   2014-12        572  501.805556    4927.260031
2015-01-31   2015-01        402  499.986111    9601.277971
2015-02-28   2015-02        386  499.111111   12794.123457
2015-03-31   2015-03        869  499.902778  136232.759452
2015-04-30   2015-04        538  501.555556    1328.197531
2015-05-31   2015-05        612  505.125000   11422.265625
2015-06-30   2015-06        655  509.694444   21113.704475
2015-07-31   2015-07        598  511.930556    7407.949267
2015-08-31   2015-08        462  512.500000    2550.250000
2015-09-30   2015-09        506  513.763889      60.277971
2015-10-31   2015-10        451  514.777778    4067.604938
2015-11-30   2015-11        418  514.777778    9365.938272
2015-12-31   2015-12        512  516.763889      22.694637
2016-01-31   2016-01        383  517.680556   18138.852045
2016-02-29   2016-02        336  517.180556   32826.393711
2016-03-31   2016-03        452  517.583333    4301.173611
2016-04-30   2016-04        517  517.000000       0.000000
2016-05-31   2016-05        572  517.000000    3025.000000
2016-06-30   2016-06        577  514.875000    3859.515625
2016-07-31   2016-07        521  511.291667      94.251736
2016-08-31   2016-08        549  510.944444    1448.225309
2016-09-30   2016-09        560  507.402778    2766.467785
2016-10-31   2016-10        541  501.930556    1526.421489
2016-11-30   2016-11        574  498.972222    5629.167438
2016-12-31   2016-12        584  495.708333    7795.418403
2017-01-31   2017-01        398  492.736111    8974.930748
2017-02-28   2017-02        361  488.138889   16164.297068

[97 rows x 4 columns]

The residual Home Sales yet to be explained, $R1$, is computed by subtracting the moving average from the demand time series. Also, these are included in this code cell:

  • Computing $R1$ as a percentage of demand ($R1Error$).
  • The dfHS.style.format command demonstrates how to display pandas DataFrame data in whicever readble format you preer.
In [47]:
dfHS['R1'] = dfHS['homeSales'] - dfHS['MovAvg']
dfHS['R1Error'] = abs((dfHS['homeSales'] - dfHS['R1'])/dfHS['homeSales'])
dfHS.style.format({
    'MovAvg': '{:.1f}'.format,
    'sqErr': '{:,.1f}'.format,
    'R1': '{:,.1f}'.format,
    'R1Error': '{:,.3f}'.format
})
Out[47]:
YearMonth homeSales MovAvg sqErr R1 R1Error
2009-02-28 00:00:00 2009-02 509 673.9 27,197.5 -164.9 1.324
2009-03-31 00:00:00 2009-03 782 668.7 12,844.4 113.3 0.855
2009-04-30 00:00:00 2009-04 536 663.4 16,238.5 -127.4 1.238
2009-05-31 00:00:00 2009-05 638 657.6 384.6 -19.6 1.031
2009-06-30 00:00:00 2009-06 686 640.4 2,081.6 45.6 0.933
2009-07-31 00:00:00 2009-07 896 623.3 74,377.4 272.7 0.696
2009-08-31 00:00:00 2009-08 520 617.3 9,473.8 -97.3 1.187
2009-09-30 00:00:00 2009-09 734 607.4 16,023.3 126.6 0.828
2009-10-31 00:00:00 2009-10 673 601.9 5,052.8 71.1 0.894
2009-11-30 00:00:00 2009-11 760 598.8 25,970.2 161.2 0.788
2009-12-31 00:00:00 2009-12 740 595.6 20,840.1 144.4 0.805
2010-01-31 00:00:00 2010-01 471 590.0 14,161.0 -119.0 1.253
2010-02-28 00:00:00 2010-02 344 583.8 57,520.0 -239.8 1.697
2010-03-31 00:00:00 2010-03 527 576.9 2,491.7 -49.9 1.095
2010-04-30 00:00:00 2010-04 556 571.5 239.0 -15.5 1.028
2010-05-31 00:00:00 2010-05 565 572.5 56.5 -7.5 1.013
2010-06-30 00:00:00 2010-06 551 574.6 554.9 -23.6 1.043
2010-07-31 00:00:00 2010-07 818 580.3 56,478.8 237.7 0.709
2010-08-31 00:00:00 2010-08 552 583.5 993.1 -31.5 1.057
2010-09-30 00:00:00 2010-09 622 582.7 1,542.7 39.3 0.937
2010-10-31 00:00:00 2010-10 482 582.8 10,150.6 -100.8 1.209
2010-11-30 00:00:00 2010-11 415 582.2 27,944.7 -167.2 1.403
2010-12-31 00:00:00 2010-12 494 582.2 7,775.8 -88.2 1.179
2011-01-31 00:00:00 2011-01 617 577.3 1,576.8 39.7 0.936
2011-02-28 00:00:00 2011-02 704 573.7 16,975.9 130.3 0.815
2011-03-31 00:00:00 2011-03 516 574.1 3,378.5 -58.1 1.113
2011-04-30 00:00:00 2011-04 491 570.4 6,311.4 -79.4 1.162
2011-05-31 00:00:00 2011-05 466 564.5 9,696.8 -98.5 1.211
2011-06-30 00:00:00 2011-06 558 558.6 0.3 -0.6 1.001
2011-07-31 00:00:00 2011-07 620 555.4 4,176.4 64.6 0.896
2011-08-31 00:00:00 2011-08 523 555.5 1,055.3 -32.5 1.062
2011-09-30 00:00:00 2011-09 522 557.9 1,289.0 -35.9 1.069
2011-10-31 00:00:00 2011-10 437 563.8 16,069.1 -126.8 1.290
2011-11-30 00:00:00 2011-11 403 569.2 27,634.4 -166.2 1.412
2011-12-31 00:00:00 2011-12 659 571.9 7,588.3 87.1 0.868
2012-01-31 00:00:00 2012-01 663 569.9 8,659.3 93.1 0.860
2012-02-29 00:00:00 2012-02 422 566.7 20,924.4 -144.7 1.343
2012-03-31 00:00:00 2012-03 812 565.0 61,029.6 247.0 0.696
2012-04-30 00:00:00 2012-04 508 563.3 3,054.1 -55.3 1.109
2012-05-31 00:00:00 2012-05 624 562.7 3,755.0 61.3 0.902
2012-06-30 00:00:00 2012-06 701 561.6 19,429.3 139.4 0.801
2012-07-31 00:00:00 2012-07 529 557.6 817.0 -28.6 1.054
2012-08-31 00:00:00 2012-08 629 549.2 6,366.7 79.8 0.873
2012-09-30 00:00:00 2012-09 655 543.5 12,423.0 111.5 0.830
2012-10-31 00:00:00 2012-10 487 543.4 3,176.6 -56.4 1.116
2012-11-30 00:00:00 2012-11 516 546.0 901.7 -30.0 1.058
2012-12-31 00:00:00 2012-12 560 547.6 152.8 12.4 0.978
2013-01-31 00:00:00 2013-01 420 546.5 16,002.2 -126.5 1.301
2013-02-28 00:00:00 2013-02 403 545.7 20,369.6 -142.7 1.354
2013-03-31 00:00:00 2013-03 642 544.8 9,454.9 97.2 0.849
2013-04-30 00:00:00 2013-04 863 544.5 101,459.9 318.5 0.631
2013-05-31 00:00:00 2013-05 652 544.9 11,472.8 107.1 0.836
2013-06-30 00:00:00 2013-06 655 543.5 12,441.5 111.5 0.830
2013-07-31 00:00:00 2013-07 574 538.6 1,251.4 35.4 0.938
2013-08-31 00:00:00 2013-08 559 534.5 600.2 24.5 0.956
2013-09-30 00:00:00 2013-09 493 534.8 1,746.5 -41.8 1.085
2013-10-31 00:00:00 2013-10 489 536.0 2,209.0 -47.0 1.096
2013-11-30 00:00:00 2013-11 369 536.2 27,972.6 -167.2 1.453
2013-12-31 00:00:00 2013-12 460 535.4 5,691.9 -75.4 1.164
2014-01-31 00:00:00 2014-01 361 535.8 30,542.4 -174.8 1.484
2014-02-28 00:00:00 2014-02 357 534.4 31,471.7 -177.4 1.497
2014-03-31 00:00:00 2014-03 455 530.0 5,627.1 -75.0 1.165
2014-04-30 00:00:00 2014-04 539 527.4 133.5 11.6 0.979
2014-05-31 00:00:00 2014-05 610 525.6 7,126.2 84.4 0.862
2014-06-30 00:00:00 2014-06 530 523.6 41.5 6.4 0.988
2014-07-31 00:00:00 2014-07 566 522.4 1,903.1 43.6 0.923
2014-08-31 00:00:00 2014-08 521 520.9 0.0 0.1 1.000
2014-09-30 00:00:00 2014-09 455 517.4 3,888.9 -62.4 1.137
2014-10-31 00:00:00 2014-10 483 509.9 724.5 -26.9 1.056
2014-11-30 00:00:00 2014-11 387 504.0 13,689.0 -117.0 1.302
2014-12-31 00:00:00 2014-12 572 501.8 4,927.3 70.2 0.877
2015-01-31 00:00:00 2015-01 402 500.0 9,601.3 -98.0 1.244
2015-02-28 00:00:00 2015-02 386 499.1 12,794.1 -113.1 1.293
2015-03-31 00:00:00 2015-03 869 499.9 136,232.8 369.1 0.575
2015-04-30 00:00:00 2015-04 538 501.6 1,328.2 36.4 0.932
2015-05-31 00:00:00 2015-05 612 505.1 11,422.3 106.9 0.825
2015-06-30 00:00:00 2015-06 655 509.7 21,113.7 145.3 0.778
2015-07-31 00:00:00 2015-07 598 511.9 7,407.9 86.1 0.856
2015-08-31 00:00:00 2015-08 462 512.5 2,550.2 -50.5 1.109
2015-09-30 00:00:00 2015-09 506 513.8 60.3 -7.8 1.015
2015-10-31 00:00:00 2015-10 451 514.8 4,067.6 -63.8 1.141
2015-11-30 00:00:00 2015-11 418 514.8 9,365.9 -96.8 1.232
2015-12-31 00:00:00 2015-12 512 516.8 22.7 -4.8 1.009
2016-01-31 00:00:00 2016-01 383 517.7 18,138.9 -134.7 1.352
2016-02-29 00:00:00 2016-02 336 517.2 32,826.4 -181.2 1.539
2016-03-31 00:00:00 2016-03 452 517.6 4,301.2 -65.6 1.145
2016-04-30 00:00:00 2016-04 517 517.0 0.0 0.0 1.000
2016-05-31 00:00:00 2016-05 572 517.0 3,025.0 55.0 0.904
2016-06-30 00:00:00 2016-06 577 514.9 3,859.5 62.1 0.892
2016-07-31 00:00:00 2016-07 521 511.3 94.3 9.7 0.981
2016-08-31 00:00:00 2016-08 549 510.9 1,448.2 38.1 0.931
2016-09-30 00:00:00 2016-09 560 507.4 2,766.5 52.6 0.906
2016-10-31 00:00:00 2016-10 541 501.9 1,526.4 39.1 0.928
2016-11-30 00:00:00 2016-11 574 499.0 5,629.2 75.0 0.869
2016-12-31 00:00:00 2016-12 584 495.7 7,795.4 88.3 0.849
2017-01-31 00:00:00 2017-01 398 492.7 8,974.9 -94.7 1.238
2017-02-28 00:00:00 2017-02 361 488.1 16,164.3 -127.1 1.352

The cell below helps us visualize the remaining pattern to be decomposed, $R1$, and it also computes the average residual demand pattern.

In [5]:
fig,ax = plt.subplots()
ax.plot(dfHS['R1'])
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
print('Average Residual: ', sum(dfHS['R1'])/len(dfHS))
Average Residual:  -0.8771477663230256

Just as seasonal demand had a higher autocorrelation when the data were offet by four periods, we need to use autocorrelation analysis to detect whether any cyclical patterns exist and how many periods before they are repeated.

In [48]:
maxCorr = 0.0
period = np.NaN
for i in range(1,37):
    corr = dfHS['R1'].autocorr(lag=i)
    print('Correlation, lag ',i,'   ',corr)
    if corr > maxCorr:
        maxCorr = corr
        period = i
print('period = ',period,'     Maximum Correlation = ',maxCorr)
Correlation, lag  1     0.1767051114808013
Correlation, lag  2     0.07347623986911378
Correlation, lag  3     0.035325401216691525
Correlation, lag  4     -0.14207381877846786
Correlation, lag  5     -0.186183851792978
Correlation, lag  6     -0.08805018522480047
Correlation, lag  7     -0.22918409185474736
Correlation, lag  8     -0.16219610042616106
Correlation, lag  9     -0.05110534430338662
Correlation, lag  10     -0.03134691800038444
Correlation, lag  11     0.004796942013465753
Correlation, lag  12     0.22334286538388148
Correlation, lag  13     0.0832551789022048
Correlation, lag  14     0.03310547818184565
Correlation, lag  15     0.03992377536160384
Correlation, lag  16     -0.12948720982726264
Correlation, lag  17     -0.24928671891988752
Correlation, lag  18     -0.13764333839233686
Correlation, lag  19     -0.14368994733386084
Correlation, lag  20     0.002291087819791824
Correlation, lag  21     0.052991176472111344
Correlation, lag  22     -0.11362212934985394
Correlation, lag  23     0.06634265052356614
Correlation, lag  24     0.17192600687902537
Correlation, lag  25     0.07034566932196513
Correlation, lag  26     0.139336456238816
Correlation, lag  27     0.14582688122963108
Correlation, lag  28     -0.08205113583775704
Correlation, lag  29     -0.1874067855659316
Correlation, lag  30     -0.0139442764850006
Correlation, lag  31     -0.20084061261536068
Correlation, lag  32     -0.05528284064438792
Correlation, lag  33     0.17714466862093914
Correlation, lag  34     -0.00752866744760313
Correlation, lag  35     0.0387637718785238
Correlation, lag  36     0.3920731973776539
period =  36      Maximum Correlation =  0.3920731973776539

The code cell below:

  • Breaks the time series into three components corresonding with each of the three cycles in the data. Note that the third cycle is partial.
  • Computes an average for each of the 36 points within the cycle over all intances of each point in the data
  • Plots the average versus the 3 cycle instances within the data
In [19]:
period = 36
cycleLen = period
numCycles = int(len(dfHS)/cycleLen + 0.5)
cycles = [dfHS.iloc[range(i*period,min((i+1)*period,len(dfHS)))]['R1'] for i in range(numCycles)]
ptsInCycles = [dfHS.iloc[range(i,len(dfHS['R1']),period)]['R1'].tolist() for i in range(period)]
avg = [sum(pts)/len(pts) for pts in ptsInCycles]

fig,ax = plt.subplots()
for i in range(len(cycles)):
    ax.plot(cycles[i].values,label='Cycle '+str(i),c='k')
ax.plot(avg,label='Average Cycle',c='r')
ax.set_xlabel('Month')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.legend()
Out[19]:
<matplotlib.legend.Legend at 0xc378f28>

This code cell:

  • Inserts the appropriate $C\left( m \right)$ value into the $C$ column in the DataFrame for each Month $m$.
  • Plots the cyclicality component $C\left( m \right)$ is plotted with the $R1$ column to see how well the cyclicality component and it match.
In [49]:
cycleLen = period   # see prior cell for computation of cyclicality period
numCycles = int(len(dfHS)/cycleLen + 0.5)
dfHS['C'] = np.NaN   # Creates an empty column for the cyclicality component data
for i in range(len(dfHS)):
    dfHS.loc[dfHS.index[i], 'C'] = avg[i % cycleLen] # Write appropriate cyclicality value

fig,ax = plt.subplots()
ax.plot(dfHS['C'],label='Cyclic Pattern')
ax.plot(dfHS['R1'],label='Remainder After Trend')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
Out[49]:
<matplotlib.legend.Legend at 0xe7997b8>

The code cell below does these:

  • Computes the remaining residual home sales to be explained, $R2$, after subtracting the cyclical component, $C \left( m \right)$: this is the $\epsilon \left( m \right)$ in our model.
  • Computes the error, $R2$, as a percentage of the demand time series.
  • Computes the mathematical model 'fit', composed of the trend and cyclical components: $T\left( m \right) + C\left( m \right)$.
  • Plots the fit of the model $T\left( m \right) + C\left( m \right)$ with the original data, $H \left( m \right)$.
  • Computes the average absolute error of $R2Error$ of the original demand time series.
  • Removes the sqErr column because it is no longer needed
In [50]:
dfHS['R2'] = dfHS['R1'] - dfHS['C']
dfHS['R2Error'] = abs(dfHS['R2']/dfHS['homeSales'])
dfHS['fit'] = dfHS['MovAvg'] + dfHS['C']
dfHS.drop(['sqErr'],axis=1,inplace=True)
print('Average Error: ', sum(dfHS['R2Error'])/len(dfHS))
print(dfHS)
fig,ax = plt.subplots()
ax.plot(dfHS['homeSales'],label='Home Sales')
ax.plot(dfHS['fit'], label = 'Fit')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
Average Error:  0.10588656173826713
           YearMonth  homeSales      MovAvg          R1   R1Error           C  \
2009-02-28   2009-02        509  673.916667 -164.916667  1.324001 -140.893519   
2009-03-31   2009-03        782  668.666667  113.333333  0.855072  243.157407   
2009-04-30   2009-04        536  663.430556 -127.430556  1.237744  -48.750000   
2009-05-31   2009-05        638  657.611111  -19.611111  1.030738   49.513889   
2009-06-30   2009-06        686  640.375000   45.625000  0.933491  110.106481   
2009-07-31   2009-07        896  623.277778  272.722222  0.695623  110.069444   
2009-08-31   2009-08        520  617.333333  -97.333333  1.187179  -22.680556   
2009-09-30   2009-09        734  607.416667  126.583333  0.827543   76.759259   
2009-10-31   2009-10        673  601.916667   71.083333  0.894378  -16.351852   
2009-11-30   2009-11        760  598.847222  161.152778  0.787957   11.449074   
2009-12-31   2009-12        740  595.638889  144.361111  0.804917   50.652778   
2010-01-31   2010-01        471  590.000000 -119.000000  1.252654 -126.726852   
2010-02-28   2010-02        344  583.833333 -239.833333  1.697190 -187.912037   
2010-03-31   2010-03        527  576.916667  -49.916667  1.094719   -6.087963   
2010-04-30   2010-04        556  571.458333  -15.458333  1.027803  101.023148   
2010-05-31   2010-05        565  572.513889   -7.513889  1.013299   51.532407   
2010-06-30   2010-06        551  574.555556  -23.555556  1.042751   50.037037   
2010-07-31   2010-07        818  580.347222  237.652778  0.709471   94.245370   
2010-08-31   2010-08        552  583.513889  -31.513889  1.057090   10.347222   
2010-09-30   2010-09        622  582.722222   39.277778  0.936852   16.694444   
2010-10-31   2010-10        482  582.750000 -100.750000  1.209025  -36.226852   
2010-11-30   2010-11        415  582.166667 -167.166667  1.402811  -86.462963   
2010-12-31   2010-12        494  582.180556  -88.180556  1.178503  -25.111111   
2011-01-31   2011-01        617  577.291667   39.708333  0.935643  -76.597222   
2011-02-28   2011-02        704  573.708333  130.291667  0.814927  -58.083333   
2011-03-31   2011-03        516  574.125000  -58.125000  1.112645  -66.569444   
2011-04-30   2011-04        491  570.444444  -79.444444  1.161801  -33.944444   
2011-05-31   2011-05        466  564.472222  -98.472222  1.211314   -7.027778   
2011-06-30   2011-06        558  558.583333   -0.583333  1.001045    2.930556   
2011-07-31   2011-07        620  555.375000   64.625000  0.895766   54.125000   
...              ...        ...         ...         ...       ...         ...   
2014-09-30   2014-09        455  517.361111  -62.361111  1.137057  -49.131944   
2014-10-31   2014-10        483  509.916667  -26.916667  1.055728  -76.840278   
2014-11-30   2014-11        387  504.000000 -117.000000  1.302326 -141.618056   
2014-12-31   2014-12        572  501.805556   70.194444  0.877282   78.652778   
2015-01-31   2015-01        402  499.986111  -97.986111  1.243747   -2.465278   
2015-02-28   2015-02        386  499.111111 -113.111111  1.293034 -140.893519   
2015-03-31   2015-03        869  499.902778  369.097222  0.575262  243.157407   
2015-04-30   2015-04        538  501.555556   36.444444  0.932259  -48.750000   
2015-05-31   2015-05        612  505.125000  106.875000  0.825368   49.513889   
2015-06-30   2015-06        655  509.694444  145.305556  0.778159  110.106481   
2015-07-31   2015-07        598  511.930556   86.069444  0.856071  110.069444   
2015-08-31   2015-08        462  512.500000  -50.500000  1.109307  -22.680556   
2015-09-30   2015-09        506  513.763889   -7.763889  1.015344   76.759259   
2015-10-31   2015-10        451  514.777778  -63.777778  1.141414  -16.351852   
2015-11-30   2015-11        418  514.777778  -96.777778  1.231526   11.449074   
2015-12-31   2015-12        512  516.763889   -4.763889  1.009304   50.652778   
2016-01-31   2016-01        383  517.680556 -134.680556  1.351646 -126.726852   
2016-02-29   2016-02        336  517.180556 -181.180556  1.539228 -187.912037   
2016-03-31   2016-03        452  517.583333  -65.583333  1.145096   -6.087963   
2016-04-30   2016-04        517  517.000000    0.000000  1.000000  101.023148   
2016-05-31   2016-05        572  517.000000   55.000000  0.903846   51.532407   
2016-06-30   2016-06        577  514.875000   62.125000  0.892331   50.037037   
2016-07-31   2016-07        521  511.291667    9.708333  0.981366   94.245370   
2016-08-31   2016-08        549  510.944444   38.055556  0.930682   10.347222   
2016-09-30   2016-09        560  507.402778   52.597222  0.906076   16.694444   
2016-10-31   2016-10        541  501.930556   39.069444  0.927783  -36.226852   
2016-11-30   2016-11        574  498.972222   75.027778  0.869290  -86.462963   
2016-12-31   2016-12        584  495.708333   88.291667  0.848816  -25.111111   
2017-01-31   2017-01        398  492.736111  -94.736111  1.238030  -76.597222   
2017-02-28   2017-02        361  488.138889 -127.138889  1.352185  -58.083333   

                    R2   R2Error         fit  
2009-02-28  -24.023148  0.047197  533.023148  
2009-03-31 -129.824074  0.166015  911.824074  
2009-04-30  -78.680556  0.146792  614.680556  
2009-05-31  -69.125000  0.108346  707.125000  
2009-06-30  -64.481481  0.093996  750.481481  
2009-07-31  162.652778  0.181532  733.347222  
2009-08-31  -74.652778  0.143563  594.652778  
2009-09-30   49.824074  0.067880  684.175926  
2009-10-31   87.435185  0.129919  585.564815  
2009-11-30  149.703704  0.196979  610.296296  
2009-12-31   93.708333  0.126633  646.291667  
2010-01-31    7.726852  0.016405  463.273148  
2010-02-28  -51.921296  0.150934  395.921296  
2010-03-31  -43.828704  0.083166  570.828704  
2010-04-30 -116.481481  0.209499  672.481481  
2010-05-31  -59.046296  0.104507  624.046296  
2010-06-30  -73.592593  0.133562  624.592593  
2010-07-31  143.407407  0.175315  674.592593  
2010-08-31  -41.861111  0.075835  593.861111  
2010-09-30   22.583333  0.036308  599.416667  
2010-10-31  -64.523148  0.133865  546.523148  
2010-11-30  -80.703704  0.194467  495.703704  
2010-12-31  -63.069444  0.127671  557.069444  
2011-01-31  116.305556  0.188502  500.694444  
2011-02-28  188.375000  0.267578  515.625000  
2011-03-31    8.444444  0.016365  507.555556  
2011-04-30  -45.500000  0.092668  536.500000  
2011-05-31  -91.444444  0.196233  557.444444  
2011-06-30   -3.513889  0.006297  561.513889  
2011-07-31   10.500000  0.016935  609.500000  
...                ...       ...         ...  
2014-09-30  -13.229167  0.029075  468.229167  
2014-10-31   49.923611  0.103362  433.076389  
2014-11-30   24.618056  0.063613  362.381944  
2014-12-31   -8.458333  0.014787  580.458333  
2015-01-31  -95.520833  0.237614  497.520833  
2015-02-28   27.782407  0.071975  358.217593  
2015-03-31  125.939815  0.144925  743.060185  
2015-04-30   85.194444  0.158354  452.805556  
2015-05-31   57.361111  0.093727  554.638889  
2015-06-30   35.199074  0.053739  619.800926  
2015-07-31  -24.000000  0.040134  622.000000  
2015-08-31  -27.819444  0.060215  489.819444  
2015-09-30  -84.523148  0.167042  590.523148  
2015-10-31  -47.425926  0.105157  498.425926  
2015-11-30 -108.226852  0.258916  526.226852  
2015-12-31  -55.416667  0.108236  567.416667  
2016-01-31   -7.953704  0.020767  390.953704  
2016-02-29    6.731481  0.020034  329.268519  
2016-03-31  -59.495370  0.131627  511.495370  
2016-04-30 -101.023148  0.195403  618.023148  
2016-05-31    3.467593  0.006062  568.532407  
2016-06-30   12.087963  0.020950  564.912037  
2016-07-31  -84.537037  0.162259  605.537037  
2016-08-31   27.708333  0.050471  521.291667  
2016-09-30   35.902778  0.064112  524.097222  
2016-10-31   75.296296  0.139180  465.703704  
2016-11-30  161.490741  0.281343  412.509259  
2016-12-31  113.402778  0.194183  470.597222  
2017-01-31  -18.138889  0.045575  416.138889  
2017-02-28  -69.055556  0.191290  430.055556  

[97 rows x 9 columns]
Out[50]:
<matplotlib.legend.Legend at 0xdd93e80>

Here is a plot of the residual $R2$ for visualization purposes to observe any remaining patterns that we might want to capture, and also an autocorrelation analysis of the residual.

In [23]:
fig,ax = plt.subplots()
ax.plot(dfHS['R2'],label='Remainder after Trend and Cyclical Components')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
maxCorr = 0.0
period = np.NaN
for i in range(1,37):
    corr = dfHS['R2'].autocorr(lag=i)
    print('Correlation, lag ',i,'   ',corr)
    if corr > maxCorr:
        maxCorr = corr
        period = i
print('period = ',period,'     Maximum Correlation = ',maxCorr)
Correlation, lag  1     0.3407551921909781
Correlation, lag  2     0.13100301233823802
Correlation, lag  3     -0.1766559507914226
Correlation, lag  4     -0.1003590415909726
Correlation, lag  5     -0.16814690922501
Correlation, lag  6     -0.12415781580954244
Correlation, lag  7     -0.05822002914434892
Correlation, lag  8     -0.13054080263986692
Correlation, lag  9     -0.23781484496066232
Correlation, lag  10     -0.1222172789940625
Correlation, lag  11     -0.08231661390561465
Correlation, lag  12     0.06261644536400905
Correlation, lag  13     -0.13766331183104474
Correlation, lag  14     0.10809732254492026
Correlation, lag  15     0.011247451157934421
Correlation, lag  16     0.04436394548925096
Correlation, lag  17     -0.14415108233385596
Correlation, lag  18     0.052789137547610075
Correlation, lag  19     0.15053368772446965
Correlation, lag  20     -0.0466652928229702
Correlation, lag  21     -0.09679940427971702
Correlation, lag  22     -0.1701938623693834
Correlation, lag  23     0.020169334349892994
Correlation, lag  24     0.09031409175740474
Correlation, lag  25     0.27231073903654424
Correlation, lag  26     0.35741781057919386
Correlation, lag  27     0.13223143762177836
Correlation, lag  28     -0.012056882477423654
Correlation, lag  29     -0.128772771728849
Correlation, lag  30     0.03298909388405914
Correlation, lag  31     -0.0659956832769614
Correlation, lag  32     -0.09649655466544935
Correlation, lag  33     0.010039231706154257
Correlation, lag  34     -0.15959627582948144
Correlation, lag  35     -0.20582225146024993
Correlation, lag  36     -0.5918842979730136
period =  26      Maximum Correlation =  0.35741781057919386

A final graph to show the model versus the original data and, as well, the remander $R2$ to judge it relative to the original demand we were trying to fit.

In [24]:
fig,ax = plt.subplots()
ax.plot(dfHS['homeSales'],label='Home Sales')
ax.plot(dfHS['fit'],label='Fit')
ax.plot(dfHS['R2'],label='Residual')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
Out[24]:
<matplotlib.legend.Legend at 0xd0839e8>

Trying different factors of cyclicality, it looks as though teh simple average is the best value:

In [28]:
for a in [0.1 * i for i in range(1,20)]:
    dfHS['aC'] = a*dfHS['C']
    dfHS['R3'] = dfHS['R1'] - dfHS['aC']
    dfHS['sqErr'] = dfHS['R3']**2
    print('Squared Error for a =','{:.1f}'.format(a),':',sum(dfHS['sqErr']))
    #print('Average Error: ', sum(dfHS['R2Error'])/len(dfHS))
Squared Error for a = 0.1 : 1122702.832199074
Squared Error for a = 0.2 : 999768.4848148153
Squared Error for a = 0.3 : 891297.0018287036
Squared Error for a = 0.4 : 797288.3832407407
Squared Error for a = 0.5 : 717742.6290509261
Squared Error for a = 0.6 : 652659.7392592594
Squared Error for a = 0.7 : 602039.7138657409
Squared Error for a = 0.8 : 565882.5528703708
Squared Error for a = 0.9 : 544188.2562731483
Squared Error for a = 1.0 : 536956.8240740743
Squared Error for a = 1.1 : 544188.2562731481
Squared Error for a = 1.2 : 565882.5528703702
Squared Error for a = 1.3 : 602039.7138657405
Squared Error for a = 1.4 : 652659.7392592593
Squared Error for a = 1.5 : 717742.6290509258
Squared Error for a = 1.6 : 797288.3832407407
Squared Error for a = 1.7 : 891297.001828704
Squared Error for a = 1.8 : 999768.4848148145
Squared Error for a = 1.9 : 1122702.8321990743

Another Approach to Monthly Home Sales Time Series Decomposition

The prior approach to decomposing monthly home sales works well, but it is not amenable to a forecast of future sales because each point is computed with data from the future. The approach used in this section differs only by how the trend cmoponent is computed, whcih is a moving average of some number of periods prior to the perio od sales we are trying to forecast. That is, for each point it uses only data from the past relative to that point.

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [27]:
def sqErr(row):
    return (row[1] - row[2])**2
    
dfHSa = pd.read_csv('homeSales.csv')
dfHSa['MovAvg'] = np.NaN
dfHSa['sqErr'] = np.NaN
dfHSa.index = pd.DatetimeIndex(freq='m', start=pd.Timestamp(year=2007, month=8, day=31), periods = len(dfHSa['homeSales']))
#print(len(data),'\n',data)

window = 30
for i in range(window+1,len(dfHSa)):
    dfHSa.loc[dfHSa.index[i],'MovAvg'] = sum(dfHSa.iloc[range(i-window-1,i)]['homeSales'])/float(window)
dfHSa['sqErr'] = dfHSa.apply(sqErr,axis='columns')
#print(data.head())
dfHSa.dropna(how='any',inplace=True)

fig,ax = plt.subplots()
ax.plot(dfHSa['MovAvg'], label='Moving Avg.')
ax.plot(dfHSa['homeSales'], label='Home Sales')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
print('Average Squared Error per Month: ',sum(dfHSa['sqErr'])/len(dfHSa))
print(dfHSa)
Average Squared Error per Month:  14189.294793028315
           YearMonth  homeSales      MovAvg         sqErr
2010-03-31   2010-03        527  713.566667  34807.121111
2010-04-30   2010-04        556  701.866667  21277.084444
2010-05-31   2010-05        565  697.933333  17671.271111
2010-06-30   2010-06        551  689.866667  19283.951111
2010-07-31   2010-07        818  691.266667  16061.337778
2010-08-31   2010-08        552  663.833333  12506.694444
2010-09-30   2010-09        622  658.866667   1359.151111
2010-10-31   2010-10        482  644.666667  26460.444444
2010-11-30   2010-11        415  631.200000  46742.440000
2010-12-31   2010-12        494  627.800000  17902.440000
2011-01-31   2011-01        617  622.233333     27.387778
2011-02-28   2011-02        704  623.000000   6561.000000
2011-03-31   2011-03        516  613.466667   9499.751111
2011-04-30   2011-04        491  610.766667  14344.054444
2011-05-31   2011-05        466  595.600000  16796.160000
2011-06-30   2011-06        558  597.600000   1568.160000
2011-07-31   2011-07        620  604.266667    247.537778
2011-08-31   2011-08        523  606.366667   6950.001111
2011-09-30   2011-09        522  612.200000   8136.040000
2011-10-31   2011-10        437  612.633333  30847.067778
2011-11-30   2011-11        403  601.133333  39256.817778
2011-12-31   2011-12        659  596.700000   3881.290000
2012-01-31   2012-01        663  597.400000   4303.360000
2012-02-29   2012-02        422  596.633333  30496.801111
2012-03-31   2012-03        812  580.833333  53438.027778
2012-04-30   2012-04        508  590.566667   6817.254444
2012-05-31   2012-05        624  583.033333   1678.267778
2012-06-30   2012-06        701  581.400000  14304.160000
2012-07-31   2012-07        529  579.433333   2543.521111
2012-08-31   2012-08        629  572.400000   3203.560000
...              ...        ...         ...           ...
2016-03-31   2016-03        452  511.133333   3496.751111
2016-04-30   2016-04        517  507.566667     88.987778
2016-05-31   2016-05        572  508.366667   4049.201111
2016-06-30   2016-06        577  511.133333   4338.417778
2016-07-31   2016-07        521  518.066667      8.604444
2016-08-31   2016-08        549  520.100000    835.210000
2016-09-30   2016-09        560  526.366667   1131.201111
2016-10-31   2016-10        541  533.133333     61.884444
2016-11-30   2016-11        574  536.000000   1444.000000
2016-12-31   2016-12        584  537.166667   2193.361111
2017-01-31   2017-01        398  536.300000  19126.890000
2017-02-28   2017-02        361  531.900000  29206.810000
2017-03-31   2017-03        542  525.066667    286.737778
2017-04-30   2017-04        525  525.766667      0.587778
2017-05-31   2017-05        624  528.100000   9196.810000
2017-06-30   2017-06        659  532.800000  15926.440000
2017-07-31   2017-07        503  541.866667   1510.617778
2017-08-31   2017-08        548  539.566667     71.121111
2017-09-30   2017-09        457  544.433333   7644.587778
2017-10-31   2017-10        439  546.800000  11620.840000
2017-11-30   2017-11        431  532.466667  10295.484444
2017-12-31   2017-12        375  528.900000  23685.210000
2018-01-31   2018-01        341  521.000000  32400.000000
2018-02-28   2018-02        422  510.533333   7838.151111
2018-03-31   2018-03        578  504.666667   5377.777778
2018-04-30   2018-04        435  508.533333   5407.151111
2018-05-31   2018-05        502  506.166667     17.361111
2018-06-30   2018-06        530  507.866667    489.884444
2018-07-31   2018-07        509  511.600000      6.760000
2018-08-31   2018-08        220  511.500000  84972.250000

[102 rows x 4 columns]
In [28]:
dfHSa['R1'] = dfHSa['homeSales'] - dfHSa['MovAvg']
dfHSa['R1Error'] = abs((dfHSa['homeSales'] - dfHSa['R1'])/dfHSa['homeSales'])
dfHSa.style.format({
    'MovAvg': '{:.1f}'.format,
    'sqErr': '{:,.1f}'.format    
})
Out[28]:
YearMonth homeSales MovAvg sqErr R1 R1Error
2010-03-31 00:00:00 2010-03 527 713.6 34,807.1 -186.567 1.35402
2010-04-30 00:00:00 2010-04 556 701.9 21,277.1 -145.867 1.26235
2010-05-31 00:00:00 2010-05 565 697.9 17,671.3 -132.933 1.23528
2010-06-30 00:00:00 2010-06 551 689.9 19,284.0 -138.867 1.25203
2010-07-31 00:00:00 2010-07 818 691.3 16,061.3 126.733 0.845069
2010-08-31 00:00:00 2010-08 552 663.8 12,506.7 -111.833 1.2026
2010-09-30 00:00:00 2010-09 622 658.9 1,359.2 -36.8667 1.05927
2010-10-31 00:00:00 2010-10 482 644.7 26,460.4 -162.667 1.33748
2010-11-30 00:00:00 2010-11 415 631.2 46,742.4 -216.2 1.52096
2010-12-31 00:00:00 2010-12 494 627.8 17,902.4 -133.8 1.27085
2011-01-31 00:00:00 2011-01 617 622.2 27.4 -5.23333 1.00848
2011-02-28 00:00:00 2011-02 704 623.0 6,561.0 81 0.884943
2011-03-31 00:00:00 2011-03 516 613.5 9,499.8 -97.4667 1.18889
2011-04-30 00:00:00 2011-04 491 610.8 14,344.1 -119.767 1.24392
2011-05-31 00:00:00 2011-05 466 595.6 16,796.2 -129.6 1.27811
2011-06-30 00:00:00 2011-06 558 597.6 1,568.2 -39.6 1.07097
2011-07-31 00:00:00 2011-07 620 604.3 247.5 15.7333 0.974624
2011-08-31 00:00:00 2011-08 523 606.4 6,950.0 -83.3667 1.1594
2011-09-30 00:00:00 2011-09 522 612.2 8,136.0 -90.2 1.1728
2011-10-31 00:00:00 2011-10 437 612.6 30,847.1 -175.633 1.40191
2011-11-30 00:00:00 2011-11 403 601.1 39,256.8 -198.133 1.49165
2011-12-31 00:00:00 2011-12 659 596.7 3,881.3 62.3 0.905463
2012-01-31 00:00:00 2012-01 663 597.4 4,303.4 65.6 0.901056
2012-02-29 00:00:00 2012-02 422 596.6 30,496.8 -174.633 1.41382
2012-03-31 00:00:00 2012-03 812 580.8 53,438.0 231.167 0.715312
2012-04-30 00:00:00 2012-04 508 590.6 6,817.3 -82.5667 1.16253
2012-05-31 00:00:00 2012-05 624 583.0 1,678.3 40.9667 0.934348
2012-06-30 00:00:00 2012-06 701 581.4 14,304.2 119.6 0.829387
2012-07-31 00:00:00 2012-07 529 579.4 2,543.5 -50.4333 1.09534
2012-08-31 00:00:00 2012-08 629 572.4 3,203.6 56.6 0.910016
2012-09-30 00:00:00 2012-09 655 577.7 5,980.4 77.3333 0.881934
2012-10-31 00:00:00 2012-10 487 588.0 10,207.7 -101.033 1.20746
2012-11-30 00:00:00 2012-11 516 586.7 4,998.5 -70.7 1.13702
2012-12-31 00:00:00 2012-12 560 585.4 643.5 -25.3667 1.0453
2013-01-31 00:00:00 2013-01 420 585.2 27,291.0 -165.2 1.39333
2013-02-28 00:00:00 2013-02 403 580.8 31,624.7 -177.833 1.44127
2013-03-31 00:00:00 2013-03 642 567.0 5,625.0 75 0.883178
2013-04-30 00:00:00 2013-04 863 570.0 85,849.0 293 0.660487
2013-05-31 00:00:00 2013-05 652 578.0 5,471.1 73.9667 0.886554
2013-06-30 00:00:00 2013-06 655 583.7 5,083.7 71.3 0.891145
2013-07-31 00:00:00 2013-07 574 591.7 313.3 -17.7 1.03084
2013-08-31 00:00:00 2013-08 559 594.4 1,250.8 -35.3667 1.06327
2013-09-30 00:00:00 2013-09 493 592.4 9,887.0 -99.4333 1.20169
2013-10-31 00:00:00 2013-10 489 585.4 9,293.0 -96.4 1.19714
2013-11-30 00:00:00 2013-11 369 584.5 46,440.2 -215.5 1.58401
2013-12-31 00:00:00 2013-12 460 580.4 14,504.2 -120.433 1.26181
2014-01-31 00:00:00 2014-01 361 580.2 48,063.3 -219.233 1.60729
2014-02-28 00:00:00 2014-02 357 573.7 46,944.4 -216.667 1.60691
2014-03-31 00:00:00 2014-03 455 564.9 12,078.0 -109.9 1.24154
2014-04-30 00:00:00 2014-04 539 562.6 558.5 -23.6333 1.04385
2014-05-31 00:00:00 2014-05 610 563.2 2,190.2 46.8 0.923279
2014-06-30 00:00:00 2014-06 530 569.0 1,518.4 -38.9667 1.07352
2014-07-31 00:00:00 2014-07 566 573.2 51.8 -7.2 1.01272
2014-08-31 00:00:00 2014-08 521 570.1 2,410.8 -49.1 1.09424
2014-09-30 00:00:00 2014-09 455 565.4 12,180.8 -110.367 1.24256
2014-10-31 00:00:00 2014-10 483 566.5 6,966.7 -83.4667 1.17281
2014-11-30 00:00:00 2014-11 387 555.5 28,392.2 -168.5 1.4354
2014-12-31 00:00:00 2014-12 572 551.5 421.6 20.5333 0.964103
2015-01-31 00:00:00 2015-01 402 549.7 21,825.1 -147.733 1.3675
2015-02-28 00:00:00 2015-02 386 539.8 23,644.2 -153.767 1.39836
2015-03-31 00:00:00 2015-03 869 535.0 111,556.0 334 0.61565
2015-04-30 00:00:00 2015-04 538 543.0 25.0 -5 1.00929
2015-05-31 00:00:00 2015-05 612 539.1 5,314.4 72.9 0.880882
2015-06-30 00:00:00 2015-06 655 543.3 12,484.3 111.733 0.829415
2015-07-31 00:00:00 2015-07 598 547.9 2,510.0 50.1 0.916221
2015-08-31 00:00:00 2015-08 462 549.2 7,598.0 -87.1667 1.18867
2015-09-30 00:00:00 2015-09 506 550.6 1,986.2 -44.5667 1.08808
2015-10-31 00:00:00 2015-10 451 554.0 10,609.0 -103 1.22838
2015-11-30 00:00:00 2015-11 418 547.6 16,804.8 -129.633 1.31013
2015-12-31 00:00:00 2015-12 512 532.8 432.6 -20.8 1.04062
2016-01-31 00:00:00 2016-01 383 528.1 21,063.7 -145.133 1.37894
2016-02-29 00:00:00 2016-02 336 519.1 33,513.4 -183.067 1.54484
2016-03-31 00:00:00 2016-03 452 511.1 3,496.8 -59.1333 1.13083
2016-04-30 00:00:00 2016-04 517 507.6 89.0 9.43333 0.981754
2016-05-31 00:00:00 2016-05 572 508.4 4,049.2 63.6333 0.888753
2016-06-30 00:00:00 2016-06 577 511.1 4,338.4 65.8667 0.885846
2016-07-31 00:00:00 2016-07 521 518.1 8.6 2.93333 0.99437
2016-08-31 00:00:00 2016-08 549 520.1 835.2 28.9 0.947359
2016-09-30 00:00:00 2016-09 560 526.4 1,131.2 33.6333 0.93994
2016-10-31 00:00:00 2016-10 541 533.1 61.9 7.86667 0.985459
2016-11-30 00:00:00 2016-11 574 536.0 1,444.0 38 0.933798
2016-12-31 00:00:00 2016-12 584 537.2 2,193.4 46.8333 0.919806
2017-01-31 00:00:00 2017-01 398 536.3 19,126.9 -138.3 1.34749
2017-02-28 00:00:00 2017-02 361 531.9 29,206.8 -170.9 1.47341
2017-03-31 00:00:00 2017-03 542 525.1 286.7 16.9333 0.968758
2017-04-30 00:00:00 2017-04 525 525.8 0.6 -0.766667 1.00146
2017-05-31 00:00:00 2017-05 624 528.1 9,196.8 95.9 0.846314
2017-06-30 00:00:00 2017-06 659 532.8 15,926.4 126.2 0.808498
2017-07-31 00:00:00 2017-07 503 541.9 1,510.6 -38.8667 1.07727
2017-08-31 00:00:00 2017-08 548 539.6 71.1 8.43333 0.984611
2017-09-30 00:00:00 2017-09 457 544.4 7,644.6 -87.4333 1.19132
2017-10-31 00:00:00 2017-10 439 546.8 11,620.8 -107.8 1.24556
2017-11-30 00:00:00 2017-11 431 532.5 10,295.5 -101.467 1.23542
2017-12-31 00:00:00 2017-12 375 528.9 23,685.2 -153.9 1.4104
2018-01-31 00:00:00 2018-01 341 521.0 32,400.0 -180 1.52786
2018-02-28 00:00:00 2018-02 422 510.5 7,838.2 -88.5333 1.20979
2018-03-31 00:00:00 2018-03 578 504.7 5,377.8 73.3333 0.873126
2018-04-30 00:00:00 2018-04 435 508.5 5,407.2 -73.5333 1.16904
2018-05-31 00:00:00 2018-05 502 506.2 17.4 -4.16667 1.0083
2018-06-30 00:00:00 2018-06 530 507.9 489.9 22.1333 0.958239
2018-07-31 00:00:00 2018-07 509 511.6 6.8 -2.6 1.00511
2018-08-31 00:00:00 2018-08 220 511.5 84,972.2 -291.5 2.325
In [29]:
fig,ax = plt.subplots()
ax.plot(dfHSa['R1'],label='Remainder after Trend')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
print('Average Residual: ', sum(dfHSa['R1'])/len(dfHSa))

maxCorr = 0.0
period = np.NaN
for i in range(1,37):
    corr = dfHSa['R1'].autocorr(lag=i)
    print('Correlation, lag ',i,'   ',corr)
    if corr > maxCorr:
        maxCorr = corr
        period = i
print('period = ',period,'     Maximum Correlation = ',maxCorr)
Average Residual:  -44.28039215686276
Correlation, lag  1     0.26139365427306627
Correlation, lag  2     0.10498657768111407
Correlation, lag  3     0.10383505938041838
Correlation, lag  4     -0.1886295905008609
Correlation, lag  5     -0.14519435719226084
Correlation, lag  6     -0.0719196497635646
Correlation, lag  7     -0.1331146828176694
Correlation, lag  8     -0.13965228952253989
Correlation, lag  9     -0.041513821399192796
Correlation, lag  10     0.0026520514131592745
Correlation, lag  11     0.1643211127713046
Correlation, lag  12     0.29837285028569405
Correlation, lag  13     0.14368470699130775
Correlation, lag  14     -0.04303196541345698
Correlation, lag  15     -0.005491261700334031
Correlation, lag  16     -0.19846240061786916
Correlation, lag  17     -0.2132262843903361
Correlation, lag  18     -0.15721957959586197
Correlation, lag  19     -0.24932043294734885
Correlation, lag  20     -0.12181618792962301
Correlation, lag  21     -0.05554911240476115
Correlation, lag  22     -0.0767988660586777
Correlation, lag  23     0.17543519429568621
Correlation, lag  24     0.1899174217826425
Correlation, lag  25     0.10910339122620956
Correlation, lag  26     0.16529732772915795
Correlation, lag  27     0.11392321553758615
Correlation, lag  28     -0.14638196970590495
Correlation, lag  29     -0.2445631948008149
Correlation, lag  30     -0.20574862689100135
Correlation, lag  31     -0.22213505712945178
Correlation, lag  32     -0.20453089925382045
Correlation, lag  33     0.049946007284393974
Correlation, lag  34     -0.042676355186410324
Correlation, lag  35     0.04322332662660337
Correlation, lag  36     0.3555415069131859
period =  36      Maximum Correlation =  0.3555415069131859
In [30]:
cycleLen = period   # see prior cell for computation of cyclicality period
avg = []            # a list to store the average demand for each period of the cycle
numCycles = int(len(dfHSa)/cycleLen + 0.5)
for j in range(cycleLen):
    if j + (numCycles-1) * cycleLen < len(dfHSa):
        d = dfHSa.iloc[range(j,j + (numCycles-1) * cycleLen+1,cycleLen)]['R1']
        print(j,d)
        avg.append(sum(d)/len(d))
    else:
        d = dfHSa.iloc[range(j,j + (numCycles-2) * cycleLen+1,cycleLen)]['R1']
        print(j,d)
        avg.append(sum(d)/len(d))
dfHSa['C'] = np.NaN
for i in range(len(dfHSa)):
    dfHSa.loc[dfHSa.index[i], 'C'] = avg[i % cycleLen]

fig,ax = plt.subplots()
ax.plot(dfHSa['C'],label='Cyclic Pattern')
ax.plot(dfHSa['R1'],label='Remainder After Trend')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
0 2010-03-31   -186.566667
2013-03-31     75.000000
2016-03-31    -59.133333
Freq: 36M, Name: R1, dtype: float64
1 2010-04-30   -145.866667
2013-04-30    293.000000
2016-04-30      9.433333
Freq: 36M, Name: R1, dtype: float64
2 2010-05-31   -132.933333
2013-05-31     73.966667
2016-05-31     63.633333
Freq: 36M, Name: R1, dtype: float64
3 2010-06-30   -138.866667
2013-06-30     71.300000
2016-06-30     65.866667
Freq: 36M, Name: R1, dtype: float64
4 2010-07-31    126.733333
2013-07-31    -17.700000
2016-07-31      2.933333
Freq: 36M, Name: R1, dtype: float64
5 2010-08-31   -111.833333
2013-08-31    -35.366667
2016-08-31     28.900000
Freq: 36M, Name: R1, dtype: float64
6 2010-09-30   -36.866667
2013-09-30   -99.433333
2016-09-30    33.633333
Freq: 36M, Name: R1, dtype: float64
7 2010-10-31   -162.666667
2013-10-31    -96.400000
2016-10-31      7.866667
Freq: 36M, Name: R1, dtype: float64
8 2010-11-30   -216.2
2013-11-30   -215.5
2016-11-30     38.0
Freq: 36M, Name: R1, dtype: float64
9 2010-12-31   -133.800000
2013-12-31   -120.433333
2016-12-31     46.833333
Freq: 36M, Name: R1, dtype: float64
10 2011-01-31     -5.233333
2014-01-31   -219.233333
2017-01-31   -138.300000
Freq: 36M, Name: R1, dtype: float64
11 2011-02-28     81.000000
2014-02-28   -216.666667
2017-02-28   -170.900000
Freq: 36M, Name: R1, dtype: float64
12 2011-03-31    -97.466667
2014-03-31   -109.900000
2017-03-31     16.933333
Freq: 36M, Name: R1, dtype: float64
13 2011-04-30   -119.766667
2014-04-30    -23.633333
2017-04-30     -0.766667
Freq: 36M, Name: R1, dtype: float64
14 2011-05-31   -129.6
2014-05-31     46.8
2017-05-31     95.9
Freq: 36M, Name: R1, dtype: float64
15 2011-06-30    -39.600000
2014-06-30    -38.966667
2017-06-30    126.200000
Freq: 36M, Name: R1, dtype: float64
16 2011-07-31    15.733333
2014-07-31    -7.200000
2017-07-31   -38.866667
Freq: 36M, Name: R1, dtype: float64
17 2011-08-31   -83.366667
2014-08-31   -49.100000
2017-08-31     8.433333
Freq: 36M, Name: R1, dtype: float64
18 2011-09-30    -90.200000
2014-09-30   -110.366667
2017-09-30    -87.433333
Freq: 36M, Name: R1, dtype: float64
19 2011-10-31   -175.633333
2014-10-31    -83.466667
2017-10-31   -107.800000
Freq: 36M, Name: R1, dtype: float64
20 2011-11-30   -198.133333
2014-11-30   -168.500000
2017-11-30   -101.466667
Freq: 36M, Name: R1, dtype: float64
21 2011-12-31     62.300000
2014-12-31     20.533333
2017-12-31   -153.900000
Freq: 36M, Name: R1, dtype: float64
22 2012-01-31     65.600000
2015-01-31   -147.733333
2018-01-31   -180.000000
Freq: 36M, Name: R1, dtype: float64
23 2012-02-29   -174.633333
2015-02-28   -153.766667
2018-02-28    -88.533333
Freq: 36M, Name: R1, dtype: float64
24 2012-03-31    231.166667
2015-03-31    334.000000
2018-03-31     73.333333
Freq: 36M, Name: R1, dtype: float64
25 2012-04-30   -82.566667
2015-04-30    -5.000000
2018-04-30   -73.533333
Freq: 36M, Name: R1, dtype: float64
26 2012-05-31    40.966667
2015-05-31    72.900000
2018-05-31    -4.166667
Freq: 36M, Name: R1, dtype: float64
27 2012-06-30    119.600000
2015-06-30    111.733333
2018-06-30     22.133333
Freq: 36M, Name: R1, dtype: float64
28 2012-07-31   -50.433333
2015-07-31    50.100000
2018-07-31    -2.600000
Freq: 36M, Name: R1, dtype: float64
29 2012-08-31     56.600000
2015-08-31    -87.166667
2018-08-31   -291.500000
Freq: 36M, Name: R1, dtype: float64
30 2012-09-30    77.333333
2015-09-30   -44.566667
Freq: 36M, Name: R1, dtype: float64
31 2012-10-31   -101.033333
2015-10-31   -103.000000
Freq: 36M, Name: R1, dtype: float64
32 2012-11-30    -70.700000
2015-11-30   -129.633333
Freq: 36M, Name: R1, dtype: float64
33 2012-12-31   -25.366667
2015-12-31   -20.800000
Freq: 36M, Name: R1, dtype: float64
34 2013-01-31   -165.200000
2016-01-31   -145.133333
Freq: 36M, Name: R1, dtype: float64
35 2013-02-28   -177.833333
2016-02-29   -183.066667
Freq: 36M, Name: R1, dtype: float64
Out[30]:
<matplotlib.legend.Legend at 0x109cf5c0>
In [65]:
fig,ax = plt.subplots()
for i in range(len(cycles)):
    ax.plot(cycles[i].values,label='Cycle '+str(i),c='k')
ax.plot(avg,label='Average Cycle',c='r')
ax.set_xlabel('Month')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.legend()
Out[65]:
<matplotlib.legend.Legend at 0x128bd780>
In [32]:
dfHSa['R2'] = dfHSa['R1'] - dfHSa['C']
dfHSa['R2Error'] = abs(dfHSa['R2']/dfHSa['homeSales'])
dfHSa['fit'] = dfHSa['MovAvg'] + dfHSa['C']
print(dfHSa)
fig,ax = plt.subplots()
ax.plot(dfHSa['homeSales'],label='Home Sales')
ax.plot(dfHSa['fit'], label = 'Fit')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
print('Average Error: ', sum(dfHSa['R2Error']/len(dfHSa)))
           YearMonth  homeSales      MovAvg         sqErr          R1  \
2010-03-31   2010-03        527  713.566667  34807.121111 -186.566667   
2010-04-30   2010-04        556  701.866667  21277.084444 -145.866667   
2010-05-31   2010-05        565  697.933333  17671.271111 -132.933333   
2010-06-30   2010-06        551  689.866667  19283.951111 -138.866667   
2010-07-31   2010-07        818  691.266667  16061.337778  126.733333   
2010-08-31   2010-08        552  663.833333  12506.694444 -111.833333   
2010-09-30   2010-09        622  658.866667   1359.151111  -36.866667   
2010-10-31   2010-10        482  644.666667  26460.444444 -162.666667   
2010-11-30   2010-11        415  631.200000  46742.440000 -216.200000   
2010-12-31   2010-12        494  627.800000  17902.440000 -133.800000   
2011-01-31   2011-01        617  622.233333     27.387778   -5.233333   
2011-02-28   2011-02        704  623.000000   6561.000000   81.000000   
2011-03-31   2011-03        516  613.466667   9499.751111  -97.466667   
2011-04-30   2011-04        491  610.766667  14344.054444 -119.766667   
2011-05-31   2011-05        466  595.600000  16796.160000 -129.600000   
2011-06-30   2011-06        558  597.600000   1568.160000  -39.600000   
2011-07-31   2011-07        620  604.266667    247.537778   15.733333   
2011-08-31   2011-08        523  606.366667   6950.001111  -83.366667   
2011-09-30   2011-09        522  612.200000   8136.040000  -90.200000   
2011-10-31   2011-10        437  612.633333  30847.067778 -175.633333   
2011-11-30   2011-11        403  601.133333  39256.817778 -198.133333   
2011-12-31   2011-12        659  596.700000   3881.290000   62.300000   
2012-01-31   2012-01        663  597.400000   4303.360000   65.600000   
2012-02-29   2012-02        422  596.633333  30496.801111 -174.633333   
2012-03-31   2012-03        812  580.833333  53438.027778  231.166667   
2012-04-30   2012-04        508  590.566667   6817.254444  -82.566667   
2012-05-31   2012-05        624  583.033333   1678.267778   40.966667   
2012-06-30   2012-06        701  581.400000  14304.160000  119.600000   
2012-07-31   2012-07        529  579.433333   2543.521111  -50.433333   
2012-08-31   2012-08        629  572.400000   3203.560000   56.600000   
...              ...        ...         ...           ...         ...   
2016-03-31   2016-03        452  511.133333   3496.751111  -59.133333   
2016-04-30   2016-04        517  507.566667     88.987778    9.433333   
2016-05-31   2016-05        572  508.366667   4049.201111   63.633333   
2016-06-30   2016-06        577  511.133333   4338.417778   65.866667   
2016-07-31   2016-07        521  518.066667      8.604444    2.933333   
2016-08-31   2016-08        549  520.100000    835.210000   28.900000   
2016-09-30   2016-09        560  526.366667   1131.201111   33.633333   
2016-10-31   2016-10        541  533.133333     61.884444    7.866667   
2016-11-30   2016-11        574  536.000000   1444.000000   38.000000   
2016-12-31   2016-12        584  537.166667   2193.361111   46.833333   
2017-01-31   2017-01        398  536.300000  19126.890000 -138.300000   
2017-02-28   2017-02        361  531.900000  29206.810000 -170.900000   
2017-03-31   2017-03        542  525.066667    286.737778   16.933333   
2017-04-30   2017-04        525  525.766667      0.587778   -0.766667   
2017-05-31   2017-05        624  528.100000   9196.810000   95.900000   
2017-06-30   2017-06        659  532.800000  15926.440000  126.200000   
2017-07-31   2017-07        503  541.866667   1510.617778  -38.866667   
2017-08-31   2017-08        548  539.566667     71.121111    8.433333   
2017-09-30   2017-09        457  544.433333   7644.587778  -87.433333   
2017-10-31   2017-10        439  546.800000  11620.840000 -107.800000   
2017-11-30   2017-11        431  532.466667  10295.484444 -101.466667   
2017-12-31   2017-12        375  528.900000  23685.210000 -153.900000   
2018-01-31   2018-01        341  521.000000  32400.000000 -180.000000   
2018-02-28   2018-02        422  510.533333   7838.151111  -88.533333   
2018-03-31   2018-03        578  504.666667   5377.777778   73.333333   
2018-04-30   2018-04        435  508.533333   5407.151111  -73.533333   
2018-05-31   2018-05        502  506.166667     17.361111   -4.166667   
2018-06-30   2018-06        530  507.866667    489.884444   22.133333   
2018-07-31   2018-07        509  511.600000      6.760000   -2.600000   
2018-08-31   2018-08        220  511.500000  84972.250000 -291.500000   

             R1Error           C          R2   R2Error         fit  
2010-03-31  1.354016  -56.900000 -129.666667  0.246047  656.666667  
2010-04-30  1.262350   52.188889 -198.055556  0.356215  754.055556  
2010-05-31  1.235280    1.555556 -134.488889  0.238033  699.488889  
2010-06-30  1.252027   -0.566667 -138.300000  0.250998  689.300000  
2010-07-31  0.845069   37.322222   89.411111  0.109305  728.588889  
2010-08-31  1.202597  -39.433333  -72.400000  0.131159  624.400000  
2010-09-30  1.059271  -34.222222   -2.644444  0.004252  624.644444  
2010-10-31  1.337483  -83.733333  -78.933333  0.163762  560.933333  
2010-11-30  1.520964 -131.233333  -84.966667  0.204739  499.966667  
2010-12-31  1.270850  -69.133333  -64.666667  0.130904  558.666667  
2011-01-31  1.008482 -120.922222  115.688889  0.187502  501.311111  
2011-02-28  0.884943 -102.188889  183.188889  0.260211  520.811111  
2011-03-31  1.188889  -63.477778  -33.988889  0.065870  549.988889  
2011-04-30  1.243924  -48.055556  -71.711111  0.146051  562.711111  
2011-05-31  1.278112    4.366667 -133.966667  0.287482  599.966667  
2011-06-30  1.070968   15.877778  -55.477778  0.099423  613.477778  
2011-07-31  0.974624  -10.111111   25.844444  0.041685  594.155556  
2011-08-31  1.159401  -41.344444  -42.022222  0.080348  565.022222  
2011-09-30  1.172797  -96.000000    5.800000  0.011111  516.200000  
2011-10-31  1.401907 -122.300000  -53.333333  0.122044  490.333333  
2011-11-30  1.491646 -156.033333  -42.100000  0.104467  445.100000  
2011-12-31  0.905463  -23.688889   85.988889  0.130484  573.011111  
2012-01-31  0.901056  -87.377778  152.977778  0.230736  510.022222  
2012-02-29  1.413823 -138.977778  -35.655556  0.084492  457.655556  
2012-03-31  0.715312  212.833333   18.333333  0.022578  793.666667  
2012-04-30  1.162533  -53.700000  -28.866667  0.056824  536.866667  
2012-05-31  0.934348   36.566667    4.400000  0.007051  619.600000  
2012-06-30  0.829387   84.488889   35.111111  0.050087  665.888889  
2012-07-31  1.095337   -0.977778  -49.455556  0.093489  578.455556  
2012-08-31  0.910016 -107.355556  163.955556  0.260661  465.044444  
...              ...         ...         ...       ...         ...  
2016-03-31  1.130826  -56.900000   -2.233333  0.004941  454.233333  
2016-04-30  0.981754   52.188889  -42.755556  0.082699  559.755556  
2016-05-31  0.888753    1.555556   62.077778  0.108528  509.922222  
2016-06-30  0.885846   -0.566667   66.433333  0.115136  510.566667  
2016-07-31  0.994370   37.322222  -34.388889  0.066006  555.388889  
2016-08-31  0.947359  -39.433333   68.333333  0.124469  480.666667  
2016-09-30  0.939940  -34.222222   67.855556  0.121171  492.144444  
2016-10-31  0.985459  -83.733333   91.600000  0.169316  449.400000  
2016-11-30  0.933798 -131.233333  169.233333  0.294832  404.766667  
2016-12-31  0.919806  -69.133333  115.966667  0.198573  468.033333  
2017-01-31  1.347487 -120.922222  -17.377778  0.043663  415.377778  
2017-02-28  1.473407 -102.188889  -68.711111  0.190335  429.711111  
2017-03-31  0.968758  -63.477778   80.411111  0.148360  461.588889  
2017-04-30  1.001460  -48.055556   47.288889  0.090074  477.711111  
2017-05-31  0.846314    4.366667   91.533333  0.146688  532.466667  
2017-06-30  0.808498   15.877778  110.322222  0.167409  548.677778  
2017-07-31  1.077270  -10.111111  -28.755556  0.057168  531.755556  
2017-08-31  0.984611  -41.344444   49.777778  0.090835  498.222222  
2017-09-30  1.191320  -96.000000    8.566667  0.018745  448.433333  
2017-10-31  1.245558 -122.300000   14.500000  0.033030  424.500000  
2017-11-30  1.235422 -156.033333   54.566667  0.126605  376.433333  
2017-12-31  1.410400  -23.688889 -130.211111  0.347230  505.211111  
2018-01-31  1.527859  -87.377778  -92.622222  0.271619  433.622222  
2018-02-28  1.209795 -138.977778   50.444444  0.119537  371.555556  
2018-03-31  0.873126  212.833333 -139.500000  0.241349  717.500000  
2018-04-30  1.169042  -53.700000  -19.833333  0.045594  454.833333  
2018-05-31  1.008300   36.566667  -40.733333  0.081142  542.733333  
2018-06-30  0.958239   84.488889  -62.355556  0.117652  592.355556  
2018-07-31  1.005108   -0.977778   -1.622222  0.003187  510.622222  
2018-08-31  2.325000 -107.355556 -184.144444  0.837020  404.144444  

[102 rows x 10 columns]
Average Error:  0.11930379007161772
In [33]:
fig,ax = plt.subplots()
ax.plot(dfHSa['R2'], label='Remainder after Trend and Cyclic Components')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()

maxCorr = 0.0
period = np.NaN
for i in range(1,37):
    corr = dfHSa['R2'].autocorr(lag=i)
    print('Correlation, lag ',i,'   ',corr)
    if corr > maxCorr:
        maxCorr = corr
        period = i
print('period = ',period,'     Maximum Correlation = ',maxCorr)
Correlation, lag  1     0.36607601860455935
Correlation, lag  2     0.17717824059909718
Correlation, lag  3     0.05037594155915101
Correlation, lag  4     -0.03648114434725538
Correlation, lag  5     0.2193671767029301
Correlation, lag  6     0.12692528839724143
Correlation, lag  7     0.1508835980953447
Correlation, lag  8     0.0006065685997792922
Correlation, lag  9     -0.24362653546738522
Correlation, lag  10     -0.12024754148718543
Correlation, lag  11     0.05832273952647387
Correlation, lag  12     0.031088893201996426
Correlation, lag  13     -0.0720041886385924
Correlation, lag  14     -0.08212714387882643
Correlation, lag  15     -0.10420014558363878
Correlation, lag  16     -0.11210021206783544
Correlation, lag  17     -0.09169419951422685
Correlation, lag  18     -0.00996821665443028
Correlation, lag  19     -0.0933838301493063
Correlation, lag  20     -0.0913876975316234
Correlation, lag  21     -0.20885811421643155
Correlation, lag  22     -0.23445066360510794
Correlation, lag  23     -0.05212762841959355
Correlation, lag  24     -0.087907918793894
Correlation, lag  25     0.1516521604664605
Correlation, lag  26     0.13149249201971144
Correlation, lag  27     0.07867071661982397
Correlation, lag  28     0.024173786736891214
Correlation, lag  29     -0.2348559522105541
Correlation, lag  30     -0.09048921938682399
Correlation, lag  31     -0.03406999999849264
Correlation, lag  32     -0.051119537507533154
Correlation, lag  33     -0.08335371332606274
Correlation, lag  34     -0.23672100094957127
Correlation, lag  35     -0.1776581380167479
Correlation, lag  36     -0.41768962149245126
period =  1      Maximum Correlation =  0.36607601860455935
In [34]:
fig,ax = plt.subplots()
ax.plot(dfHSa['homeSales'],label='Home Sales')
ax.plot(dfHSa['fit'],label='Fit')
ax.plot(dfHSa['R2'],label='Residual')
ax.set_xlabel('Year')
ax.set_ylabel('Units of Demand')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
fig.legend()
Out[34]:
<matplotlib.legend.Legend at 0x10b3bef0>
In [ ]: