UK ENERGY¶

Business context¶

Energy is one of the main topics on the UN agenda for the coming years, with the aims of ensuring global access and reducing the pollution that energy generation causes. According to the UN, energy accounts for about 60% of greenhouse gas emissions, while 13% of the global population has no access to electricity. For these reasons, countries like the UK are working on public policies to convert their current energy sources to clean alternatives.

To understand the dynamics of residential energy consumption in large cities, in 2014 the UK Government hired UK Power Networks for a project that collected information about energy production and consumption through smart meters installed in a selected group of London households. This information helps characterize current residential energy consumption. For UK Power Networks and the UK Government, knowing the consumption patterns of London households in detail is important for creating strategies that ease the transition to clean energy sources.

Business problem¶

The challenge is to develop a prediction model to forecast the energy demand per household in London.


All the information about energy consumption will be analyzed within the ACORN classification framework.

Exploratory Data Analysis¶

Overview for the ACORN composition across the households in the study¶

ACORN is a consumer segmentation of the UK population. The six categories are as follows:

  1. Affluent Achievers
  2. Rising Prosperity
  3. Comfortable Communities
  4. Financially Stretched
  5. Urban Adversity
  6. Not Private Households

Main descriptive statistics for each ACORN category¶

Summary for the Affluent Achievers category:

Out[23]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00 193,940.00
mean 15.81 6.73 2,012.77 48.00 15.37 0.10 1.10 0.25 0.32 0.23
std 8.80 3.63 0.60 0.00 13.69 0.15 0.74 0.27 0.29 0.17
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 3.00 2,012.00 48.00 7.31 0.04 0.56 0.11 0.15 0.11
50% 16.00 7.00 2,013.00 48.00 11.84 0.07 0.97 0.17 0.25 0.19
75% 23.00 10.00 2,013.00 48.00 18.56 0.12 1.45 0.29 0.39 0.30
max 31.00 12.00 2,014.00 48.00 277.97 5.05 9.14 5.52 5.79 2.56

Summary for the Rising Prosperity category:

Out[24]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00 1,197,464.00
mean 15.81 6.70 2,012.69 48.00 10.89 0.06 0.88 0.17 0.23 0.19
std 8.79 3.51 0.61 0.00 10.70 0.10 0.74 0.20 0.22 0.18
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 4.00 2,012.00 48.00 4.50 0.02 0.34 0.06 0.09 0.07
50% 16.00 7.00 2,013.00 48.00 7.77 0.04 0.72 0.11 0.16 0.14
75% 23.00 10.00 2,013.00 48.00 13.44 0.07 1.21 0.20 0.28 0.25
max 31.00 12.00 2,014.00 48.00 332.56 6.39 10.76 6.91 6.93 3.35

Summary for the Comfortable Communities category:

Out[25]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00 925,449.00
mean 15.81 6.74 2,012.74 48.00 10.04 0.06 0.83 0.16 0.21 0.17
std 8.79 3.58 0.61 0.00 7.85 0.07 0.62 0.15 0.16 0.13
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 4.00 2,012.00 48.00 5.18 0.02 0.38 0.07 0.11 0.08
50% 16.00 7.00 2,013.00 48.00 8.34 0.04 0.71 0.12 0.17 0.14
75% 23.00 10.00 2,013.00 48.00 12.65 0.07 1.11 0.20 0.26 0.22
max 31.00 12.00 2,014.00 48.00 161.18 3.00 9.26 3.44 3.36 2.07

Summary for the Financially Stretched category:

Out[26]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00 462,914.00
mean 15.81 6.75 2,012.76 48.00 9.90 0.06 0.83 0.16 0.21 0.17
std 8.79 3.60 0.60 0.00 6.52 0.06 0.57 0.12 0.14 0.12
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 4.00 2,012.00 48.00 5.54 0.02 0.40 0.08 0.12 0.08
50% 16.00 7.00 2,013.00 48.00 8.56 0.04 0.72 0.13 0.18 0.14
75% 23.00 10.00 2,013.00 48.00 12.47 0.07 1.11 0.20 0.26 0.22
max 31.00 12.00 2,014.00 48.00 90.10 1.43 6.39 2.06 1.88 1.67

Summary for the Urban Adversity category:

Out[27]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00 657,311.00
mean 15.81 6.74 2,012.71 48.00 7.58 0.04 0.69 0.12 0.16 0.14
std 8.80 3.53 0.61 0.00 5.95 0.05 0.60 0.10 0.12 0.13
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 4.00 2,012.00 48.00 3.77 0.01 0.28 0.05 0.08 0.06
50% 16.00 7.00 2,013.00 48.00 6.09 0.03 0.52 0.09 0.13 0.10
75% 23.00 10.00 2,013.00 48.00 9.54 0.05 0.92 0.15 0.20 0.17
max 31.00 12.00 2,014.00 48.00 107.60 1.55 8.28 2.18 2.24 1.95

Summary for the Not Private Households category:

Out[28]:
  day_ month_ year_ energy_count energy_sum energy_min energy_max energy_median energy_mean energy_std
count 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00 29,158.00
mean 15.81 6.73 2,012.75 48.00 11.68 0.06 0.92 0.17 0.24 0.21
std 8.79 3.60 0.60 0.00 13.20 0.10 0.86 0.21 0.28 0.24
min 1.00 1.00 2,011.00 48.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 8.00 3.00 2,012.00 48.00 4.03 0.01 0.30 0.06 0.08 0.06
50% 16.00 7.00 2,013.00 48.00 7.40 0.03 0.75 0.10 0.15 0.15
75% 23.00 10.00 2,013.00 48.00 14.69 0.06 1.29 0.20 0.31 0.27
max 31.00 12.00 2,014.00 48.00 150.36 2.16 8.75 2.44 3.13 2.70
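
Each row summarized in the tables above appears to be one household-day built from 48 half-hourly smart meter readings (energy_count is always 48). The following is a minimal sketch of how such a daily record could be derived, using only the Python standard library. The function name `summarize_day`, the sample readings, and the assumption that readings are in kWh per half hour are illustrative and not taken from the original notebook.

```python
import statistics


def summarize_day(readings):
    """Collapse one household-day of half-hourly readings into summary statistics."""
    return {
        "energy_count": len(readings),
        "energy_sum": sum(readings),
        "energy_min": min(readings),
        "energy_max": max(readings),
        "energy_median": statistics.median(readings),
        "energy_mean": statistics.fmean(readings),
        # Sample standard deviation (n - 1 in the denominator); an assumption
        # about which estimator the original aggregation used.
        "energy_std": statistics.stdev(readings),
    }


# Made-up day: 36 low readings and 12 higher readings during an evening peak.
day = [0.1] * 36 + [0.4] * 12
summary = summarize_day(day)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Running a summary like `describe()` over many such records per category would then give tables of the shape shown above.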

Energy consumption data exploration over time¶

Next we can see the average daily consumption:

Out[29]:
date_ day_ month_ year_ acorn_category acorn_group acorn_group_detail season q_households day_avg_consumption hour_avg_consumption
14454 2012-08-10 10 8 2012 Comfortable Communities I Comfortable Seniors summer 30 7.314300 0.304762
14455 2012-12-31 31 12 2012 Urban Adversity O Young Hardship winter 103 9.863252 0.410969
14456 2013-12-19 19 12 2013 Financially Stretched M Striving Families autumn 102 10.629510 0.442896
14457 2012-02-03 3 2 2012 Comfortable Communities J Starting Out winter 12 21.361417 0.890059
14458 2012-01-12 12 1 2012 Not Private Households R Not Private Households winter 5 11.593800 0.483075

General consumption clearly rises toward the end and the beginning of the year, exceeding 12 kWh per household and reaching maximums near 18 kWh. It falls during the middle months, with consumption ranging from 8 kWh to 11 kWh.
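
The season column in the table above is consistent with astronomical seasons: 2013-12-19 is labeled autumn while 2012-12-31 is labeled winter. A minimal sketch of such a labeling follows. The exact boundary dates and the name `season_of` are assumptions for illustration; the original notebook may have used different cut-offs.

```python
from datetime import date

# Assumed (month, day) on which each season starts. These are approximate
# astronomical dates and are not taken from the source.
SEASON_STARTS = [
    ((3, 20), "spring"),
    ((6, 21), "summer"),
    ((9, 22), "autumn"),
    ((12, 21), "winter"),
]


def season_of(d):
    """Return the season label for a date under the assumed boundaries."""
    label = "winter"  # dates before the spring start still belong to winter
    for start, name in SEASON_STARTS:
        if (d.month, d.day) >= start:
            label = name
    return label


# Dates taken from the sample rows shown above.
for d in [date(2012, 8, 10), date(2012, 12, 31), date(2013, 12, 19), date(2012, 2, 3)]:
    print(d.isoformat(), season_of(d))
```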

To see the differences between ACORN groups, next we will look at violin plots for the groups of each category. The goal is to see the measures of central tendency as well as the distribution per group:

In general, two concentrations of consumption are visible within each group. There is a clear pattern around the 25th percentile and another around the 75th percentile, which points to a seasonality that should be analyzed in depth against the weather, as suggested by the first plot of general consumption over the whole analysis period.

Let's use a line plot to explore each group's consumption over time:

Confirming what we saw for general consumption, all the groups follow the same pattern over time, which means there is an underlying variable (most likely the four seasons) behind this behavior. It is also interesting that some groups appear to have a different mean over time, so it is a good idea to perform a t-test on the means later.

Next we will see the consumption per group again, but this time segmented by season of the year:

The trend for each season is clear. In the previous charts we can again see some differences in consumption between groups, so we will conduct t-tests on some of that information:

T-tests¶

We will start by contrasting the means of the Affluent Achievers groups as follows:

(1) Contrast for the mean of Lavish Lifestyles against Executive Wealth
(2) Contrast for the mean of Lavish Lifestyles against Mature Money
(3) Contrast for the mean of Executive Wealth against Mature Money

/opt/conda/envs/Python-3.9/lib/python3.9/site-packages/pingouin/bayesian.py:152: RuntimeWarning:

divide by zero encountered in double_scalars

Out[34]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test 51.575088 1594.077318 two-sided 0.0 [7.77, 8.38] 2.577172 inf 1.0
Out[35]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test 47.960421 1610 two-sided 1.541591e-312 [6.26, 6.8] 2.389079 1.188e+308 1.0
Out[36]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -10.89023 1466.472968 two-sided 1.298696e-26 [-1.82, -1.26] 0.544904 2.565e+23 1.0

All the p-values for the previous contrasts are very close to zero, so we can conclude that the group means differ from one another even though the groups belong to the same category.
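
The outputs above look like those of pingouin's t-test (the warning points to pingouin's `bayesian.py`). As a rough illustration of where the T, dof and cohen-d columns come from, here is a minimal standard-library sketch of Welch's t-test. The function name `welch_ttest` and the sample values are hypothetical, and the pooled-variance form of Cohen's d is an assumption about how the effect size was computed.

```python
import math
import statistics


def welch_ttest(x, y):
    """Return (t, dof, cohen_d) for two independent samples using Welch's test."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.fmean(x), statistics.fmean(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances

    # Squared standard error of the difference in means.
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)

    # Welch-Satterthwaite approximation; usually not an integer.
    dof = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    # Cohen's d with a pooled standard deviation, reported as a magnitude.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    cohen_d = abs(m1 - m2) / math.sqrt(pooled_var)
    return t, dof, cohen_d


# Hypothetical daily averages (kWh) for two groups of unequal size.
group_a = [10.0, 12.0, 14.0, 16.0]
group_b = [8.0, 9.0, 10.0]
t, dof, d = welch_ttest(group_a, group_b)
print(f"T = {t:.3f}, dof = {dof:.3f}, cohen-d = {d:.3f}")
```

The fractional dof values in the outputs are consistent with this Welch correction, while the integer value of 1610 is consistent with a standard Student's t-test on two equally sized samples (n1 + n2 - 2). This reading is an inference from the tables, not something the notebook states.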

Next we will contrast the groups of Rising Prosperity, the category with the second-largest differences according to the previous charts:

(1) Contrast for the mean of Career Climbers against City Sophisticates

Out[37]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -24.637607 1610 two-sided 5.378204e-114 [-3.3, -2.81] 1.227287 1.163e+110 1.0

Once again, there is a significant difference between the groups.

Now it is time for the Financially Stretched category:
(1) Contrast the mean of Poorer Pensioners against Student Life
(2) Contrast the mean of Poorer Pensioners against Modest Means
(3) Contrast the mean of Poorer Pensioners against Striving Families

Out[38]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -3.529813 1610 two-sided 0.000428 [-0.58, -0.17] 0.175833 26.246 0.941529
Out[39]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -5.544594 1610 two-sided 3.437766e-08 [-0.92, -0.44] 0.276196 2.014e+05 0.999829
Out[40]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -2.166112 1610 two-sided 0.030449 [-0.49, -0.02] 0.107902 0.57 0.581176

Poorer Pensioners differs significantly from Student Life and from Modest Means. The contrast against Striving Families is much weaker: its p-value (0.030) is still below 0.05 but far higher than in the other contrasts, the effect size is small (Cohen's d of about 0.11), and the Bayes factor (BF10 = 0.57) gives no support for a difference.
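
Because three contrasts are run against the same group, one way to check how robust the weakest result is would be a Bonferroni correction. This check is not part of the original analysis; it is a small sketch that uses the p-values reported above, assuming the three outputs correspond to contrasts (1) to (3) in order.

```python
# p-values copied from the three t-test outputs above.
p_values = {
    "Poorer Pensioners vs Student Life": 0.000428,
    "Poorer Pensioners vs Modest Means": 3.437766e-08,
    "Poorer Pensioners vs Striving Families": 0.030449,
}

alpha = 0.05
# Bonferroni: divide the significance level by the number of tests.
threshold = alpha / len(p_values)

significant = {name: p < threshold for name, p in p_values.items()}

print(f"Corrected threshold: {threshold:.4f}")
for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'}")
```

Under this stricter threshold the Striving Families contrast no longer counts as significant, which agrees with the reading that this group is the closest to Poorer Pensioners.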

For the final category, the most important contrast is between the seasons of the year within the category:

Out[42]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test 20.874731 440.599838 two-sided 8.730930e-68 [3.71, 4.49] 1.79678 6.808e+69 1.0
Out[43]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -3.525825 370 two-sided 0.000475 [-1.15, -0.33] 0.365611 42.272 0.940229
Out[44]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -6.211734 431.045888 two-sided 1.235396e-09 [-1.83, -0.95] 0.583426 7.065e+06 1.0

As we saw in the general consumption chart and later in all of the plots per category and group, consumption differs significantly across the year. It is therefore essential to analyze the weather in depth, along with its impact on energy consumption patterns.

Weather data analysis¶

Weather hourly¶

We will begin by looking at the hourly behavior:

Range of pressure values: [975.74, 1043.32], a difference of about 68.

Here we can see the temperature behavior across the year, grouped by season. Next, we will analyze possible correlations between weather variables.

Next we will see whether visibility changes over time. This is important for understanding whether there are periods in which people need more light.

Weather daily¶

Now we will look at how the weather changes at a daily resolution:

Out[50]:
temperature_max wind_bearing cloud_cover wind_speed pressure visibility uv_index temperature_min moon_phase
count 870.00 870.00 870.00 870.00 870.00 870.00 870.00 870.00 870.00
mean 13.68 196.39 0.48 3.58 1014.17 11.17 2.54 7.44 0.50
std 6.21 89.28 0.19 1.70 11.13 2.47 1.84 4.90 0.29
min -0.06 0.00 0.00 0.20 979.25 1.48 0.00 -5.64 0.00
25% 9.46 123.00 0.35 2.37 1007.44 10.36 1.00 3.72 0.25
50% 12.70 219.00 0.47 3.44 1014.65 11.97 2.00 7.10 0.49
75% 17.92 255.75 0.60 4.58 1021.81 12.83 4.00 11.37 0.75
max 32.40 359.00 1.00 9.96 1040.92 15.34 7.00 20.54 0.99

Next we will see some graphs to analyze the weather by season:

Here we can see that most days are partly cloudy.

The most important correlation we can calculate is the one between energy consumption and weather, so let's join those data sets and compute a new correlation matrix:

Out[54]:
date_ day_avg_consumption temperature_min cloud_cover uv_index
0 2011-12-15 12.473702 4.08 0.42 1.0
1 2011-12-16 12.772750 1.80 0.70 1.0
2 2011-12-17 13.720379 0.24 0.37 1.0
3 2011-12-18 14.443739 -0.56 0.22 1.0
4 2011-12-19 12.883330 -0.84 0.47 1.0
... ... ... ... ... ...
801 2014-02-23 11.868262 8.67 0.66 1.0
802 2014-02-24 10.691051 7.99 0.50 1.0
803 2014-02-25 10.665313 6.79 0.62 1.0
804 2014-02-26 10.585619 4.17 0.26 2.0
805 2014-02-27 10.663729 3.93 0.32 2.0

806 rows × 5 columns

[Figure: Correlation matrix, energy consumption vs meteorological variables]

Here we can see the most important finding of the weather analysis: a confirmation of a linear relationship between temperature and energy consumption. This is the visual representation of the negative correlation found in the previous chart.
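
To make the correlation concrete, the sketch below computes a Pearson coefficient between day_avg_consumption and temperature_min using only the ten rows printed in the joined table above (the first and last five of 806). It is an illustration of the calculation, not a reproduction of the notebook's result: the helper name `pearson` is hypothetical, and ten rows from two winters are not representative of the full series.

```python
import math

# (date_, day_avg_consumption, temperature_min) copied from the rows shown above.
rows = [
    ("2011-12-15", 12.473702, 4.08),
    ("2011-12-16", 12.772750, 1.80),
    ("2011-12-17", 13.720379, 0.24),
    ("2011-12-18", 14.443739, -0.56),
    ("2011-12-19", 12.883330, -0.84),
    ("2014-02-23", 11.868262, 8.67),
    ("2014-02-24", 10.691051, 7.99),
    ("2014-02-25", 10.665313, 6.79),
    ("2014-02-26", 10.585619, 4.17),
    ("2014-02-27", 10.663729, 3.93),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


consumption = [row[1] for row in rows]
temperature_min = [row[2] for row in rows]
r = pearson(consumption, temperature_min)
print(f"Pearson r on the displayed rows: {r:.2f}")
```

Even on this small sample the coefficient is negative: colder days go with higher consumption.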

EDA for hourly consumption¶

Comparing the average consumption for each hour of the day for the coldest vs the hottest day:

We can see that weekday consumption is slightly lower than weekend and holiday consumption from 7am to 4pm. In contrast, the behavior during the night hours is almost the same. To test this idea, the best approach is to perform a t-test:

The first comparison is between weekdays and weekends:

Out[62]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -26.09 174100.167 two-sided 0.0 [-0.02, -0.02] 0.101 1.92e+145 1.0

As we can see, the p-value is reported as 0, so there is a significant difference between weekdays and weekends, as we saw before in the chart, although the effect size is small (Cohen's d of about 0.10). Next we will contrast weekdays against holidays:

Out[63]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test -7.906 8223.725 two-sided 0.0 [-0.03, -0.02] 0.097 4.75e+11 1.0

Once again there is a significant difference, as expected. Now we will see the last contrast, weekends against holidays:

Out[64]:
T dof alternative p-val CI95% cohen-d BF10 power
T-test 0.48 9032.22 two-sided 0.631 [-0.0, 0.01] 0.006 0.015 0.077

As expected, weekends and holidays show almost the same behavior (p-value of 0.631).
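
The day-type grouping behind these three contrasts can be sketched as follows. The holiday set, the function name `day_type`, and the rule that a holiday takes precedence over a weekend are assumptions for illustration. The original notebook presumably used a full UK bank holiday calendar.

```python
from datetime import date

# Illustrative subset of UK bank holidays; not the list used in the source.
HOLIDAYS = {date(2012, 12, 25), date(2012, 12, 26), date(2013, 1, 1)}


def day_type(d, holidays=HOLIDAYS):
    """Classify a date as 'holiday', 'weekend' or 'weekday'."""
    if d in holidays:
        return "holiday"
    if d.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "weekday"


for d in [date(2012, 12, 19), date(2012, 12, 22), date(2012, 12, 25)]:
    print(d.isoformat(), day_type(d))
```

Hourly readings labeled this way can then be split into the three samples compared in the t-tests above.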