Energy is one of the main topics on the UN agenda for the coming years, with the aim of ensuring global access and reducing the associated pollution. According to the UN, energy currently accounts for 60% of greenhouse gas emissions, while 13% of the global population has no access to electricity. For these reasons, countries such as the UK are developing public policies to convert their current energy sources to clean alternatives.
To understand the dynamics of residential energy consumption in large cities, in 2014 the UK Government hired UK Power Networks for a project that collected information about energy production and consumption through smart meters installed in a selected group of London households. This information helps characterize current residential energy consumption. For UK Power Networks and the UK Government, knowing the consumption patterns of London households in detail is important for designing strategies that ease the transition to clean energy sources.
The challenge is to develop a prediction model to forecast the energy demand per household in London.
All the energy consumption data will be analyzed within the ACORN classification framework.
ACORN is a consumer segmentation of the UK population. Its 6 categories are summarized below:
Summary for the Affluent Achievers category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 | 193,940.00 |
mean | 15.81 | 6.73 | 2,012.77 | 48.00 | 15.37 | 0.10 | 1.10 | 0.25 | 0.32 | 0.23 |
std | 8.80 | 3.63 | 0.60 | 0.00 | 13.69 | 0.15 | 0.74 | 0.27 | 0.29 | 0.17 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 3.00 | 2,012.00 | 48.00 | 7.31 | 0.04 | 0.56 | 0.11 | 0.15 | 0.11 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 11.84 | 0.07 | 0.97 | 0.17 | 0.25 | 0.19 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 18.56 | 0.12 | 1.45 | 0.29 | 0.39 | 0.30 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 277.97 | 5.05 | 9.14 | 5.52 | 5.79 | 2.56 |
Summary for the Rising Prosperity category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 | 1,197,464.00 |
mean | 15.81 | 6.70 | 2,012.69 | 48.00 | 10.89 | 0.06 | 0.88 | 0.17 | 0.23 | 0.19 |
std | 8.79 | 3.51 | 0.61 | 0.00 | 10.70 | 0.10 | 0.74 | 0.20 | 0.22 | 0.18 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 4.00 | 2,012.00 | 48.00 | 4.50 | 0.02 | 0.34 | 0.06 | 0.09 | 0.07 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 7.77 | 0.04 | 0.72 | 0.11 | 0.16 | 0.14 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 13.44 | 0.07 | 1.21 | 0.20 | 0.28 | 0.25 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 332.56 | 6.39 | 10.76 | 6.91 | 6.93 | 3.35 |
Summary for the Comfortable Communities category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 | 925,449.00 |
mean | 15.81 | 6.74 | 2,012.74 | 48.00 | 10.04 | 0.06 | 0.83 | 0.16 | 0.21 | 0.17 |
std | 8.79 | 3.58 | 0.61 | 0.00 | 7.85 | 0.07 | 0.62 | 0.15 | 0.16 | 0.13 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 4.00 | 2,012.00 | 48.00 | 5.18 | 0.02 | 0.38 | 0.07 | 0.11 | 0.08 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 8.34 | 0.04 | 0.71 | 0.12 | 0.17 | 0.14 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 12.65 | 0.07 | 1.11 | 0.20 | 0.26 | 0.22 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 161.18 | 3.00 | 9.26 | 3.44 | 3.36 | 2.07 |
Summary for the Financially Stretched category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 | 462,914.00 |
mean | 15.81 | 6.75 | 2,012.76 | 48.00 | 9.90 | 0.06 | 0.83 | 0.16 | 0.21 | 0.17 |
std | 8.79 | 3.60 | 0.60 | 0.00 | 6.52 | 0.06 | 0.57 | 0.12 | 0.14 | 0.12 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 4.00 | 2,012.00 | 48.00 | 5.54 | 0.02 | 0.40 | 0.08 | 0.12 | 0.08 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 8.56 | 0.04 | 0.72 | 0.13 | 0.18 | 0.14 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 12.47 | 0.07 | 1.11 | 0.20 | 0.26 | 0.22 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 90.10 | 1.43 | 6.39 | 2.06 | 1.88 | 1.67 |
Summary for the Urban Adversity category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 | 657,311.00 |
mean | 15.81 | 6.74 | 2,012.71 | 48.00 | 7.58 | 0.04 | 0.69 | 0.12 | 0.16 | 0.14 |
std | 8.80 | 3.53 | 0.61 | 0.00 | 5.95 | 0.05 | 0.60 | 0.10 | 0.12 | 0.13 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 4.00 | 2,012.00 | 48.00 | 3.77 | 0.01 | 0.28 | 0.05 | 0.08 | 0.06 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 6.09 | 0.03 | 0.52 | 0.09 | 0.13 | 0.10 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 9.54 | 0.05 | 0.92 | 0.15 | 0.20 | 0.17 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 107.60 | 1.55 | 8.28 | 2.18 | 2.24 | 1.95 |
Summary for the Not Private Households category:
| | day_ | month_ | year_ | energy_count | energy_sum | energy_min | energy_max | energy_median | energy_mean | energy_std |
---|---|---|---|---|---|---|---|---|---|---|
count | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 | 29,158.00 |
mean | 15.81 | 6.73 | 2,012.75 | 48.00 | 11.68 | 0.06 | 0.92 | 0.17 | 0.24 | 0.21 |
std | 8.79 | 3.60 | 0.60 | 0.00 | 13.20 | 0.10 | 0.86 | 0.21 | 0.28 | 0.24 |
min | 1.00 | 1.00 | 2,011.00 | 48.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
25% | 8.00 | 3.00 | 2,012.00 | 48.00 | 4.03 | 0.01 | 0.30 | 0.06 | 0.08 | 0.06 |
50% | 16.00 | 7.00 | 2,013.00 | 48.00 | 7.40 | 0.03 | 0.75 | 0.10 | 0.15 | 0.15 |
75% | 23.00 | 10.00 | 2,013.00 | 48.00 | 14.69 | 0.06 | 1.29 | 0.20 | 0.31 | 0.27 |
max | 31.00 | 12.00 | 2,014.00 | 48.00 | 150.36 | 2.16 | 8.75 | 2.44 | 3.13 | 2.70 |
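The summaries above have the shape of pandas `DataFrame.describe()` output computed once per ACORN category. A minimal sketch of that step, assuming a daily table with an `acorn_category` column; the rows below are toy values for illustration, not project data.

```python
import pandas as pd

# Toy stand-in for the daily smart-meter table (illustrative values only).
daily = pd.DataFrame(
    {
        "acorn_category": [
            "Affluent Achievers",
            "Affluent Achievers",
            "Urban Adversity",
            "Urban Adversity",
        ],
        "energy_sum": [15.0, 17.0, 7.0, 9.0],
        "energy_max": [1.2, 1.0, 0.6, 0.8],
    }
)

# Build one describe() table per category, keyed by category name.
summaries = {
    name: group.drop(columns="acorn_category").describe()
    for name, group in daily.groupby("acorn_category")
}

for name, table in summaries.items():
    print(f"Summary for the {name} category:")
    print(table.round(2))
```

Each resulting table has the rows `count`, `mean`, `std`, `min`, `25%`, `50%`, `75%` and `max`, one column per numeric variable.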
Next we can see the average daily consumption:
| | date_ | day_ | month_ | year_ | acorn_category | acorn_group | acorn_group_detail | season | q_households | day_avg_consumption | hour_avg_consumption |
---|---|---|---|---|---|---|---|---|---|---|---|
14454 | 2012-08-10 | 10 | 8 | 2012 | Comfortable Communities | I | Comfortable Seniors | summer | 30 | 7.314300 | 0.304762 |
14455 | 2012-12-31 | 31 | 12 | 2012 | Urban Adversity | O | Young Hardship | winter | 103 | 9.863252 | 0.410969 |
14456 | 2013-12-19 | 19 | 12 | 2013 | Financially Stretched | M | Striving Families | autumn | 102 | 10.629510 | 0.442896 |
14457 | 2012-02-03 | 3 | 2 | 2012 | Comfortable Communities | J | Starting Out | winter | 12 | 21.361417 | 0.890059 |
14458 | 2012-01-12 | 12 | 1 | 2012 | Not Private Households | R | Not Private Households | winter | 5 | 11.593800 | 0.483075 |
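A table like this can be built by averaging the daily totals of all households in a group. The sketch below assumes a per-household daily table with a hypothetical `household_id` column and toy values. Dividing the daily average by 24 is an inference from the rows shown above (for example, 7.314300 / 24 ≈ 0.304762), not a documented step.

```python
import pandas as pd

# Toy per-household daily totals; household_id is a hypothetical column name.
readings = pd.DataFrame(
    {
        "date_": ["2012-08-10"] * 3 + ["2012-08-11"] * 2,
        "household_id": ["H1", "H2", "H3", "H1", "H2"],
        "acorn_group": ["I"] * 5,
        "energy_sum": [6.0, 8.0, 10.0, 7.0, 9.0],
    }
)

# Aggregate per day and ACORN group: number of households and mean daily total.
per_day = (
    readings.groupby(["date_", "acorn_group"])
    .agg(
        q_households=("household_id", "nunique"),
        day_avg_consumption=("energy_sum", "mean"),
    )
    .reset_index()
)

# Hourly average derived from the daily average (assumed to be day / 24).
per_day["hour_avg_consumption"] = per_day["day_avg_consumption"] / 24

print(per_day)
```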
Overall consumption clearly rises toward the end and the beginning of each year, exceeding 12 kWh per household and peaking near 18 kWh. It falls during the middle months, when consumption ranges from 8 kWh to 11 kWh.
To compare the ACORN groups, next we will look at violin plots for the groups of each category. The goal is to see the measures of central tendency as well as the distribution per group:
In general, each group shows 2 concentrations of consumption. There is a clear pattern at the 25th percentile as well as the 75th percentile, pointing to a seasonality that should be analyzed in depth against the weather, as the first plot of general consumption over the whole analysis period suggested.
Let's use a line plot to explore how the consumption of each group behaves over time:
Confirming the general consumption pattern, all the groups follow the same trend over time, which means there is an underlying variable (most likely the four seasons) driving this behavior. It is also interesting that some groups appear to have a different mean over time, so it is a good idea to run a t-test on the means later.
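The `season` label seen in the daily consumption table could be derived from the date alone. A minimal sketch, assuming fixed approximate astronomical boundaries (20 March, 21 June, 22 September, 21 December). The notebook's actual rule is not shown, so these cut-offs are an assumption that happens to agree with the five sample rows listed earlier.

```python
from datetime import date


def season_of(d: date) -> str:
    """Map a date to a season using fixed, approximate astronomical boundaries."""
    month_day = (d.month, d.day)
    if (3, 20) <= month_day < (6, 21):
        return "spring"
    if (6, 21) <= month_day < (9, 22):
        return "summer"
    if (9, 22) <= month_day < (12, 21):
        return "autumn"
    # Everything from 21 December through 19 March.
    return "winter"


# Dates taken from the sample rows of the daily consumption table.
for d in [date(2012, 8, 10), date(2012, 12, 31), date(2013, 12, 19)]:
    print(d.isoformat(), season_of(d))
```

With these cut-offs, 19 December still counts as autumn, which is consistent with the 2013-12-19 row being labeled `autumn`.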
Next we will see the consumption per group again, but this time segmented by season of the year:
The trend for each season is clear. The previous charts again show some differences in consumption between groups, so we will run t-test contrasts on some of them:
We will start by contrasting the means of the Affluent Achievers groups as follows:
(1) Contrast for the mean of Lavish Lifestyles against Executive Wealth
(2) Contrast for the mean of Lavish Lifestyles against Mature Money
(3) Contrast for the mean of Executive Wealth against Mature Money
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | 51.575088 | 1594.077318 | two-sided | 0.0 | [7.77, 8.38] | 2.577172 | inf | 1.0 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | 47.960421 | 1610 | two-sided | 1.541591e-312 | [6.26, 6.8] | 2.389079 | 1.188e+308 | 1.0 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -10.89023 | 1466.472968 | two-sided | 1.298696e-26 | [-1.82, -1.26] | 0.544904 | 2.565e+23 | 1.0 |
All the p-values for the previous contrasts are very close to zero, so we can conclude that the group means differ even though the groups belong to the same category.
Next we will contrast the groups of Rising Prosperity, the category with the second-largest differences according to the previous charts:
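The column layout of these tables (T, dof, cohen-d, BF10, power) matches what the pingouin library's `ttest` function reports. To make the T and dof columns concrete, here is a standard-library sketch of Welch's t-statistic, the Welch-Satterthwaite degrees of freedom (which explains fractional values such as 1594.08) and a pooled-SD Cohen's d. The two samples are toy values, and the effect-size formula is a common textbook definition that may differ in detail from the one the library uses.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, dof, cohen_d) for two independent samples.

    t and dof follow Welch's unequal-variance test; cohen_d uses the
    pooled standard deviation and is reported as an absolute value.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    cohen_d = abs(mean(a) - mean(b)) / pooled_sd
    return t, dof, cohen_d


# Toy daily averages (kWh) for two hypothetical groups.
group_a = [20.0, 22.0, 19.0, 24.0, 21.0]
group_b = [12.0, 14.0, 13.0, 15.0]

t_stat, dof, d = welch_t(group_a, group_b)
print(f"T = {t_stat:.3f}, dof = {dof:.3f}, cohen-d = {d:.3f}")
```

A whole-number dof such as 1610 is what a pooled-variance test yields (n1 + n2 - 2), which would be consistent with two samples of 806 days each; that reading is an inference from the tables rather than something the notebook states.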
(1) Contrast for the mean of Career Climbers against City Sophisticates
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -24.637607 | 1610 | two-sided | 5.378204e-114 | [-3.3, -2.81] | 1.227287 | 1.163e+110 | 1.0 |
Once again, the two groups differ significantly.
Now it is time for the Financially Stretched category:
(1) Contrast the mean of Poorer Pensioners against Student Life
(2) Contrast the mean of Poorer Pensioners against Modest Means
(3) Contrast the mean of Poorer Pensioners against Striving Families
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -3.529813 | 1610 | two-sided | 0.000428 | [-0.58, -0.17] | 0.175833 | 26.246 | 0.941529 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -5.544594 | 1610 | two-sided | 3.437766e-08 | [-0.92, -0.44] | 0.276196 | 2.014e+05 | 0.999829 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -2.166112 | 1610 | two-sided | 0.030449 | [-0.49, -0.02] | 0.107902 | 0.57 | 0.581176 |
Poorer Pensioners differs significantly from both Student Life and Modest Means. The contrast against Striving Families is much weaker: its p-value (0.030) is far higher than in the other contrasts, the Bayes factor is below 1 and the power is only about 0.58, so the evidence for a difference there is limited.
For the final category, the most important contrast is between the seasons of the year within the category:
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | 20.874731 | 440.599838 | two-sided | 8.730930e-68 | [3.71, 4.49] | 1.79678 | 6.808e+69 | 1.0 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -3.525825 | 370 | two-sided | 0.000475 | [-1.15, -0.33] | 0.365611 | 42.272 | 0.940229 |
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -6.211734 | 431.045888 | two-sided | 1.235396e-09 | [-1.83, -0.95] | 0.583426 | 7.065e+06 | 1.0 |
As the general consumption chart and the plots per category and group showed, consumption differs significantly across the year, so it is essential to analyze the weather in depth, along with its impact on energy consumption patterns.
We will begin by looking at the hourly behavior:
Range of pressure values: [975.74, 1043.32], a difference of about 68.
Here we can see the temperature behavior across the year, grouped by season. Next, we will analyze possible correlations between weather variables.
Next we will see whether visibility changes over time; this matters for understanding whether there are periods when people need more light.
Now we will see how the weather changes on a daily basis:
| | temperature_max | wind_bearing | cloud_cover | wind_speed | pressure | visibility | uv_index | temperature_min | moon_phase |
---|---|---|---|---|---|---|---|---|---|
count | 870.00 | 870.00 | 870.00 | 870.00 | 870.00 | 870.00 | 870.00 | 870.00 | 870.00 |
mean | 13.68 | 196.39 | 0.48 | 3.58 | 1014.17 | 11.17 | 2.54 | 7.44 | 0.50 |
std | 6.21 | 89.28 | 0.19 | 1.70 | 11.13 | 2.47 | 1.84 | 4.90 | 0.29 |
min | -0.06 | 0.00 | 0.00 | 0.20 | 979.25 | 1.48 | 0.00 | -5.64 | 0.00 |
25% | 9.46 | 123.00 | 0.35 | 2.37 | 1007.44 | 10.36 | 1.00 | 3.72 | 0.25 |
50% | 12.70 | 219.00 | 0.47 | 3.44 | 1014.65 | 11.97 | 2.00 | 7.10 | 0.49 |
75% | 17.92 | 255.75 | 0.60 | 4.58 | 1021.81 | 12.83 | 4.00 | 11.37 | 0.75 |
max | 32.40 | 359.00 | 1.00 | 9.96 | 1040.92 | 15.34 | 7.00 | 20.54 | 0.99 |
Next we will see some graphs to analyze the weather by season:
Here we can see that most days are partly cloudy.
The most important correlation to compute is between energy consumption and weather, so let's join those data sets and compute a new correlation matrix:
| | date_ | day_avg_consumption | temperature_min | cloud_cover | uv_index |
---|---|---|---|---|---|
0 | 2011-12-15 | 12.473702 | 4.08 | 0.42 | 1.0 |
1 | 2011-12-16 | 12.772750 | 1.80 | 0.70 | 1.0 |
2 | 2011-12-17 | 13.720379 | 0.24 | 0.37 | 1.0 |
3 | 2011-12-18 | 14.443739 | -0.56 | 0.22 | 1.0 |
4 | 2011-12-19 | 12.883330 | -0.84 | 0.47 | 1.0 |
... | ... | ... | ... | ... | ... |
801 | 2014-02-23 | 11.868262 | 8.67 | 0.66 | 1.0 |
802 | 2014-02-24 | 10.691051 | 7.99 | 0.50 | 1.0 |
803 | 2014-02-25 | 10.665313 | 6.79 | 0.62 | 1.0 |
804 | 2014-02-26 | 10.585619 | 4.17 | 0.26 | 2.0 |
805 | 2014-02-27 | 10.663729 | 3.93 | 0.32 | 2.0 |
806 rows × 5 columns
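The join and the correlation matrix can be sketched with pandas `merge` and `DataFrame.corr()`. The example reuses the last five rows of the table above, split into two hypothetical frames (`energy` and `weather`) to imitate the two source data sets. An inner join on `date_` and the default Pearson method are assumptions about how the notebook combined them.

```python
import pandas as pd

# Last five rows of the joined table above, split back into two frames.
energy = pd.DataFrame(
    {
        "date_": ["2014-02-23", "2014-02-24", "2014-02-25", "2014-02-26", "2014-02-27"],
        "day_avg_consumption": [11.868262, 10.691051, 10.665313, 10.585619, 10.663729],
    }
)
weather = pd.DataFrame(
    {
        "date_": ["2014-02-23", "2014-02-24", "2014-02-25", "2014-02-26", "2014-02-27"],
        "temperature_min": [8.67, 7.99, 6.79, 4.17, 3.93],
        "cloud_cover": [0.66, 0.50, 0.62, 0.26, 0.32],
        "uv_index": [1.0, 1.0, 1.0, 2.0, 2.0],
    }
)

# Keep only the days present in both data sets.
merged = energy.merge(weather, on="date_", how="inner")

# Pairwise Pearson correlations between the numeric columns.
corr = merged.drop(columns="date_").corr()
print(corr.round(2))
```

Five days are far too few to draw conclusions from; the full 806-row join is what the correlation matrix in the notebook is based on.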
Figure: Correlation matrix of energy consumption vs. meteorological variables
Here we can see the most important finding of the weather analysis: confirmation of a linear relationship between temperature and energy consumption. This chart is the visual representation of the negative correlation found in the previous chart.
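The strength and direction of that relationship can be quantified with a Pearson coefficient and a least-squares line. A minimal sketch using only the ten rows visible in the joined table (the first five and last five days), so the numbers it prints describe that small excerpt rather than the full data set; the helper name `pearson_and_fit` is illustrative.

```python
import math

# The ten rows shown in the joined table (minimum temperature, daily average consumption).
temperature_min = [4.08, 1.80, 0.24, -0.56, -0.84, 8.67, 7.99, 6.79, 4.17, 3.93]
day_avg_consumption = [
    12.473702, 12.772750, 13.720379, 14.443739, 12.883330,
    11.868262, 10.691051, 10.665313, 10.585619, 10.663729,
]


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_fit(temperature_min, day_avg_consumption)
print(f"r = {r:.2f}, slope = {slope:.2f} per degree, intercept = {intercept:.2f}")
```

A negative slope means that each additional degree of minimum temperature is associated with lower average daily consumption.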
Comparing the average consumption for each hour of the day for the coldest vs the hottest day:
We can see that weekday consumption is slightly lower than weekend and holiday consumption from 7am to 4pm; in contrast, during the night hours the behavior is almost the same. To test this idea, the best approach is to perform a t-test:
The first comparison is between weekdays and weekends:
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -26.09 | 174100.167 | two-sided | 0.0 | [-0.02, -0.02] | 0.101 | 1.92e+145 | 1.0 |
As we can see, the reported p-value is 0, so there is a significant difference between weekdays and weekends, as the chart suggested, although the effect size is small (Cohen's d ≈ 0.10). Next we will contrast weekdays against holidays:
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | -7.906 | 8223.725 | two-sided | 0.0 | [-0.03, -0.02] | 0.097 | 4.75e+11 | 1.0 |
Once again there is a significant difference, as expected. Now we will see the last contrast, weekends against holidays:
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
---|---|---|---|---|---|---|---|---|
T-test | 0.48 | 9032.22 | two-sided | 0.631 | [-0.0, 0.01] | 0.006 | 0.015 | 0.077 |
As expected, weekends and holidays show almost the same behavior: the difference between them is not significant (p = 0.631).
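The three day types compared in these contrasts could be assigned from the date as sketched below. The holiday set is a small illustrative subset of England's bank holidays (Christmas Day, Boxing Day, New Year's Day), not the calendar the notebook used, and letting holidays take precedence over weekends is an assumption.

```python
from datetime import date

# Illustrative subset of bank holidays; a real analysis needs the full calendar.
HOLIDAYS_SAMPLE = frozenset(
    {date(2012, 12, 25), date(2012, 12, 26), date(2013, 1, 1)}
)


def day_type(d: date, holidays=HOLIDAYS_SAMPLE) -> str:
    """Label a date as 'holiday', 'weekend' or 'weekday' (holidays take precedence)."""
    if d in holidays:
        return "holiday"
    if d.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "weekday"


for d in [date(2012, 12, 25), date(2012, 12, 27), date(2012, 12, 29)]:
    print(d.isoformat(), day_type(d))
```

Grouping the half-hourly readings by this label gives the three samples needed for the pairwise t-tests.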