Monday, March 25, 2013

Thinking about Regions: Updated with R Code

Regional Industry Employment

The blogs have been slow this month - sorry people. Things have been quite busy in my neck of the woods. One thing I am looking at at the moment, though, is a set of three employment-related questions based on Statistics New Zealand's Business Demographics data:
  1. What does the regional pattern of industry employment and business size "look like" across the different geographic regions?
  2. Is there a story of "regional comparative advantage" in the data?
  3. How has the regional pattern of employment and business size varied over time, especially in response to the Global Financial Crisis downturn?
Here's what I've got at the moment - it's a work in progress :) I'll keep updating this post until I'm happy with it. Also, I'd like New Zealand economists to use more R in their lives. As a result, I'm posting the R code and the source data at the bottom of this post.

Fig 1: Regional Employment by Industry and Year

Fig 2: Regional Employment by Industry and Year - Selected Regional Councils

For question 1, I'm looking at what proportion of the workforce within each regional council area is employed in each industry. For example, looking at Fig 2 first, Public Services stands out as a large source of employment for Wellington - no surprises there (and that's what I want).
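As a rough illustration of the proportions described above, here is a minimal Python sketch. The employee counts are made-up placeholders, not Statistics New Zealand figures, and `industry_shares` is a hypothetical helper name rather than anything from the R code behind the figures.

```python
# Hypothetical employee counts for a single region (illustrative only,
# NOT taken from the Business Demographics data).
counts = {
    "Public_Services": 30000,
    "Retail": 20000,
    "Manufacturing": 10000,
    "Agri_Forest_Fishing": 2000,
}


def industry_shares(region_counts):
    """Return each industry's share of the region's total employment."""
    total = sum(region_counts.values())
    return {industry: n / total for industry, n in region_counts.items()}


shares = industry_shares(counts)

# Print industries from largest to smallest share.
for industry, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{industry:<22}{share:6.1%}")
```

Dividing by the regional total rather than the national total is what makes regions of very different sizes comparable on the same chart.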

My goal is to be able to express each of these regions in terms of 'similarity to' and 'difference from' other regions. For example, Auckland and Wellington are similar to each other in Agriculture, Forestry and Fishing and in Retail Trade, but different from Canterbury and Waikato. Auckland, Canterbury and Waikato are similar to each other in Manufacturing, but different from Wellington.
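One simple way to put a number on 'similarity to' and 'difference from' is the straight-line (Euclidean) distance between two regions' industry share profiles. The sketch below is in Python and uses the rounded shares reported in Table 2 of this post, so it only covers those six regions. The distance measure is my assumption for illustration, not the method used for the figures.

```python
import math
from itertools import combinations

# Rounded industry employment shares (percent), copied from Table 2.
# Each list follows the industry order of the Table 2 rows.
INDUSTRY_SHARES = {
    "West Coast":  [5, 1, 2, 3, 2, 4, 12, 3, 12, 12, 9, 2, 6, 1, 12, 1, 4, 8, 0],
    "Otago":       [4, 1, 3, 3, 2, 0, 10, 3, 12, 11, 9, 4, 5, 2, 14, 2, 4, 11, 1],
    "Northland":   [4, 2, 3, 3, 2, 0, 7, 3, 12, 13, 10, 4, 5, 2, 13, 2, 4, 11, 1],
    "Tasman":      [2, 2, 3, 2, 1, 0, 6, 4, 4, 10, 36, 3, 4, 1, 13, 0, 2, 7, 0],
    "Hawke's Bay": [4, 1, 4, 2, 1, 0, 5, 3, 10, 10, 19, 3, 4, 1, 17, 1, 4, 8, 0],
    "Auckland":    [6, 2, 9, 3, 1, 0, 6, 6, 8, 11, 1, 7, 5, 4, 16, 4, 4, 7, 1],
}


def distance(a, b):
    """Euclidean distance between two share profiles (percentage points)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Rank every pair of regions from most similar to least similar.
pairs = sorted(
    combinations(INDUSTRY_SHARES, 2),
    key=lambda pair: distance(INDUSTRY_SHARES[pair[0]], INDUSTRY_SHARES[pair[1]]),
)
closest, farthest = pairs[0], pairs[-1]

print("Most similar pair: ", closest)
print("Least similar pair:", farthest)
```

On these rounded figures Otago and Northland come out as the closest pair and Tasman and Auckland as the farthest, the latter driven almost entirely by the agriculture share.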

Which leads to the second question: do Canterbury and Waikato have a comparative advantage over Auckland and Wellington in producing Agriculture, Forestry and Fishing and Retail Trade commodities? If they do, and that advantage has revealed itself in different regional employment shares within each industry, what does it mean for their sensitivity to world-market factors outside their control? For example, in the Mining panel of Fig 1, the top line is the West Coast, whose coal production, according to Tony Ryall, is currently being hit by a 'perfect storm' of international coal price declines.
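A common first-pass screen for this kind of question is the location quotient: a region's employment share in an industry divided by the national share in that industry. To be clear, this is a technique I'm swapping in for illustration, not something computed in this post, and every number in the Python sketch below is a made-up placeholder.

```python
def location_quotients(region_shares, national_shares):
    """Regional share divided by national share, industry by industry.

    A value above 1 means the industry is over-represented in the region
    relative to the country as a whole; below 1 means under-represented.
    """
    return {
        industry: region_shares[industry] / national_shares[industry]
        for industry in region_shares
    }


# Hypothetical shares (NOT Statistics New Zealand figures).
region = {"Agri_Forest_Fishing": 0.19, "Retail": 0.10, "Professional_Science": 0.03}
nation = {"Agri_Forest_Fishing": 0.06, "Retail": 0.11, "Professional_Science": 0.06}

lq = location_quotients(region, nation)
for industry, value in sorted(lq.items(), key=lambda kv: -kv[1]):
    print(f"{industry:<22}{value:5.2f}")
```

A high quotient only shows concentration of employment; whether that reflects a genuine comparative advantage is the interpretive question raised above.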

Thirdly, how has each industry in each region fared over the long term, especially since the Global Financial Crisis (GFC) era of economic decline? The start of the GFC is usually dated to September 2008, with the fall of Lehman Brothers leading to a credit shortage which ultimately slowed economic production. However, in New Zealand's case, the decline started much earlier, with slowing economic growth and increases in the unemployment rate evident from December 2007 on.

What I'm hoping to see in Fig 1 above is how the economic decline from December 2007 onwards played out across regions and industries.

Principal Components on Regional Industry Breakdown

First off, which regions are "similar" and which are not?  Given the different industry employment profiles, principal component analysis (PCA) might be one way to simplify the different regional industry employment compositions into measures which reflect regional similarity.

PCA is a statistical technique which decomposes the variation in multiple measures of something into its 'principal components'. It is a variable reduction technique which reduces a large number of variables down to a few key variables which, when the technique works well, describe the bulk of the variation occurring within the measured variables. For example, if there are 20 highly correlated variables, each reflecting some separate aspect of the thing measured, then PCA might reduce the information content of those 20 variables to 2-3 'principal components' which explain the bulk of the variation within the data.

The 'principal components' are sets of weights, one weight per measured variable, which together discriminate between the things measured along some dimension within the data. The technique works best when the multiple measurements made on the single thing are highly correlated. For highly correlated measurement data, large sources of variation can normally be described by the first and second principal components. The beauty of PCA is that the components are 'orthogonal' to each other: each one captures a source of variation within the data which is completely separate from (uncorrelated with) the sources of variation captured by the other components. Usually, the variation captured by each principal component can be given 'meaning' and interpretative content, although the technique never identifies what dimension it is actually capturing. There's an element of interpretation in figuring out what each component means.
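To make the idea of 'weights' concrete, here is a minimal pure-Python sketch that finds only the first principal component of a tiny data set by power iteration on the correlation matrix. It uses three industries and five regions from the rounded shares in Table 2, so it is a toy illustration and not the 16-region analysis reported in this post. The helper names are hypothetical, and power iteration is my choice for a dependency-free example.

```python
import math

# Rounded shares (percent) copied from Table 2, one row per region:
# Tasman, Hawke's Bay, Northland, Otago, Auckland.
COLUMNS = ["Agri_Forest_Fishing", "Professional_Science", "Financial_Insurance"]
ROWS = [
    [36, 3, 1],
    [19, 3, 1],
    [10, 4, 2],
    [9, 4, 2],
    [1, 7, 4],
]


def correlation_matrix(rows):
    """Standardise each column, then return the column correlation matrix."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    p = len(columns)
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def mat_vec(matrix, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def first_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector.

    Starts from a vector loaded entirely on the first variable, which is
    fine here but would fail if that variable had exactly zero weight.
    """
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


corr = correlation_matrix(ROWS)
eigenvalue, weights = first_component(corr)
share_explained = eigenvalue / len(corr)  # total variance = number of variables

for name, weight in zip(COLUMNS, weights):
    print(f"{name:<22}{weight:6.2f}")
print(f"Share of variance explained: {share_explained:.0%}")
```

In this toy example agriculture gets a weight of the opposite sign to the two professional industries, which is the kind of contrast a component's weights are meant to pick up.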

Principal Components Analysis:  Results

I've run PCA over the regional industry employment breakdowns for the year 2000. The thing measured is each region. The multiple different measures are the industry employment proportions. There are as many components as there are regions (16).

Table 1:  Principal Components Results


Industry PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16
Trans_Warehousing -0.24 0.07 -0.43 -0.01 0.05 0.26 -0.02 0.16 -0.19 -0.07 -0.44 0.42 -0.15 0.19 0.01 -0.16
Hire_RealEstate 0.00 -0.34 0.10 0.13 0.54 -0.01 0.35 0.13 -0.01 0.03 -0.02 -0.09 0.00 0.54 -0.04 0.31
Wholesale -0.30 -0.14 -0.07 0.13 0.25 0.23 -0.38 -0.13 0.31 -0.16 -0.04 -0.05 0.37 -0.08 0.50 0.16
Other_Services -0.33 0.14 0.08 0.14 -0.07 0.20 0.08 0.06 -0.50 0.20 -0.04 -0.40 0.34 0.18 0.00 -0.22
Arts_Rec -0.12 0.32 0.19 -0.30 0.26 -0.10 -0.04 -0.41 -0.33 -0.04 -0.26 -0.27 -0.18 -0.17 0.10 0.26
Mining 0.12 0.32 -0.16 -0.32 0.07 -0.23 -0.37 0.07 0.27 0.51 -0.02 -0.10 0.07 0.44 0.03 0.02
Accom_Food 0.08 0.31 -0.22 -0.35 0.26 -0.10 0.26 -0.05 -0.15 -0.41 0.38 0.20 0.16 0.12 0.09 -0.05
Admin_Support -0.32 -0.12 0.07 -0.19 0.32 0.02 0.09 -0.20 0.38 -0.02 0.05 -0.11 -0.30 -0.11 -0.17 -0.32
Health_Social -0.07 0.33 -0.23 0.24 -0.24 0.34 0.30 -0.21 0.17 0.15 0.21 -0.05 -0.35 0.09 0.22 0.41
Retail 0.05 0.34 -0.16 0.29 0.37 -0.06 0.28 0.21 0.18 0.27 -0.08 -0.04 0.28 -0.47 -0.22 0.02
Agri_Forest_Fishing 0.30 -0.25 0.20 -0.14 -0.04 0.14 -0.01 0.07 -0.15 0.16 0.07 0.03 0.04 -0.13 0.07 0.31
Professional_Science -0.39 -0.05 0.09 0.00 0.00 0.04 -0.11 0.10 -0.01 0.15 0.15 -0.05 -0.34 0.03 -0.30 0.03
Construction 0.16 0.31 0.16 0.07 0.16 0.25 -0.32 0.56 0.03 -0.41 -0.03 -0.25 -0.25 0.00 -0.08 0.11
Financial_Insurance -0.37 -0.01 0.09 -0.20 -0.01 -0.15 -0.07 0.19 -0.10 0.12 -0.08 0.43 0.09 -0.18 -0.13 0.50
Manufacturing 0.07 -0.11 -0.41 0.34 -0.03 -0.47 -0.17 -0.21 -0.09 -0.24 -0.24 -0.22 -0.09 0.04 -0.21 0.22
Info_Telecom -0.36 0.01 -0.13 0.08 -0.13 -0.21 -0.14 0.12 -0.06 -0.15 0.56 -0.10 0.13 0.04 -0.14 0.12
Public_Services -0.23 0.08 0.19 -0.16 -0.37 -0.30 0.39 0.28 0.33 -0.22 -0.33 -0.19 0.09 0.09 0.21 0.05
Education 0.02 0.29 0.42 0.22 -0.08 0.14 -0.08 -0.36 0.19 -0.19 -0.10 0.28 0.27 0.29 -0.43 0.06
Utilities -0.05 0.18 0.34 0.44 0.15 -0.41 -0.07 0.06 -0.14 0.11 0.10 0.31 -0.26 0.05 0.43 -0.20

Importance of components

                           PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10  PC11  PC12  PC13  PC14  PC15  PC16
Standard deviation        2.51  2.00  1.59  1.33  1.13  0.95  0.86  0.71  0.63  0.51  0.44  0.28  0.22  0.18  0.07  0.00
Proportion of Variance     33%   21%   13%    9%    7%    5%    4%    3%    2%    1%    1%  0.4%  0.2%  0.2%  0.0%  0.0%
Cumulative Proportion      33%   54%   67%   77%   83%   88%   92%   95%   97%   98%   99%  100%  100%  100%  100%  100%

The first principal component explains 33% of the variation in regional industry employment, and the second explains a further 21%. About 92% of the regional employment variation between industries is explained by the first 7 principal components.
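The percentages follow directly from the standard deviations: each component's variance is its standard deviation squared, and its proportion is that variance divided by the total. Here is a short Python check using the rounded standard deviations reported in the "Importance of components" table (variable names are mine):

```python
from itertools import accumulate

# Standard deviations of PC1..PC16, as reported in the table (rounded).
sdev = [2.51, 2.00, 1.59, 1.33, 1.13, 0.95, 0.86, 0.71,
        0.63, 0.51, 0.44, 0.28, 0.22, 0.18, 0.07, 0.00]

variances = [s ** 2 for s in sdev]
total_variance = sum(variances)

# Share of total variance per component, and the running total.
proportion = [v / total_variance for v in variances]
cumulative = list(accumulate(proportion))

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i:<3}{p:7.1%}{c:8.1%}")
```

The variances add up to roughly 19, the number of industries, which would be consistent with each industry share having been standardised to unit variance before the PCA was run.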

Interpreting the Principal Components

There's a bit of art in figuring out what interpretation ought to be given to the variation captured within each component. One approach is to evaluate both the sizes and the directions of the weights and see if, in their totality, they have some interpretation. For example, in PC1, Agriculture, followed distantly by Construction and Mining, has the highest positive weighting (in bold red). On the flipside, the Professional Science, Financial and Insurance, and Information and Telecommunication industries are highly negative (in bold blue).
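Reading the weights by eye is easier once they are sorted. This small Python sketch ranks the PC1 column of Table 1 and pulls out the three most positive and three most negative industries (the variable names are mine; the numbers are copied from the table):

```python
# PC1 weights copied from Table 1.
pc1 = {
    "Trans_Warehousing": -0.24, "Hire_RealEstate": 0.00, "Wholesale": -0.30,
    "Other_Services": -0.33, "Arts_Rec": -0.12, "Mining": 0.12,
    "Accom_Food": 0.08, "Admin_Support": -0.32, "Health_Social": -0.07,
    "Retail": 0.05, "Agri_Forest_Fishing": 0.30, "Professional_Science": -0.39,
    "Construction": 0.16, "Financial_Insurance": -0.37, "Manufacturing": 0.07,
    "Info_Telecom": -0.36, "Public_Services": -0.23, "Education": 0.02,
    "Utilities": -0.05,
}

# Sort industries from most negative to most positive weight.
ranked = sorted(pc1, key=pc1.get)
most_negative = ranked[:3]
most_positive = ranked[-3:][::-1]

print("Most positive:", [(name, pc1[name]) for name in most_positive])
print("Most negative:", [(name, pc1[name]) for name in most_negative])
```

Swapping in another column of Table 1 gives the same ranking for any other component.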

Conceivably, the first principal component dimension distinguishes regions of differing "market depth" and "industry complexity".

As regional markets grow, they increase in complexity. Small regions are predominantly primary industry and agriculture. Telecom and TelstraClear have few offices in downtown Ashburton. As regions develop, manufacturing and light industry emerge and manufacturing employment grows. As regions develop into metropolitan areas, professional services develop, service industries mature, and hardly anyone is employed in agriculture any more. PGG Wrightson has few offices in Wellington. This dynamic seems to be captured in PC1.

The second principal component gives strongly negative weights to the agriculture and real estate industry employment proportions. On the positive side, PC2 strongly weights Retail, Health, Arts and Recreation, Mining, Construction and Education. The second principal component is trickier to interpret, but it looks to distinguish "predominantly service" regions from "predominantly agriculture" regions.

This one's more complex, and I'd welcome some comments on it, but check out what happens when you graph the regions by principal components 1 and 2. PC2 strongly differentiates Tasman from the West Coast. From Table 2 below, the West Coast, Otago and Northland regions have larger proportions of service industry employment, but I can't explain why Auckland scores highly negative on PC2.

Fig 5:  Principal Components Analysis

Table 2:  Regional Industry Employment - Coloured by Second Principal Component Dimensions



Industry              West Coast  Otago  Northland  Tasman  Hawke's Bay  Auckland
Trans_Warehousing             5%     4%         4%      2%           4%        6%
Hire_RealEstate               1%     1%         2%      2%           1%        2%
Wholesale                     2%     3%         3%      3%           4%        9%
Other_Services                3%     3%         3%      2%           2%        3%
Arts_Rec                      2%     2%         2%      1%           1%        1%
Mining                        4%     0%         0%      0%           0%        0%
Accom_Food                   12%    10%         7%      6%           5%        6%
Admin_Support                 3%     3%         3%      4%           3%        6%
Health_Social                12%    12%        12%      4%          10%        8%
Retail                       12%    11%        13%     10%          10%       11%
Agri_Forest_Fishing           9%     9%        10%     36%          19%        1%
Professional_Science          2%     4%         4%      3%           3%        7%
Construction                  6%     5%         5%      4%           4%        5%
Financial_Insurance           1%     2%         2%      1%           1%        4%
Manufacturing                12%    14%        13%     13%          17%       16%
Info_Telecom                  1%     2%         2%      0%           1%        4%
Public_Services               4%     4%         4%      2%           4%        4%
Education                     8%    11%        11%      7%           8%        7%
Utilities                     0%     1%         1%      0%           0%        1%

SOURCE CODE AND DATA FROM HERE DOWN

The following data is derived from Statistics New Zealand Business Demographics data.

http://www.stats.govt.nz/infoshare/  => Businesses =>  Business Demographic Statistics - BUD  =>  Employee count by Region 2011, ANZSIC and Size Group (ANZSIC06) (Annual-Feb)  => select all variables, all time periods, everything and re-arrange variable order like below.