Principal component analysis (PCA) of human security
Abstract
This research continues the work of (Zgurovsky M.Z., Gvishiani A.D., 2008), which presented a list of the ten most essential global threats to the future of mankind. The initial data on each threat are taken from the databases of reputable international organizations. We then determined the summarized impact of the ten global threats on different countries by cluster analysis, with the purpose of selecting groups of countries with "close" levels of summarized threats. Using a Minkowski-type metric, a foresight of future global conflicts was carried out. To simplify the analysis we use Principal Component Analysis (PCA), which reduces variables with many properties to a few hidden factors. The analysis shows that currently the most considerable threats for most countries are the reduction of energy security, the worsening balance between biocapacity and human demands, and income inequality between people and countries.
Keywords
Global threats, global conflicts, Minkowski metric, cluster analysis, principal component analysis, energy security, biocapacity, income inequality.
1. Introduction
The work (Zgurovsky M.Z., Gvishiani A.D., 2008) studies the impact of systemic world conflicts on sustainable development in the global context. Based on an analysis of data on global conflicts from 705 B.C. to the present, the regularity of their course is determined. It is shown that the sequence of life cycles of systemic world conflicts follows the Fibonacci series, and that the intensity of these conflicts, depending on the level of technological evolution of society, grows according to a hyperbolic law.
Using the revealed regularities, we attempt to foresee the upcoming world conflict, called “the conflict of the XXI century”, and analyze its nature and principal characteristics: duration, main phases of its course, and intensity. The set of main global threats generating the conflict of the XXI century is given. These global threats are: ES – Energy Security; FB – Footprint and Biocapacity Balance; GINI – Income Inequality; GD – Global Diseases; CM – Child Mortality; CP – Corruption Perception; WA – Water Access; GW – Global Warming; SF – State Fragility; ND – Natural Disasters. Using cluster analysis, we determine the impact of these threats on different countries and on twelve large groups of countries (civilizations according to Huntington) united by common cultural features. Assumptions are made about possible scenarios during the conflict of the XXI century and after its end.
Since it is difficult to analyze the security of a given country simultaneously in the space of ten global threats, we use Principal Component Analysis (PCA) to make the research more convenient and illustrative. This method reduces the analysis of many properties to a few hidden factors that determine these properties. The security of a country may then be presented in simplified form, not by all ten global threats but by a few most significant factors.
2. Application of the principal component method for the analysis of the impact of the totality of global threats on sustainable development
The example of sustainable development global simulation (System Analysis and Decisions, The example of sustainable development global simulation, 2009) presents the global threats and the degree of their impact on different countries.
Let us format Table 1 in the form of the initial data matrix,
The purpose of this study, conducted with the principal component method, is to find and interpret latent common factors while minimizing both their number and the degree of dependence
The expression for
where
The search for principal components reduces to finding the matrix decomposition
The matrix of scores assigns a set of vectors
Determining the principal components involves calculating the eigenvectors of the covariance matrix (Lindsay I. Smith, 2002) and (Strang, Gilbert, 2006), defined as:
where
For selection of a sufficient number
where
Preliminary analysis of principal components is given in Table 1.
Table 1. Analysis of principal components
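The computation described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a data matrix with countries in rows and threat indicators in columns, and it uses NumPy. The function names `pca_eig` and `n_components_for` and the random demo data are hypothetical and serve illustration only.

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    X has shape (n_samples, n_features). Returns the eigenvalues in
    descending order, the matching eigenvectors (as columns) and the scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center every column
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                     # projections onto the components
    return eigvals, eigvecs, scores

def n_components_for(eigvals, share):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `share` (a fraction between 0 and 1)."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, share)) + 1
    return min(k, len(eigvals))

# Hypothetical demo data: 40 "countries", 10 "threat" indicators.
rng = np.random.default_rng(42)
demo = rng.normal(size=(40, 10))
demo_vals, demo_vecs, demo_scores = pca_eig(demo)
print("cumulative variance share:", np.cumsum(demo_vals) / demo_vals.sum())
print("components needed for 74 %:", n_components_for(demo_vals, 0.74))
```

With real data the indicators would normally be brought to a common scale first. Whether and how the study normalized them is not stated in the surviving text.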
We define the sufficient number of principal components using the scree criterion suggested by (Cattell, R. B, 1966). "Scree" is a geological term for rock debris accumulated in the lower part of a rocky slope. Using this analogy, the eigenvalues presented in Table 1 can be shown graphically (Figure 1). One looks for the place in the plot where the decrease of eigenvalues from left to right slows down the most. It is assumed that only “factorial scree” lies to the right of this point. According to this criterion, only 2 or 3 factors should be retained. As seen from the data presented above, the first three principal components (the corresponding eigenvalues are indicated in red) are sufficient to represent more than 74 % of the data variability.
Definition of factor loadings
Now let us analyze the principal components and consider the solution with three factors. For this we consider the correlations between threats and factors (or “new” variables), which are calculated by the formula (Harman H.H, 1966):
where
The correlation coefficient itself does not have a direct substantive interpretation. However, its square, called the coefficient of determination, shows to what extent variations of a dependent characteristic may be explained by variations of an independent one. Correlation coefficients with absolute value greater than 0.7 are considered to indicate a strong connection (in this case the coefficient of determination is about 50% or more, i.e. one characteristic determines the other by roughly half or more). Correlation coefficients with absolute value less than 0.7 but greater than 0.5 indicate a moderate connection (the coefficient of determination is less than 50% but more than 25%). Finally, correlation coefficients with absolute value less than 0.5 indicate a weak connection (the coefficient of determination is less than 25%). Table 2 shows the values of the correlation coefficients between principal factors and initial threats. The coefficients corresponding to strong connections are indicated in red.
Table 2. Correlation coefficients between principal factors and initial threats
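As an illustration of factor loadings and of the strength thresholds described above, the following sketch computes loadings as correlations between standardized variables and principal components. It assumes the common textbook form (eigenvector entry times the square root of the eigenvalue of the correlation matrix), which may differ in detail from the formula used in the study. The function names, the demo data and the handling of the boundary values 0.5 and 0.7 are our own assumptions.

```python
import numpy as np

def factor_loadings(X, n_factors):
    """Loadings = correlations between the original variables and the first
    `n_factors` principal components of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(R)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_factors]
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)    # shape (n_variables, n_factors)

def strength(r):
    """Verbal label for a correlation, using the 0.7 and 0.5 thresholds.
    The text does not say how exact boundary values are classified;
    here they fall into the weaker class (an assumption)."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a > 0.5:
        return "moderate"
    return "weak"

# Hypothetical demo: 40 observations of 5 partly dependent variables.
rng = np.random.default_rng(1)
base = rng.normal(size=(40, 2))
demo = np.column_stack([base[:, 0], base[:, 0] + 0.3 * rng.normal(size=40),
                        base[:, 1], base[:, 1] + 0.3 * rng.normal(size=40),
                        rng.normal(size=40)])
demo_loadings = factor_loadings(demo, n_factors=3)
for row in demo_loadings:
    print([f"{v:+.2f} ({strength(v)})" for v in row])
```

The sign of each loading column is arbitrary, because an eigenvector multiplied by -1 is still an eigenvector. Only the absolute values and the relative signs within a column carry meaning.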
Table 2 shows that the first factor correlates with the threats to a greater extent than the second and third factors. This is to be expected since, as mentioned above, the factors are defined sequentially and each contains less of the total variance than the previous one.
Interpretation of factor structure
It is convenient to interpret the factors (principal components) using a diagram in which threats are shown as vectors whose coordinates correspond to the factor loadings (Figure 2). According to their maximum factor loadings, the threats may be divided into three categories (red, blue and green colours). The first group of threats includes: FB, CP, SF, GD, NA, CM, GW. As seen in Figure 2, these threats lie in the plane of the first and second factors. For a more detailed analysis it is therefore advisable to show them in projection onto this plane (Figure 3).
As seen from Figure 3, the pairs of vectors SF-GD and FB-GW are practically collinear, which indicates a high degree of dependence between them. Interestingly, if only two factors are considered, the pair of vectors CP-GINI may also be regarded as collinear. It should also be noted that the vector ES is orthogonal to FB (GW). It means that:
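Collinearity and orthogonality of loading vectors, as read off Figure 3, can also be checked numerically through the cosine of the angle between two vectors. The sketch below is illustrative only: the loading values are invented placeholders, not the values from Table 2, and the function name `cosine` is our own.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two loading vectors.
    Values near +1 or -1 mean the vectors are nearly collinear,
    values near 0 mean they are nearly orthogonal."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder loadings on the first two factors (NOT the paper's values).
demo_loadings = {
    "A": [0.80, 0.20],
    "B": [0.75, 0.18],   # almost the same direction as A
    "C": [-0.15, 0.60],  # roughly perpendicular to A
}
print("A vs B:", round(cosine(demo_loadings["A"], demo_loadings["B"]), 3))
print("A vs C:", round(cosine(demo_loadings["A"], demo_loadings["C"]), 3))
```

When the loadings come from standardized data and all factors are kept, this cosine equals the correlation between the two original variables. With only two or three factors retained it is an approximation.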
The most significant global threats are determined using the factor loadings of the initial list of threats. For this it is necessary to select the threats that have the maximum loading in absolute value on the first, second and third factors. This choice ensures that the selected threats have the maximum impact on the aggregated indicator of these threats (Minkowski norm) while being maximally independent of one another (Figure 4). In accordance with this approach, such threats are SF, ES and GINI (Figure 4), i.e. the most significant threats in descending order are state fragility, the global decrease of energy security, and growing inequality between people and countries.
Clustering of countries by the level of global threats and the corresponding graphic interpretation are done in the plane of the first and second factors. For this purpose we cluster countries by the degree of their remoteness from threats (Minkowski norm) using the K-means clustering method.
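The two ingredients named above, the Minkowski norm and K-means clustering, can be sketched as follows. This is a minimal illustration under our own assumptions: the order `p` of the norm is not stated in the surviving text and is therefore a required parameter, the clustering is applied to the one-dimensional norm values, and a plain Lloyd iteration is written out with NumPy instead of relying on a library routine. All names and data are hypothetical.

```python
import numpy as np

def minkowski_norm(X, p):
    """Minkowski norm of order p for every row of X: (sum |x_i|^p)^(1/p)."""
    X = np.asarray(X, dtype=float)
    return (np.abs(X) ** p).sum(axis=1) ** (1.0 / p)

def nearest(points, centers):
    """Index of the closest center for every point (Euclidean distance)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration. `points` must have shape (n_points, n_dims)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:              # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest(points, centers), centers

# Hypothetical normalized threat indicators for six "countries".
demo = np.array([[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.5, 0.6, 0.4],
                 [0.6, 0.5, 0.5], [0.9, 0.8, 1.0], [1.0, 0.9, 0.9]])
demo_norms = minkowski_norm(demo, p=3)        # p=3 is an arbitrary choice
demo_labels, demo_centers = kmeans(demo_norms.reshape(-1, 1), k=3)
print(demo_norms.round(3), demo_labels)
```

K-means depends on the random starting centers, so in practice it is usually run from several seeds and the solution with the smallest within-cluster spread is kept.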
As seen from Figure 5, the isolines approximating the Minkowski norm are practically orthogonal to the first factor axis. This gives grounds to state that the values of the first factor mostly determine the countries’ remoteness from global threats.
3. Researching the dependence of countries’ national security on particular threats by using a modified method of weighted local correlation
Let us take the quantitative value of the Minkowski norm for a given country as an estimate of its national security level. We determine how strongly the Minkowski norm depends on the initial threats by calculating the corresponding correlation coefficients (Table 3):
Table 3. Correlation coefficients between Minkowski norm and global threats
The calculated correlation coefficients show a high degree of dependence of the Minkowski norm on the initial threats, but they do not show which risks countries run as they approach particular threats. The reason is that the correlation coefficients are averaged over the entire data sample. For a detailed analysis of the global threats that countries may face, it is necessary to localize the sample on which the correlation is estimated. It is natural to assume that this sample should include “alike” countries, whose degree of similarity may be estimated as, for example, the Euclidean distance in the space of threats. The second assumption is that the closer a country is to the point at which the correlation is analyzed, the greater the impact of that country’s indicators on the correlation coefficient. In accordance with these assumptions we define the weighted mean (A MATLAB Toolbox for computing Weighted Correlation Coefficients, 2008) as:
where
If we define
Similarly, we can define the weighted localized covariation:
And we define the weighted localized correlation (WLC):
The distribution parameter of weights
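The idea of a weighted localized correlation can be sketched with the standard definitions of the weighted mean, weighted covariance and weighted correlation. The weighting scheme below is an assumption of ours, since the original weight distribution and its parameter did not survive in the text: weights decay with the Euclidean distance in the space of threats according to a Gaussian kernel with a hypothetical `bandwidth` parameter. In this parameterization the Pearson coefficient is recovered as the bandwidth grows very large, which may differ from the parameterization used in the study.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted correlation built from the weighted mean and covariance:
    m(x) = sum(w*x)/sum(w), cov(x, y) = sum(w*(x - m(x))*(y - m(y)))/sum(w)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                            # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)      # weighted means
    cov_xy = np.sum(w * (x - mx) * (y - my))   # weighted covariance
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return float(cov_xy / np.sqrt(var_x * var_y))

def localized_corr(threats, x, y, center, bandwidth):
    """Correlation of x and y localized around the observation `center`.

    `threats` has shape (n_countries, n_threats) and is used only to measure
    how "alike" countries are. The Gaussian kernel is an assumed choice."""
    threats = np.asarray(threats, dtype=float)
    d = np.linalg.norm(threats - threats[center], axis=1)   # Euclidean distances
    w = np.exp(-0.5 * (d / bandwidth) ** 2)                 # nearer points weigh more
    return weighted_corr(x, y, w)

# Hypothetical data: the x-y relation changes sign halfway through the sample.
demo_x = np.arange(10.0)
demo_y = np.where(demo_x < 5, demo_x, -demo_x)
demo_threats = demo_x.reshape(-1, 1)
print("global Pearson:", np.corrcoef(demo_x, demo_y)[0, 1])
print("local at first point:", localized_corr(demo_threats, demo_x, demo_y, 0, 1.0))
print("local at last point:", localized_corr(demo_threats, demo_x, demo_y, 9, 1.0))
```

The demo shows why localization matters: a single global coefficient averages over two regimes with opposite signs, while the localized values separate them.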
With the distribution scale equal to 1, WLC coincides with the Pearson product-moment correlation coefficient. As seen from (10), the weights distribution parameter is calculated for each point
The interpretation of WLC values is presented in Table 4.
Table 4. Interpretation of values of weighted localized correlation (WLC)
Figures 8-10 present the plotted values of the weighted localized correlation (WLC) between the Minkowski norm and the most significant threats: SF, ES and GINI, respectively. As seen from Figure 8, the level of state fragility (SF) has a considerable impact on national security for most countries. As to the impact of energy security on the level of national security (Figure 9), the following groups of countries may be identified:
As to the impact of population inequality on national security (Figure 10), it is possible to identify a group of countries (Canada, Sweden, Norway, Australia, Finland, New Zealand, Denmark, Switzerland, Netherlands, Austria, Luxembourg, Japan, Ireland, France, Germany, Portugal, Slovenia, Belgium) for which a moderate positive correlation between this threat and the Minkowski norm is observed. For the rest of the countries this correlation is insignificant.
4. Conclusions
1. Since it is very complicated to analyze the security of a given country simultaneously in the space of ten global threats, principal component analysis (PCA) was used. This method made it possible to reduce the ten global threats influencing the general level of national security (in the sense of the Minkowski norm) to three hidden factors determining this characteristic. This approach considerably facilitated the study of national security, reducing it to an analysis in the space of three determining factors.
2. Using this method, a comprehensive study of the national security of different countries was carried out in the space of three determining factors. Factor loadings were defined by calculating the coefficients of correlation between principal factors and initial threats. Countries were clustered according to the level of global threats, and the three most significant threats influencing the national security of most countries were identified: state fragility (SF), energy security (ES) and income inequality (GINI). Graphic interpretation of global threats was done in the space of three principal components. The factor structure of threats was studied, and the degrees of dependence between main groups were defined.
3. The method of weighted localized correlation was modified, which made it possible to study the dependence of the national security level (Minkowski norm) on particular global threats. Using this method, the dependence between the Minkowski norm and the most significant threats, in particular state fragility (SF), energy security (ES) and income inequality (GINI), was analyzed in detail. Recommendations were made for different countries regarding the strengthening of their national security.
References
