# Exploring spatial patterns

GIS programs offer tools that help researchers bring forth, explore, and analyse spatial patterns in geographical datasets. In ArcGIS, such tools can be categorised as global and local.

Global autocorrelation tools measure the extent to which points that are “close together” in space have similar values, on average. On the other hand, the local autocorrelation tools analyze the extent to which points that are “close” to a given point have similar values. In ArcGIS, global tools generate reports, whereas local tools produce new maps.

ArcGIS uses its own names for the following statistical methods:

Method | ArcGIS tool | Type |
---|---|---|
Moran’s I (local) | Cluster and Outlier Analysis | Local |
Gi* | Hot Spot Analysis | Local |
Moran’s I (global) | Spatial Autocorrelation | Global |
G-statistic | High/Low Clustering | Global |

Gross domestic product (GDP) is defined as an aggregate measure of production equal to the sum of the gross values added of all resident institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs). Essentially, it measures the economic performance of a country. In this exercise, GDP per capita values were used so that the comparison is not driven by the population size of each country.

In this project, we’ll attempt to explore spatial patterns in global GDP distribution for the year 2010. GDP data was acquired from the World Bank website (data.worldbank.org). The World Bank is a United Nations international financial institution that provides loans to developing countries for capital programs and also maintains an extensive library of development-related datasets.

The steps followed were:

- Acquisition of data
- Reprojection of the data to the Mercator projection using the “Project” tool
- Exploratory analysis (ESDA) using the two ‘global’ tools: Spatial Autocorrelation and High/Low Clustering, with generation of two distinct reports
- Local analysis (LISA) using the following tools: Cluster and Outlier Analysis and Hot Spot Analysis
- Export of maps and creation of a short presentation

## Technical insight

### Comparison of global statistics

The global statistics for spatial autocorrelation analysis in ArcGIS use the following null hypothesis: the values being analyzed exhibit no spatial pattern, i.e., their spatial arrangement is random. A z-score that is not statistically significant, for example one between -1.96 and 1.96 at a 95% confidence level, means that the null hypothesis cannot be rejected: the spatial distribution of the input values could have been generated by some underlying random process.
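
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not ArcGIS code: it assumes the z-score follows a standard normal distribution under the null hypothesis, and the function names `two_sided_p_value` and `rejects_null` are hypothetical.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejects_null(z, critical=1.96):
    """True if |z| exceeds the critical value (1.96 corresponds to 95% confidence)."""
    return abs(z) > critical

# A few illustrative z-scores (assumed values, not taken from the reports)
for z in (0.5, -1.2, 2.58):
    print(z, round(two_sided_p_value(z), 4), rejects_null(z))
```

A z-score inside the interval leaves the null hypothesis of spatial randomness standing; one outside it indicates a pattern that a random process would be unlikely to produce.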

The two global statistics return an index number and a z-score; their difference lies in the interpretation of these results. For the Global Moran’s I, a statistically significant positive z-score means that similar values cluster spatially: high values are found closer together and low values are found closer together, something that would not be expected from an underlying random spatial process. A statistically significant negative z-score means that similar values are spatially dispersed: high values are found far away from other high values and low values are found far away from other low values, and this dispersion is more pronounced than we would expect from an underlying random spatial process. Dispersion in geographic data is less common than clustering, but might be seen with some kind of competitive or territorial spatial process, where similar features try to be as far away from each other as possible.
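
To make the clustered and dispersed cases concrete, here is a minimal sketch of the Global Moran’s I index in plain Python. It is not the ArcGIS implementation: it assumes binary contiguity weights for features arranged in a line, and the helper names `chain_weights`, `morans_i`, and `expected_i` are illustrative only.

```python
def chain_weights(n):
    """Binary weights for n features in a row: adjacent features are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / s0) * cross / sum(d * d for d in dev)

def expected_i(n):
    """Expected index under the null hypothesis of spatial randomness."""
    return -1.0 / (n - 1)

w = chain_weights(6)
clustered = [1, 1, 1, 9, 9, 9]   # similar values sit next to each other
dispersed = [1, 9, 1, 9, 1, 9]   # similar values avoid each other
print(morans_i(clustered, w), morans_i(dispersed, w), expected_i(6))
```

The clustered series yields an index above the expected value and the alternating series one well below it, which matches the positive and negative cases described above. Turning the index into a z-score additionally requires its variance, which this sketch leaves out.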

As far as the general **G-statistic** is concerned, a statistically significant positive z-score means that high values cluster spatially: larger values are found closer together than we would expect if the underlying spatial process were random. A statistically significant negative z-score means that low values cluster spatially: smaller values are found closer together than we would expect if the underlying spatial process were random.

The G-statistic can be more useful when it is known that clustering will involve either high values or low values (but not both). An appropriate use would be monitoring a process that is normally random, where the user wishes to detect a sudden spatial spike of high values.
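
A minimal sketch of the General G index follows, again in plain Python and under assumptions: binary contiguity weights for features in a line, non-negative input values, and the illustrative helper names `chain_weights`, `general_g`, and `expected_g`. It shows only the index and its expected value, not the variance or z-score that the ArcGIS report also contains.

```python
def chain_weights(n):
    """Binary weights for n features in a row: adjacent features are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def general_g(values, weights):
    """General G: weighted sum of x_i * x_j over the unweighted sum, for i != j."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weighted = sum(weights[i][j] * values[i] * values[j] for i, j in pairs)
    total = sum(values[i] * values[j] for i, j in pairs)
    return weighted / total

def expected_g(weights):
    """Expected General G under spatial randomness: S0 / (n * (n - 1))."""
    n = len(weights)
    s0 = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    return s0 / (n * (n - 1))

w = chain_weights(6)
high_together = [9, 9, 9, 1, 1, 1]   # high values are adjacent
high_apart = [9, 1, 9, 1, 9, 1]      # high values are separated
print(general_g(high_together, w), expected_g(w), general_g(high_apart, w))
```

When the high values are adjacent, the observed index exceeds the expected one; when they are separated, it falls below it.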

### Comparison of local statistics

Global statistics answer the question of whether a spatial pattern exists. Local statistics, on the other hand, show where the pattern is, and therefore produce a map. The local Gi* statistic generates a map of statistically significant hot spots and cold spots, while the Local Moran’s I shows where values cluster spatially and where values differ markedly from their neighbors (outliers).
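
The difference between the two local statistics can be illustrated with a small sketch. It is a simplified illustration rather than the ArcGIS implementation: features are assumed to lie in a line with binary adjacency weights, the Local Moran’s I uses one common formulation, and its significance test is omitted. The function names are hypothetical.

```python
import math

def line_weights(n, include_self):
    """Binary weights for n features in a row; optionally a feature neighbours itself."""
    return [[1.0 if abs(i - j) == 1 or (include_self and i == j) else 0.0
             for j in range(n)] for i in range(n)]

def gi_star(values):
    """Getis-Ord Gi* z-score per feature (each feature is its own neighbour)."""
    n = len(values)
    w = line_weights(n, include_self=True)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        sw = sum(w[i])
        sw2 = sum(x * x for x in w[i])
        num = sum(w[i][j] * values[j] for j in range(n)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append(num / den)
    return scores

def local_moran_types(values):
    """Local Moran's I and a raw HH/LL/HL/LH label per feature (no significance test)."""
    n = len(values)
    w = line_weights(n, include_self=False)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    result = []
    for i in range(n):
        lag = sum(w[i][j] * dev[j] for j in range(n))
        # A lag of exactly zero is labelled "L" here; a real tool would treat it separately.
        label = ("H" if dev[i] > 0 else "L") + ("H" if lag > 0 else "L")
        result.append((dev[i] * lag / m2, label))
    return result

print([round(z, 2) for z in gi_star([9, 9, 9, 1, 1, 1])])
print(local_moran_types([1, 1, 9, 1, 1]))
```

In the first series, Gi* returns positive z-scores inside the run of high values (a hot spot) and negative ones inside the run of low values (a cold spot). In the second series, the Local Moran’s I of the single high value is negative and labelled HL, i.e. a high value surrounded by low ones, which is the outlier case.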

### Optimized Hot Spot Analysis

Optimized Hot Spot Analysis executes the Hot Spot Analysis (Getis-Ord Gi*) tool using parameters derived from characteristics of the input data: it interrogates the data to obtain the settings that will yield optimal hot spot results. Three aspects are handled automatically:

- Initial data assessment
- Incident aggregation (aggregation of a number of incidents for optimization)
- Scale of analysis (using Incremental Spatial Autocorrelation, followed by internal operations, if necessary)

A detailed description of each of these and their settings can be found in the ArcGIS resource library, under “How Optimized Hot Spot Analysis Works”.

## Results

### Global statistics: Global Moran’s I Summary

Statistic | Value |
---|---|
Moran’s Index | 0.390110 |
Expected Index | -0.008403 |
Variance | 0.001228 |
z-score | 11.372282 |
p-value | 0 |
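
As a consistency check on the reported figures, the z-score can be recomputed from the index, its expected value, and its variance. The sketch below assumes the usual relations z = (I - E[I]) / sqrt(Var) and E[I] = -1 / (n - 1); the feature count derived from the latter is an inference from the rounded report values, not a number stated in the report.

```python
import math

# Values copied from the Global Moran's I summary (decimal points instead of commas)
morans_index = 0.390110
expected_index = -0.008403
variance = 0.001228

# z-score: distance of the observed index from its expectation, in standard deviations
z = (morans_index - expected_index) / math.sqrt(variance)

# Feature count implied by E[I] = -1 / (n - 1); an inference, not a reported value
n = round(1 - 1 / expected_index)

print(round(z, 3), n)
```

The recomputed z-score agrees with the reported 11.372282 up to the rounding of the variance.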

### Global statistics: General G Summary

Statistic | Value |
---|---|
Observed General G | 0.013601 |
Expected General G | 0.008403 |
Variance | 0 |
z-score | 9.755528 |
p-value | 0 |