Raster Local Moran P Value






Given a set of features (Input Feature Class) and an analysis field (Input Field), the Cluster and Outlier Analysis tool identifies spatial clusters of features with high or low values. The tool also identifies spatial outliers. To do this, the tool calculates a local Moran's I value, a z-score, a pseudo p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and pseudo p-values represent the statistical significance of the computed index values.


Calculations

View additional mathematics for the local Moran's I statistic.
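
The linked mathematics are not reproduced on this page; for reference, the local Moran's I statistic for a feature i, following Anselin (1995) and the ArcGIS documentation, is

```latex
I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\,(x_j - \bar{X}),
\qquad
S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^2}{n - 1},
```

where x_i is the attribute value of feature i, X-bar is the mean of the attribute, w_{i,j} is the spatial weight between features i and j, and n is the total number of features. The z-score is computed as z_{I_i} = (I_i - E[I_i]) / sqrt(V[I_i]).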

Interpretation

A positive value for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information on determining statistical significance, see What is a z-score? What is a p-value? Note that the local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value. The z-scores and p-values reported in the output feature class are uncorrected for multiple testing or spatial dependency.

The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of high values (HH), cluster of low values (LL), outlier in which a high value is surrounded primarily by low values (HL), and outlier in which a low value is surrounded primarily by high values (LH). Statistical significance is set at the 95 percent confidence level. When no FDR correction is applied, features with p-values smaller than 0.05 are considered statistically significant. The FDR correction reduces this p-value threshold from 0.05 to a value that better reflects the 95 percent confidence level given multiple testing.
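
The tool applies this correction internally. As an aside, an equivalent check in R on a vector of local p-values (here a hypothetical pvals) can be made with the Benjamini-Hochberg procedure, which controls the false discovery rate and selects the same features as comparing the raw p-values against the reduced threshold:

```r
# pvals: hypothetical vector of local p-values, one per feature
sig <- p.adjust(pvals, method = "BH") < 0.05  # TRUE where significant after FDR control
```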

Output

This tool creates a new output feature class with the following attributes for each feature in the input feature class: local Moran's I index (LMiIndex), z-score (LMiZScore), p-value (LMiPValue), and cluster/outlier type (COType). The field names of these attributes are also derived tool output values, for potential use in custom models and scripts.

When this tool runs in ArcMap, the output feature class is automatically added to the table of contents (TOC) with default rendering applied to the COType field. The rendering applied is defined by a layer file in <ArcGIS>/ArcToolbox/Templates/Layers. You can reapply the default rendering, if needed, by importing the template layer symbology.

Permutations

Permutations are used to determine how likely it would be to find the actual spatial distribution of the values that you are analyzing by comparing your values to a set of randomly generated values. Even with complete spatial randomness (CSR), some degree of clustering will always be observed simply due to randomness. Permutations will generate many random datasets and compare these values to the Local Moran's I of your original data. To do this, each permutation randomly rearranges the neighborhood values around each feature and calculates the Local Moran's I value of this random data. By looking at the distribution of the Local Moran's I generated from permutations, you can see the range of Local Moran's I values that could reasonably be due to randomness. If there is a statistically significant spatial pattern in your data, you expect the Local Moran's I values generated from permutations to display less clustering than the Local Moran's I value from your original data. A pseudo p-value is then calculated by determining the proportion of Local Moran's I values generated from permutations that display more clustering than your original data. If this proportion (the pseudo p-value) is small (less than 0.05), you can conclude that your data does display statistically significant clustering.

Choosing the number of permutations is a balance between precision and increased processing time. Increasing the number of permutations increases precision by increasing the range of possible values for the pseudo p-value. For example, with 99 permutations the precision of the pseudo p-value is 0.01 (1/(99+1)), and with 999 permutations the precision is 0.001 (1/(999+1)). A lower number of permutations can be used when first exploring a problem, but it is best practice to increase the permutations to the highest number feasible for final results.
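
As a concrete illustration, below is a minimal R sketch of this conditional permutation test for a single feature, assuming row-standardized weights; x (the attribute values), i (the feature index), and nb (the indices of the feature's neighbors) are hypothetical inputs:

```r
# Pseudo p-value for the local Moran's I of feature i, by conditional permutation.
local_moran_pseudo_p <- function(x, i, nb, nsim = 999) {
  z <- (x - mean(x)) / sd(x)          # standardize the attribute
  obs <- z[i] * mean(z[nb])           # observed local Moran's I (row-standardized weights)
  others <- z[-i]                     # hold z[i] fixed and permute the remaining values
  sim <- replicate(nsim, z[i] * mean(sample(others, length(nb))))
  # proportion of permuted statistics at least as extreme as the observed one
  (sum(abs(sim) >= abs(obs)) + 1) / (nsim + 1)
}
```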

Best practice guidelines

  • Results are only reliable if the input feature class contains at least 30 features.
  • This tool requires an input field such as a count, rate, or other numeric measurement. If you are analyzing point data, where each point represents a single event or incident, you might not have a specific numeric attribute to evaluate (a severity ranking, count, or other measurement). If you are interested in finding locations with many incidents (hot spots) and/or locations with very few incidents (cold spots), you will need to aggregate your incident data prior to analysis. The Hot Spot Analysis (Getis-Ord Gi*) tool is also effective for finding hot and cold spots. Only the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, however, will identify statistically significant spatial outliers (a high value surrounded by low values or a low value surrounded by high values).
  • Select an appropriate conceptualization of spatial relationships.
  • When you select the SPACE_TIME_WINDOW conceptualization, you can identify space-time clusters and outliers. See Space-Time Analysis for more information.
  • Select an appropriate distance band or threshold distance.
    • All features should have at least one neighbor.
    • No feature should have all other features as a neighbor.
    • Especially if the values for the input field are skewed, each feature should have about eight neighbors.

Potential applications

The Cluster and Outlier Analysis (Anselin Local Moran's I) tool identifies concentrations of high values, concentrations of low values, and spatial outliers. It can help you answer questions such as these:

  • Where are the sharpest boundaries between affluence and poverty in a study area?
  • Are there locations in a study area with anomalous spending patterns?
  • Where are the unexpectedly high rates of diabetes across the study area?

Applications can be found in many fields including economics, resource management, biogeography, political geography, and demographics.

Additional resources

Anselin, Luc. 'Local Indicators of Spatial Association—LISA,' Geographical Analysis 27(2): 93–115, 1995.

Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.

Local regression

Regression models are typically "global": all data are used simultaneously to fit a single model. In some cases it can make sense to fit more flexible "local" models. Such models exist in a general regression framework (e.g. generalized additive models), where "local" refers to the values of the predictor variables. In a spatial context, "local" refers to location. Rather than fitting a single regression model, it is possible to fit several models, one for each of (possibly very many) locations. This technique is sometimes called "geographically weighted regression" (GWR). GWR is a data exploration technique that helps you understand changes in the importance of different variables over space (which may indicate that the model used is misspecified and can be improved).

There are two examples here: one short example with California precipitation data, and then a more elaborate example with house price data.

California precipitation

Compute annual average precipitation
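
A sketch of this step, assuming the sample data that ship with the rspatial package, where the station table has one precipitation column per month (JAN through DEC):

```r
library(rspatial)
p <- sp_data("precipitation")   # assumed dataset name
months <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
p$pan <- rowSums(p[, months])   # annual precipitation per station
```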


Global regression model
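
A single ordinary least squares model fitted to all stations at once, using elevation (an assumed ALT column in this dataset) as the predictor:

```r
m <- lm(pan ~ ALT, data = p)
summary(m)
```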

Create Spatial* objects with a planar crs.
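
A sketch, assuming LONG and LAT coordinate columns; Teale Albers (used again later for the house price data) is one common planar CRS for California:

```r
library(sp)
library(raster)
pts <- p
coordinates(pts) <- ~ LONG + LAT   # promote the data.frame to a SpatialPointsDataFrame
crs(pts) <- "+proj=longlat +datum=NAD83"
TA <- CRS("+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=NAD83 +units=m")
spt <- spTransform(pts, TA)
```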

Get the optimal bandwidth
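
With the spgwr package, bandwidth selection by cross-validation:

```r
library(spgwr)
bw <- gwr.sel(pan ~ ALT, data = spt)   # leave-one-out cross-validation by default
bw
```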

Create a regular set of points to estimate parameters for.
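
One way to do this is to take the cell centers of a coarse raster over the study area (the 10 km resolution is an arbitrary choice):

```r
r <- raster(spt)                  # empty raster with the extent of the stations
res(r) <- 10000                   # 10 km cells
xy <- xyFromCell(r, 1:ncell(r))   # all cell centers form a regular grid
newpts <- SpatialPoints(xy, proj4string = TA)
```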

Run the gwr function
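
Fitting a local model at each of the regular points:

```r
g <- gwr(pan ~ ALT, data = spt, bandwidth = bw, fit.points = newpts)
g
```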

Link the results back to the raster
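
Since the fit points above were all cell centers in cell order, the coefficient columns of g$SDF (a SpatialPointsDataFrame) can be written straight back into the raster:

```r
slope <- intercept <- r
slope[] <- g$SDF$ALT                 # local elevation coefficient
intercept[] <- g$SDF$"(Intercept)"   # local intercept
s <- stack(intercept, slope)
names(s) <- c("intercept", "slope")
plot(s)
```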

California House Price Data

We will use house price data from the 1990 census, taken from "Pace, R.K. and R. Barry, 1997. Sparse Spatial Autoregressions. Statistics and Probability Letters 33: 291-297." You can download the data here.

Each record represents a census "blockgroup". The longitude and latitude of the centroids of each block group are available. We can use these to make a map, and also to link the data to other spatial data, for example to get the county membership of each block group. To do that, let's first turn this into a SpatialPointsDataFrame to find out to which county each point belongs.
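
A sketch, with rspatial and sp loaded as before; the dataset name and the coordinate column names are assumptions:

```r
houses <- sp_data("houses1990.csv")
coordinates(houses) <- ~ longitude + latitude   # now a SpatialPointsDataFrame
```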

Now get the county boundaries and assign the CRS of the counties to the houses data so that they match (they are both in longitude/latitude!).
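
A sketch; the name of the counties dataset is an assumption:

```r
counties <- sp_data("counties")
crs(houses) <- crs(counties)   # both are in longitude/latitude
```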

Do a spatial query (points in polygon)
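
sp::over returns, for each point, the attributes of the polygon it falls in:

```r
cnty <- over(houses, counties)   # county attributes for each house point
head(cnty)
```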

Summarize

We can summarize the data by county. First combine the extracted county data with the original data.
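
For example, using cbind:

```r
hd <- cbind(data.frame(houses), cnty)   # house attributes plus county attributes
```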


Compute the population by county
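
Using tapply, and assuming the county-name column returned by the spatial query is called NAME:

```r
totpop <- tapply(hd$population, hd$NAME, sum)
totpop
```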

Income is harder because we have the median household income by blockgroup. But it can be approximated by first computing total income by blockgroup, summing that, and dividing that by the total number of households.
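
A sketch of that approximation (income and households are assumed column names):

```r
hd$suminc <- hd$income * hd$households        # approximate total income per block group
csum <- tapply(hd$suminc, hd$NAME, sum)       # total income by county
chou <- tapply(hd$households, hd$NAME, sum)   # total households by county
cinc <- csum / chou                           # approximate per-household income by county
sort(round(cinc))
```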


Regression

Before we make a regression model, let's first add some new variables that we might use, and then see if we can build a regression model with house price as the dependent variable. The authors of the paper used a lot of log transforms, so you can also try that.
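
For example, per-capita and per-household variables (column names assumed from the houses1990 data):

```r
hd$roomhead <- hd$rooms / hd$population
hd$bedroomhead <- hd$bedrooms / hd$population
hd$hhsize <- hd$population / hd$households
```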

Ordinary least squares regression:
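
A sketch of a possible model (the original paper used additional predictors and log transforms):

```r
m <- glm(houseValue ~ income + houseAge + roomhead + bedroomhead + population,
         data = hd)
summary(m)
```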

Geographically Weighted Regression

By county

Of course we could make the model more complex, with, for example, squared income and interactions. But let's see if we can do geographically weighted regression. One approach could be to use counties.

First I remove records that were outside the county boundaries
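
Points that fell outside all county polygons got NA from the spatial query:

```r
hd2 <- hd[!is.na(hd$NAME), ]
```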

Then I write a function to get what I want from the regression (the coefficients in this case)
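
A sketch, reusing the model formula from above:

```r
regfun <- function(x) {
  dat <- hd2[hd2$NAME == x, ]
  m <- glm(houseValue ~ income + houseAge + roomhead + bedroomhead + population,
           data = dat)
  coefficients(m)
}
```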

And now run this for all counties using sapply:
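
For example:

```r
countynames <- unique(hd2$NAME)
res <- sapply(countynames, regfun)   # a matrix with one column of coefficients per county
```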

Plot of a single coefficient
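
For example, a dot chart of the income coefficient by county:

```r
dotchart(sort(res["income", ]), cex = 0.65)
```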

There clearly is variation in the coefficient (β) for income. How does this look on a map?

First make a data.frame of the results
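
For example, transposing the coefficient matrix:

```r
resdf <- data.frame(NAME = countynames, t(res))
head(resdf)
```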

Fix the counties object. There are too many counties because of the presence of islands. I first aggregate ("dissolve" in GIS-speak) the counties such that a single county becomes a single (multi-)polygon.
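
A sketch using the aggregate method that the raster package provides for Spatial* objects (the argument name has varied across raster versions; by is assumed here):

```r
dim(counties)
dcounties <- aggregate(counties, by = "NAME")   # dissolve on the county name
dim(dcounties)
```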

Now we can merge this SpatialPolygonsDataFrame with the data.frame containing the regression results.
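
Using the merge method for Spatial* objects:

```r
cnres <- merge(dcounties, resdf, by = "NAME")
spplot(cnres, "income")
```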


To show all parameters in a ‘conditioning plot’, we need to first scale the values to get similar ranges.
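
For example, scaling every column except the county name (this assumes the remaining columns are the numeric coefficients):

```r
cnres2 <- cnres
cnres2@data <- data.frame(scale(data.frame(cnres)[, -1]))
spplot(cnres2)
```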

Is this just random noise, or is there spatial autocorrelation?
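
One way to check is a Moran test on one of the coefficients, using neighbors derived from shared county boundaries (a sketch using spdep):

```r
library(spdep)
nb <- poly2nb(cnres)                    # contiguity neighbors
lw <- nb2listw(nb, zero.policy = TRUE)  # zero.policy in case a county has no neighbors
moran.test(cnres$income, lw, zero.policy = TRUE)
```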

By grid cell

An alternative approach would be to compute a model for grid cells. Let’s use the ‘Teale Albers’ projection (often used when mapping the entire state of California).
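
One common parameterization of Teale Albers:

```r
TA <- CRS("+proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +datum=NAD83 +units=m")
countiesTA <- spTransform(counties, TA)
```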

Create a RasterLayer using the extent of the counties, and set an arbitrary resolution of 50 by 50 km cells.

Get the xy coordinates for each raster cell:
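
A sketch of these two steps:

```r
r <- raster(countiesTA)          # RasterLayer with the extent of the counties
res(r) <- 50000                  # 50 by 50 km cells
xy <- xyFromCell(r, 1:ncell(r))  # coordinates of every cell center
```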


For each cell, we need to select a number of observations, let’s say within 50 km of the center of each cell (thus the data that are used in different cells overlap). And let’s require at least 50 observations to do a regression.

First transform the houses data to Teale-Albers
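
For example:

```r
housesTA <- spTransform(houses, TA)
crds <- coordinates(housesTA)   # point coordinates, used for distance computations
```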

Set up a new regression function.
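
The same model as before, but taking a data.frame rather than a county name:

```r
regfun2 <- function(d) {
  m <- glm(houseValue ~ income + houseAge + roomhead + bedroomhead + population,
           data = d)
  coefficients(m)
}
```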

Run the model for all cells if there are at least 50 observations within a radius of 50 km.
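
A sketch of that loop:

```r
hdta <- data.frame(housesTA)
res <- list()
for (i in 1:nrow(xy)) {
  d <- sqrt((xy[i, 1] - crds[, 1])^2 + (xy[i, 2] - crds[, 2])^2)  # distances to this cell center
  j <- which(d < 50000)
  res[[i]] <- if (length(j) >= 50) regfun2(hdta[j, ]) else NA
}
```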

For each cell get the income coefficient:
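
Indexing NA by name yields NA, so cells without a model drop out cleanly:

```r
inc <- sapply(res, function(x) x["income"])
```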

Use these values in a RasterLayer
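
Using setValues:

```r
rinc <- setValues(r, inc)
plot(rinc)
plot(countiesTA, add = TRUE)
```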

So that was a lot of ‘home-brew-GWR’.

Question 1: Can you comment on weaknesses (and perhaps strengths) of the approaches I have shown?

spgwr package

Now use the spgwr package (and the gwr function) to fit the model. You can do this with all the data, as long as you supply an argument fit.points (to avoid estimating a model for each observation point). You can use a raster similar to the one I used above (perhaps disaggregate with a factor of 2 first).

This is how you can get the points to use (the steps below are sketched in code afterwards):

Create a RasterLayer with the correct extent


Set to a desired resolution. I choose 25 km

I only want cells inside of CA, so I add some more steps.

Extract the coordinates that are not NA.

I don’t want the third column
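
A sketch of the steps above, reusing the countiesTA object:

```r
r <- raster(countiesTA)          # RasterLayer with the extent of the counties
res(r) <- 25000                  # 25 km cells
ca <- rasterize(countiesTA, r)   # cells outside California remain NA
fitpoints <- rasterToPoints(ca)  # x, y and cell value for the non-NA cells only
fitpoints <- fitpoints[, -3]     # drop the third (value) column
```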

Now specify the model
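
A sketch; note that bandwidth selection and fitting on the full dataset can take a while:

```r
fp <- SpatialPoints(fitpoints, proj4string = TA)
bw <- gwr.sel(houseValue ~ income + houseAge + roomhead + bedroomhead + population,
              data = housesTA)
g <- gwr(houseValue ~ income + houseAge + roomhead + bedroomhead + population,
         data = housesTA, bandwidth = bw, fit.points = fp)
g
```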

gwr returns a list-like object that includes (as first element) a SpatialPointsDataFrame that has the model coefficients. Plot these using spplot, and after that, transfer them to a RasterBrick object.

To extract the SpatialPointsDataFrame:
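
For example:

```r
sdf <- g$SDF
spplot(sdf, "income")
```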


To reconnect these values to the raster structure (etc.)
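
A sketch, writing selected coefficient columns back into the raster via the cell numbers of the fit points (the column subset is a hypothetical choice):

```r
cells <- cellFromXY(r, fitpoints)                     # raster cell of each fit point
coefs <- data.frame(sdf)[, c("income", "houseAge")]   # hypothetical coefficient subset
layers <- lapply(coefs, function(v) { x <- raster(r); x[cells] <- v; x })
b <- stack(layers)
plot(b)
```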

Question 2: Briefly comment on the results and the differences (if any) with the two home-brew examples.