Spatial statistics of lightning strikes 2

Continuing from part one, after looking at where lightning strikes occurred over Alaska in June and July of 2012, the next step in the EDA is to perform some basic interpolation to get a more continuous sense of the likelihood of lightning activity across space for each of these periods using filled contour lines. I use 2D kernel density estimation to produce the maps. Note that I do not actually use the R function filled.contour to make the plots but the idea and the result are the same.

Before we jump below, it is worth mentioning that it is important not to get caught up in the aesthetic of the graphics. What you see is what you get. But we don’t see or get as much as we think we do. Basic interpolation is not statistical modeling. I wouldn’t call it completely nonparametric (but then, what is?) because it does involve kernel and bandwidth parameters. However, “tuning” it to a “good fit” is generally done by eye to avoid obvious underfitting or overfitting, or even more commonly, simply by rule of thumb defaults which tend to work fine for many datasets. There is no traditional model selection process, no information theoretic, no statistical tests. As part of the EDA, I classify the KDE and resulting plots below as belonging to descriptive statistics. True, we are imputing things that are not there, but this should not be confused with rigorous inferential statistics. We are primarily describing the data in front of us and generating useful, meaningful visualizations based off of our method of describing that data.

Kernel density estimation considers only the locations of our observations of random variables. In this case, there is no attention to data at these lighting strike coordinates, such as strike multiplicity, strike strength, temperature, humidity, elevation, etc. At this stage of the EDA, with KDE we are merely interpolating and mapping spatial intensity/density of events. Covariates would be considered later as part of a more rigorous analysis. We can think of the plots below and the method by which they were created as a basic exploratory analog to geostatistics. Later when considering how spatial clustering of lightning strike activity varies through time, it could be considered more akin to a point process.

Now let’s take a look at how to do kernel density estimation using the MASS package in R. All I show here is the density estimation and plotting code. In part three [available soon!] I continue the exploratory data analysis by beginning to look at two covariates: the lightning dataset attributes, strike multiplicity and strike strength.

library(MASS)
library(raster)
library(rasterVis)
 
setwd("C:/tmp/Lightning/tifs")
plotDir <- "../graphics" # output folder
r1 <- raster("C:/tmp/raster_layers/Alaska_main.tif")) # template raster for plot background
 
# list files, create vectors of months and years
files <- list.files()
mos <- substr(files,nchar(files)-5,nchar(files)-4)
yrs <- substr(files,nchar(files)-10,nchar(files)-7)

The code above is just like in the previous post. Similarly, with the code below, I iterated over the files/months of years in a j loop as before. Here is a code example for plotting the lightning strikes for each j. I set up a different theme first.

# rasterVis settings
rtheme1 <- streamTheme(region=BTC(n=9), panel.background=list(col='gray20'), pch=20,col="orange", cex=1.5)
colkey.at <- seq(0, 1, length.out=length(rtheme1$regions$col)-1)
colkey.col <- rtheme1$regions$col
 
for(j in 1:length(files)){
 
# the TIF files have two variable layers. Obtain coordinates of non-NA cells. Same for either layer.
b <- brick(files[j])
cells <- Which(!is.na(raster(b,1)), cells=T)
coords <- xyFromCell(b, cells)
kde <- kde2d(coords[,"x"], coords[,"y"], n=c(1374,1450), lims=c(xmin(b), xmax(b), ymin(b), ymax(b)))
kde$z <- (kde$z - min(kde$z))/(max(kde$z - min(kde$z))) # map kernel density estimates to [0,1] relative probability estimates
b2 <- brick(b, values=F)
b2 <- setValues(b2, kde$z, layer=1)
r2 <- flip(raster(b2, 1), 2)
r2 <- mask(r2, r)

#Create a PNG for month j
png(file.path(plotDir, paste0("lightning_probability_", yrs[j], "_", mos[j], ".png")), width=2400, height=2400, res=300)
l <- levelplot(r2,
    par.settings=modifyList(rtheme1, list(layout.heights=list(top.padding=0, bottom.padding=-1))),
    cuts=19, maxpixels=2e6, margin=F, contour=T,
    main=list(paste(month.abb[as.numeric(mos[j])], yrs[j], "Lightning strike probability")),
    scales=list(draw=FALSE), colorkey=list(space="right", at=colkey.at, col=colkey.col),
    panel=function(...){
        panel.levelplot(...)
        sp.points(coords, col='orange',pch=".",cex=1)
        panel.key("Observed strike", corner = c(1,.025), col="orange")
    }
)
print(l)
dev.off()
 
} # end j loop

Looping over the plotting code like this would generate a [0,1] scale “lightning probability” map PNG for each input TIF or month-year pair. Below are example plots for June and July of 2012.

lightning_probability_2012_06_800x800

lightning_probability_2012_07_800x800

This entry was posted by Matt Leonawicz.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: