Thursday, April 6, 2017

Data Normalization, Geocoding, and Error Assessment: Sand Mining Suitability Project

Goals and Objectives
The purpose of this lab is to introduce geocoding and the challenges that come with it.  In this lab, 19 sand mines in western Wisconsin were geocoded, and the results were compared with the actual locations and with classmates' locations.  The lab was designed to make the user explore the different tools that ArcMap has to offer and think creatively about how to solve a problem.

Methods
The mine location data file came from the Wisconsin DNR.  It contained a variety of information including facility name, contact address, address, PLSS location, facility type, etc.  This information was not organized in a useful manner: the address field contained the PLSS location as well as the actual street address.  The first step was to create a new table with just the mines that were to be geocoded and to normalize that table.  The original table contained 129 mines, and 19 were assigned to each student to geocode.  The 19 mines were put into a new Excel table, and new fields were created to normalize the data.  PLSS location, street, city, and state fields were made, and the actual facility address was split into these new fields.  In some cases the mines did not have an address, only a PLSS location.  Once the table was normalized, it was brought into ArcMap and used to geocode the locations.  The address locator tool on the geocoding toolbar was used to find the location of the mines, with the newly added fields serving as the fields to locate the mines by.
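The normalization in this lab was done by hand in Excel, but the same split can be sketched in Python.  This is only an illustration: the layout of the DNR address field is not shown in this post, so the PLSS pattern, the comma-separated layout, the function name normalize_address, and the sample strings are all assumptions.

```python
import re

# Assumed PLSS layout such as "NE SW S12 T25N R10W"; the real DNR
# field may be formatted differently.
PLSS_RE = re.compile(r"(?:[NS][EW]\s+)*S\d{1,2}\s+T\d{1,2}N\s+R\d{1,2}[EW]",
                     re.IGNORECASE)

def normalize_address(raw, state="WI"):
    """Split a combined address string into PLSS, street, city, and state."""
    match = PLSS_RE.search(raw)
    plss = match.group(0).strip() if match else ""
    # Remove the PLSS text, then split what is left on commas
    # (assumes "street, city" order, which is hypothetical).
    remainder = PLSS_RE.sub("", raw)
    parts = [p.strip() for p in remainder.split(",") if p.strip()]
    street = parts[0] if len(parts) > 0 else ""
    city = parts[1] if len(parts) > 1 else ""
    return {"plss": plss, "street": street, "city": city, "state": state}

# Made-up example record, not taken from the DNR table.
record = normalize_address("NE SW S12 T25N R10W, 123 County Road A, Exampleville")
print(record)
```

A record with only a PLSS description comes back with empty street and city fields, which flags it as one of the mines that has to be located by other means.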
The initial matching report showed that 100% of the mines were matched, but this was misleading.  Many of the points were not accurate and had to be found manually and matched by hand using the interactive rematch tool.
This was done for all 19 mines.  Four of the mines had no address.  Two of those had PLSS locations, so PLSS quadrants were imported and used to locate them.  For the other two, a Google search was used.  One mine had only an intersection as its location, so Google Maps was used to locate it.  In the end, all 19 mines were located and a point feature class was created from the results.

The next step was to compare my mines to the actual locations and to the locations of classmates who had the same mines.  First, a query was used to select the same mines from the classmates' data and from the actual location data set.
This query selected only the specific mines that were to be compared.  Once the mines were selected, a new feature class was created for each classmate and for the actual locations data set.  The next step was to compare my points to the others using the Near tool.  This tool finds, for each input point, the nearest point in a second data set and gives the distance to that point in meters.  To compare my points to classmates' points, the classmates' data was the input feature class and my points were the near features.  This was done because the classmates did not have all 19 of the same points; with my points as the input, the Near tool would have matched my extra mines to unrelated points.  For comparing my locations to the actual locations, the actual locations were the input feature class and my locations were the near feature class.
The Near tool creates a new column in the input table with the distance to the nearest feature.  The four tables were then exported as text files and brought into Microsoft Excel.  The average distance to the mines was calculated with the AVERAGE function in Excel.
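The nearest-point comparison and the averaging step can be sketched in a few lines of Python.  This is not the ArcMap Near tool itself, only a minimal stand-in that assumes the points are in a projected coordinate system with units of meters; the function names and the coordinates are made up for illustration.

```python
import math

def nearest_distance(point, near_points):
    """Distance from one input point to the closest of the near points."""
    return min(math.hypot(point[0] - n[0], point[1] - n[1])
               for n in near_points)

def near_table(input_points, near_points):
    """One distance per input point, like the column Near adds to the input table."""
    return [nearest_distance(p, near_points) for p in input_points]

def average(values):
    return sum(values) / len(values)

# Hypothetical coordinates in meters. The smaller data set is the input,
# so every input point has a true counterpart among the near points.
classmate_points = [(0.0, 0.0), (100.0, 0.0)]
my_points = [(3.0, 4.0), (100.0, 50.0), (1000.0, 1000.0)]

distances = near_table(classmate_points, my_points)
print(distances)           # [5.0, 50.0]
print(average(distances))  # 27.5
```

Swapping the two lists would force the third of my points, which has no counterpart in the classmate's data, to pair with an unrelated point and inflate the average.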

Results

The data from the DNR came in a format that was not ready to use.  Four new fields were created to normalize the data.
Not Normalized
Normalized
Originally, the address field mixed PLSS locations with street addresses.  The table was normalized to organize the address into separate fields.  This made the geocoding process quicker because the addresses could be matched more accurately.

After all the classmates' mine locations were sorted out, a map was created showing all the different locations at which the mines were geocoded.

The map below shows one mine and the four locations that were geocoded for that mine.  This map shows how some of the variance occurs.  The four placements were at the mine entrance on the road, down the road, inside the mine, and behind the mine.  As a result, the distance between locations is different for each person and for each mine.
The table below shows the distance between the actual mine locations that the DNR provided and my mine locations.  The average distance was 1027 meters.  This high number is the result of a few mines being very far off.
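To see how a few distant points can dominate an average, the short sketch below compares the mean and the median of a set of distances.  The numbers are invented for illustration and are not the distances from this lab.

```python
import statistics

# Invented distances in meters: four close matches and one far-off point.
distances_m = [50, 80, 120, 150, 4600]

mean_m = statistics.mean(distances_m)      # pulled upward by the outlier
median_m = statistics.median(distances_m)  # middle value, barely affected

print(mean_m)    # 1000
print(median_m)  # 120
```

Reporting the median next to the average is one way to show whether most points were close and only a handful were far from the actual locations.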

The error that occurred was positional error.  This is the result of the points not being accurately placed at the mine location.  It has to do with different understandings of how to locate a place and how addresses work.  An address is read from the end of the driveway and not the actual spot of the house or building on a property.  The most accurate place to put an address location would be where the driveway meets the road.  This is not where the DNR placed the points, and that is why the actual points differ from my point locations.

Conclusion
Geocoding can be very accurate, and points can be placed where the user deems them most appropriate.  This freedom also means that different users will place the same location in different spots.  This must be kept in mind whenever geocoded locations are being used, because they might not be in the right spot for the application the user has in mind.
