algorithmgeolocationgisgeospatialzipcode

Algorithm to get US zip codes from gis x,y coordinates


I have a database of many tens of thousands of events that occurred at specific geographic locations within the United States. The data include x,y coordinates for each event, encoded using the NAD83 reference system. I want to write or use an algorithm to reliably get the US zip code associated with each NAD83 x,y coordinate.

I do not yet have zip code definitions using the NAD83 reference system. And I have never done this kind of programming before. But it just seems like it would be intuitively simple to find out whether a given x,y coordinate is located within a geometric shape of a US zip code defined using the same NAD83 reference system.

I am interested in the following:

  1. Where do I get reliable US Zip Code definitions in the NAD83 reference system format?
  2. Where can I find example code for an algorithm to find the zip code given an x,y coordinate?

I am interested in any links to instructional articles/tutorials, example code, and NAD83 zip code boundary definition data. I am doing web searches, but I figured that people on this site might be able to give me more of an expert's guide.

If the code you provide is not written in Java, I could take code written in another language and adapt it to Java for my purposes. I do not have database software installed in my computer because I just use csv or text files as inputs into my Java applications. If you have some database that you suggest I use, I would need links to instructions for how to get the data into a format that I can import into a programming language such as Java.

Finally, the street addresses in my dataset do not include zip codes, and the street addresses are written haphazardly, so that it would be very difficult to try to clean the address data up enough to try to get zip codes from the addresses. I can isolate the data to several adjacent cities, in perhaps a couple hundred zip codes, but I think that the NAD83 x,y coordinates are my best shot at deriving the zip code in which each event in my dataset occurred. I want to link my resulting zip code by zip code analyses with other data that I get about each zip code from sources like the US Census, etc.


Solution

  • i don't know where to get the ZIP code, but i think you can google it out, the ZIP code of each state.

    and to question (2), first you'll need the geographic information, i.e. the boundary of each state. then you just enumerate all the points(x,y) and determine which polygon it's in.

    Here is a sample code, it was written for SGU124.

    #include <map>
    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    
    #define MAXN 10005
    
    using namespace std;
    
    struct pnt{
        int x,y;
    };
    struct seg{
        pnt a,b;
    }   s[MAXN];
    int n;
    pnt p;
    int h[MAXN<<1];
    int k[MAXN<<1];
    
    void work(){
        int i,x,y,c = 0;
        memset(h,0,sizeof(h));
        memset(k,0,sizeof(k));
        for (i=0;i<n;i++){
            if (s[i].a.x<=p.x && p.x<=s[i].b.x && s[i].a.y<=p.y && p.y<=s[i].b.y){
                printf("BORDER\n");
                return;
            }
            if (s[i].a.x==s[i].b.x){
                x = s[i].a.x;
                y = p.y - p.x + x;
                if (x<=p.x && s[i].a.y<=y && y<=s[i].b.y){
                    h[x+MAXN] = 1;
                    if (y==s[i].a.y) k[x+MAXN] |= 1;
                        else if (y==s[i].b.y) k[x+MAXN] |= 2;
                }
            }
            else{
                y = s[i].a.y;
                x = p.x - p.y + y;
                if (x<=p.x && s[i].a.x<=x && x<=s[i].b.x){
                    //printf("%d %d %d %d\n",s[i].a.x,s[i].a.y,s[i].b.x,s[i].b.y);
                    h[x+MAXN] = 1;
                    if (x==s[i].a.x) k[x+MAXN] |= 4;
                        else if (x==s[i].b.x) k[x+MAXN] |= 8;
                }
            }
        }
        for (i=p.x;i>=-10000;i--){
            //if (h[i+MAXN]>0) printf("@ %d %d\n",i,k[i+MAXN]);
            if (k[i+MAXN]!=9 && k[i+MAXN]!=6) c += h[i+MAXN];
        }
        //printf("p @ %d %d ",p.x,p.y);
        if (c%2) printf("INSIDE\n");
            else printf("OUTSIDE\n");
    }
    
    int main(){
        freopen("sgu124.in","r",stdin);
        int i;
        while (~scanf("%d",&n)){
            for (i=0;i<n;i++){
                scanf("%d%d",&s[i].a.x,&s[i].a.y);
                scanf("%d%d",&s[i].b.x,&s[i].b.y);
                if (s[i].a.x>s[i].b.x || s[i].a.y>s[i].b.y) swap(s[i].a,s[i].b);
            }
            scanf("%d%d",&p.x,&p.y);
            work();
            //break;
        }
        return 0;
    }