Monday, March 12, 2012

Changing encoding on a dbf file/column

I know, not very exciting, but I thought I'd capture this before I forget. I was having some character encoding issues with a particular dbf file. Apparently it was encoded as ISO-8859-15, but everything seems to try to read it as UTF-8. So, with Nick's help, I came up with a quick way to convert DBF columns from one encoding to another.
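To see why the two encodings disagree, here is a small sketch (not from the original troubleshooting, just an illustration) that pushes a single byte through iconv. It assumes iconv and od are installed. The byte 0xA4 is the euro sign in ISO-8859-15, but on its own it is not a valid UTF-8 sequence, which is roughly the kind of mismatch that produces garbled names.

```shell
# 0xA4 (octal 244) is the euro sign in ISO-8859-15.
# Convert that one byte to UTF-8 and dump the result as hex.
euro_bytes=$(printf '\244' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')

# The UTF-8 encoding of the euro sign (U+20AC) is three bytes long.
echo "$euro_bytes"   # prints e282ac
```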


Here is how to convert a dbf file from one encoding to another. In this example, only the first column (NAME) is converted from ISO-8859-15 to UTF-8; all other columns are kept the same.

Step 1: Convert DBF file to csv

    ogr2ogr -f "CSV" NEWFILE.csv OLDFILE.dbf

Step 2: Run the attached perl script convert.pl, sending the new csv file in as stdin:

    ./convert.pl < NEWFILE.csv > NEWFILECONVERT.csv  
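Before converting back, it may be worth checking that the output really is valid UTF-8. The following is a minimal sketch, assuming iconv is available; `is_valid_utf8` is a hypothetical helper name, not part of the original workflow. It relies on iconv exiting with a non-zero status when it hits a byte sequence that is not valid in the source encoding.

```shell
# is_valid_utf8 is a hypothetical helper: succeeds only if the file
# exists and decodes cleanly as UTF-8 (output is discarded).
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_valid_utf8 NEWFILECONVERT.csv; then
    echo "NEWFILECONVERT.csv is valid UTF-8"
else
    echo "NEWFILECONVERT.csv is missing or not valid UTF-8"
fi
```

Note that a pure-ASCII file passes this check too, since ASCII is a subset of UTF-8.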


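Step 3: Convert the newly encoded csv file back to a DBF file. ogr2ogr takes the destination first and the source second; the result is written to a new file here so the original is not overwritten:

    ogr2ogr -f "ESRI Shapefile" NEWFILECONVERT.dbf NEWFILECONVERT.csv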


====== convert.pl ==========
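#!/usr/bin/perl
use strict;
use warnings;
# Encode is a core module; converting in-process avoids shell-quoting
# problems with names that contain quotes, $ or backticks.
use Encode qw(decode encode);

while (my $line = <>)
{
    chomp $line;
    # Naive CSV split (does not handle quoted commas); -1 keeps trailing empty fields.
    my @fields = split /,/, $line, -1;

    # Re-encode only the first column (NAME) from ISO-8859-15 to UTF-8.
    if (@fields && length $fields[0])
    {
        $fields[0] = encode('UTF-8', decode('ISO-8859-15', $fields[0]));
    }

    print join(',', @fields) . "\n";
}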

====== convert.pl ==========
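If Perl is not handy, the same idea can be sketched with standard shell tools instead. This is an alternative to the script above, not the method I actually used. `convert_first_column` is a hypothetical function name, and the sketch assumes that mktemp is available, that every row has at least two columns, and that no field contains a quoted comma.

```shell
# convert_first_column INPUT.csv OUTPUT.csv  (hypothetical helper)
# Re-encodes column 1 from ISO-8859-15 to UTF-8 and leaves the rest as-is.
convert_first_column() {
    tmp_first=$(mktemp) && tmp_rest=$(mktemp) || return 1
    # Column 1 goes through iconv; columns 2..N are copied byte for byte.
    cut -d, -f1  "$1" | iconv -f ISO-8859-15 -t UTF-8 > "$tmp_first"
    cut -d, -f2- "$1" > "$tmp_rest"
    # Stitch the two halves back together line by line.
    paste -d, "$tmp_first" "$tmp_rest" > "$2"
    rm -f "$tmp_first" "$tmp_rest"
}
```

Usage would look like `convert_first_column NEWFILE.csv NEWFILECONVERT.csv`, in place of Step 2.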
