Character Encoding
Character encoding systems consist of a code that pairs each character from a character set with something else, such as a sequence of numbers or octets, to facilitate data stransmission and storage. HAWQ supports a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended UNIX Code), UTF-8, and Mule internal code. Clients can use all supported character sets transparently, but a few are not supported for use within the server as a server-side encoding.
Data files must be in a character encoding recognized by HAWQ. Data files that contain invalid or unsupported encoding sequences encounter errors when loading into HAWQ.
Note: On data files generated on a Microsoft Windows operating system, run the dos2unix
system command to remove any Windows-only characters before loading into HAWQ.