Fix files containing \0 characters or having incorrect file formats
Are you having trouble consuming files on Linux that were generated on Windows or came from another system?
Let's take a stab at it.
Step 1
Check what is going on in the file
$ od -c bad_file.txt
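For a large file the od dump can be hard to scan, so counting the suspicious bytes may be quicker. A minimal sketch, assuming the file name bad_file.txt from this post; the sample content written by printf is made up for illustration only.

```shell
# Create a small sample file with Windows line endings and a stray NUL byte
# (made-up content, only for illustration).
printf 'id,name\r\n1,\000foo\r\n' > bad_file.txt

# Count NUL bytes: delete everything except \000, then count what is left.
# The trailing tr strips the padding some wc implementations add.
nul_count=$(tr -cd '\000' < bad_file.txt | wc -c | tr -d ' ')

# Count carriage returns the same way.
cr_count=$(tr -cd '\r' < bad_file.txt | wc -c | tr -d ' ')

echo "NUL bytes: $nul_count, CR bytes: $cr_count"
```

A non-zero CR count points to Step 2, and a non-zero NUL count points to Step 3.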
Step 2
Are you seeing \r\n as the line separator?
Let's fix this using the dos2unix command
$ dos2unix bad_file.txt
Caution: This converts the file in place, overwriting the original
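If the original must be kept, one option is to write the converted output to a new file. A minimal sketch using tr instead of dos2unix (a substitute technique, not what dos2unix does internally). Note that it deletes every \r, including any that are not part of a \r\n pair. The sample content is made up.

```shell
# Sample file with Windows (\r\n) line endings; made-up content.
printf 'line one\r\nline two\r\n' > bad_file.txt

# Delete every carriage return and write the result to a new file,
# leaving bad_file.txt untouched.
tr -d '\r' < bad_file.txt > good_file.txt

# Inspect the result: only \n should remain as the line separator.
od -c good_file.txt
```

dos2unix can also write to a separate file with its new-file mode: dos2unix -n bad_file.txt good_file.txt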
Step 3
Seeing a non-standard pattern such as \r \0 \n?
Let's remove all the \0 (NUL) characters from the input file
$ tr -d '\000' < bad_file.txt > good_file.txt
Go back to Step 2 and run it on good_file.txt if \r\n line endings remain
awk fan? Use this (NUL handling varies between awk implementations; GNU awk is the safest bet):
$ awk '{gsub(/\0/,"",$0); print $0}' bad_file.txt > good_file.txt
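Steps 2 and 3 can also be combined into a single pass. A minimal sketch, assuming it is acceptable to drop every \r and every NUL in the file; the sample content is made up, and the final count is only a sanity check.

```shell
# Sample file in the "\r \0 \n" style described in Step 3; made-up content.
printf 'a\r\000\nb\r\000\n' > bad_file.txt

# Delete NUL bytes and carriage returns in one pass.
tr -d '\000\r' < bad_file.txt > good_file.txt

# Sanity check: count any NUL or CR bytes left in the cleaned file.
remaining=$(tr -cd '\000\r' < good_file.txt | wc -c | tr -d ' ')
echo "remaining NUL/CR bytes: $remaining"
```

If the count is not zero, run od -c on good_file.txt again to see what is left.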
Genesis
Hive wasn’t returning results in the expected format because of incorrect file encoding in files produced by SAS on a Windows environment.