Fix files containing \0 characters or having incorrect file formats

Are you having trouble consuming files on Linux that were generated on Windows or came from other systems?

Let's take a stab at it.

Step 1

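Check what is going on in the file by dumping its bytes as characters. od -c prints carriage returns as \r, newlines as \n and NULs as \0.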

$ od -c bad_file.txt
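On a large file the od output can be too long to scan by eye. A minimal sketch that counts the suspicious bytes instead is shown here; the helper names count_cr and count_nul are made up for this post and are not standard commands.

```shell
# count_cr FILE: print the number of carriage-return (\r) bytes in FILE.
# tr -cd deletes every byte except the ones listed, wc -c counts what is left.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# count_nul FILE: print the number of NUL (\0) bytes in FILE.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

Run count_cr bad_file.txt and count_nul bad_file.txt. If the first count is non-zero the file probably needs step 2, and if the second is non-zero it needs step 3.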

Step 2

Are you seeing \r\n as the line separator?

Let's fix this using the dos2unix command.

$ dos2unix bad_file.txt

Caution: This will replace the original file
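If you would rather keep the original, here is a sketch of two ways to write the converted copy to a new file. The -n (new file) mode is part of dos2unix. The crlf_to_lf helper is a hypothetical fallback for machines where dos2unix is not installed, and it assumes each line carries at most one trailing \r.

```shell
# Option A: dos2unix in new-file mode leaves bad_file.txt untouched.
#   dos2unix -n bad_file.txt good_file.txt

# Option B (hypothetical helper): strip one trailing \r from every line.
# crlf_to_lf IN OUT: read IN, write the converted copy to OUT.
crlf_to_lf() {
    awk '{ sub(/\r$/, ""); print }' "$1" > "$2"
}
```

Usage: crlf_to_lf bad_file.txt good_file.txt. Note that awk always ends the last line with \n, even if the input did not.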

Step 3

Is the format non-standard, like \r \0 \n?

Let's remove all the \0 (NUL) characters from the input file.

$ tr -d '\000' < bad_file.txt > good_file.txt

Go back to step 2 if the file still has \r\n line endings.

Awk fan? Use this (NUL handling can vary between awk implementations, so check the result with od -c): $ awk '{gsub(/\0/,"",$0); print $0}' bad_file.txt > good_file.txt
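When a file has both problems, the NUL removal and the line-ending fix can be done in a single pass. This is only a sketch: strip_nul_cr is a made-up helper name, and unlike dos2unix it deletes every \r in the file, not just the ones that sit before a \n.

```shell
# strip_nul_cr IN OUT: delete all NUL (\0) and carriage-return (\r) bytes
# from IN and write the result to OUT. IN is left untouched.
strip_nul_cr() {
    tr -d '\000\r' < "$1" > "$2"
}
```

Usage: strip_nul_cr bad_file.txt good_file.txt, then run od -c good_file.txt to confirm only \n line endings remain.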

 

Genesis

Hive wasn’t returning results in the expected format because the files coming from SAS on a Windows environment were incorrectly encoded.

 
