sqldump

(coffee) => code

Those @!#$ Weird Characters in Hadoop / Faunus Output

Text-file output from Faunus often contains garbage characters. To scrub them out, I use this little Python script:

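import re
from string import printable

# Match runs of characters outside string.printable. re.escape keeps "\", "]",
# "^" and "-" literal inside the character class, so backslashes survive.
non_printable = re.compile("[^{}]+".format(re.escape(printable)))

# latin-1 maps every byte to a character, so stray bytes never fail to decode.
with open("output.csv", "r", encoding="latin-1") as f:
    # Iterating the file keeps going past blank lines, which "while line" did not.
    for line in f:
        line = non_printable.sub("", line)
        line = line.replace("\n", "")
        print(line)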

And then a simple redirect writes the scrubbed output to a file:

python process.py > scrubbed.output.txt
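
If hard-coding output.csv gets in the way, the same scrubbing can be wrapped in a function and pointed at any file. This is only a rough sketch under a few assumptions: it targets Python 3, the names scrub, main and NON_PRINTABLE are made up for illustration, and reading as latin-1 is my own choice so that arbitrary bytes never raise a decode error.

```python
import re
import sys
from string import printable

# Runs of characters that are not in string.printable.
# re.escape keeps "\", "]", "^" and "-" literal inside the character class.
NON_PRINTABLE = re.compile("[^{}]+".format(re.escape(printable)))


def scrub(line):
    """Return line without non-printable characters or newlines (hypothetical helper)."""
    return NON_PRINTABLE.sub("", line).replace("\n", "")


def main(paths):
    # Assumption: latin-1 decoding, which maps every byte to one character.
    for path in paths:
        with open(path, "r", encoding="latin-1") as f:
            for line in f:
                print(scrub(line))


if __name__ == "__main__":
    # File names come from the command line; with none given, nothing is read.
    main(sys.argv[1:])
```

Saved under a name of your choosing (say scrub.py, which is hypothetical), it would be run as python scrub.py output.csv > scrubbed.output.txt.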