Sunday, January 6, 2008

Scanned page to PDF converter

OK, so here's a first code snippet that I found useful. The scanning utility I used left me with folders full of A4 images, one image per page, and I wanted to turn each folder of scanned pages into a PDF document. The excellent reportlab package handles PDF creation from Python; the rest was just globbing and gluing.

The script assumes that each subfolder in the current (or specified) path contains a collection of images that should be converted into a single PDF. It also assumes that the images are named so that sorting by filename puts the pages in the correct order. For each subfolder, a PDF with the same name (folder "scans" becomes "scans.pdf") is created in the main folder, with each image stretched to fill one A4 page. Files that cannot be read as images are skipped and listed at the end.
#!/usr/bin/env python3

# Convert each subfolder of scanned page images into a single A4 PDF.
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
import os, sys, glob

width, height = A4

if len(sys.argv) > 1:
    try:
        os.chdir(sys.argv[1])  # Change working directory
    except OSError:
        print("Usage: pics2pdf [directory]")
        sys.exit(1)

# Iterate over all the subdirectories of the current directory:
for subdir in filter(os.path.isdir, os.listdir('.')):

    pdfname = subdir + ".pdf"
    ignorelist = []
    print("Processing", pdfname, "[", end=" ", flush=True)
    c = canvas.Canvas(pdfname, pagesize=A4)

    # Iterate over all the possible pictures in the subdirectory,
    # in filename order (assumed to be page order):
    pic_list = sorted(glob.glob(os.path.join(subdir, "*")))

    for pic in pic_list:
        try:  # Assume the file is a valid image
            # Stretch the image over the whole A4 page, then start a new page
            c.drawImage(pic, 0, 0, width, height)
            c.showPage()
            print(os.path.basename(pic), end=" ", flush=True)
        except Exception:  # Didn't seem to be an image
            ignorelist.append(os.path.basename(pic))

    print("]")

    if ignorelist:
        print("Ignored non-image file(s)", " ".join(ignorelist))

    c.save()  # Write the finished PDF to disk
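One caveat with sorting by filename: plain string sorting puts "page10" before "page2" unless the scanner zero-pads its page numbers. Below is a minimal sketch of a workaround, assuming the page numbers are embedded in the filenames. The helpers natural_key and collect_pages, and the list of image extensions, are hypothetical additions rather than part of the script above. They sort the digit runs numerically and skip non-image files by extension instead of waiting for drawImage to fail.

```python
import os
import re

# Hypothetical set of extensions treated as scanned pages (an assumption;
# adjust to whatever your scanner produces).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp"}


def natural_key(name):
    """Split a filename into alternating text and digit chunks.

    re.split with a capturing group puts text at even indices and digit runs
    at odd indices, so the digit runs become ints and compare numerically:
    'page2.png' sorts before 'page10.png'.
    """
    chunks = re.split(r"(\d+)", name)
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]


def collect_pages(subdir):
    """Return paths of the image files in subdir, in natural filename order."""
    names = [n for n in os.listdir(subdir)
             if os.path.splitext(n)[1].lower() in IMAGE_EXTENSIONS]
    return [os.path.join(subdir, n) for n in sorted(names, key=natural_key)]
```

In the main loop, the glob-and-sort step could then be replaced by a call to collect_pages(subdir). Keeping the try/except around drawImage still seems worthwhile as a safety net for files that merely carry an image extension.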

init 5

Well, at last getting round to it! I've been playing around with the idea of starting a technical blog for some time, a place to post random snippets of code or thoughts on software, hardware and engineering. It sometimes feels like such a waste to expend mental energy on, for example, putting together a quick script to do something useful, only to use it once or twice and then shelve it in some dusty folder. Better to put it out in the wild; maybe somebody will find it useful, or maybe someone can offer useful suggestions on it.

So let's see how it goes, and whether I have the discipline to update this regularly :)