Sunday, January 6, 2008

Scanned page to PDF converter

OK, so here's a first code snippet that I found useful. A scan utility that I used gave me a bunch of A4 images in folders, one image per page. I wanted to convert these scanned pages into PDF documents. The excellent ReportLab package makes it possible to create PDFs from Python; the rest was just globbing and gluing.

The script assumes that each subfolder in the current (or specified) path contains a collection of images that should be converted into a single PDF. It also assumes that the images are named in such a way that sorting by filename will place the pages in the correct order. For each subfolder, a similarly named PDF is created in the main folder.
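A quick aside on the filename-ordering assumption: a plain alphabetical sort puts page10 before page2 unless the numbers are zero-padded. If your scanner names files without padding, a "natural" sort key along these lines might help. This is only a sketch; natural_key and the sample filenames are made up for illustration and are not part of the script.

```python
import re

def natural_key(filename):
    """Split a name into text and number chunks so numbers compare by value."""
    chunks = re.split(r"(\d+)", filename)
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd-indexed chunks are the digit runs.
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]

# Hypothetical page names, as an unpadded scanner might produce them
pages = ["page10.jpg", "page2.jpg", "page1.jpg"]

plain_order = sorted(pages)                     # alphabetical order
natural_order = sorted(pages, key=natural_key)  # numeric-aware order

print(plain_order)
print(natural_order)
```

Passing key=natural_key to the sort over the globbed files would be one way to use it. If the scan utility already zero-pads its page numbers, none of this is needed.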

from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
import os, sys, glob

width, height = A4

if len(sys.argv) > 2:
    print("Usage: pics2pdf [directory]")
    sys.exit(1)
if len(sys.argv) == 2:
    os.chdir(sys.argv[1])  # Change working directory

# Iterate over all the subdirectories of the current directory:
for subdir in sorted(filter(os.path.isdir, os.listdir('.'))):

    pdfname = subdir + ".pdf"
    ignorelist = []
    print("Processing", pdfname, "[", end=" ")
    c = canvas.Canvas(pdfname, pagesize=A4)

    # Iterate over all the possible pictures in the subdirectory,
    # sorted by filename so that the pages come out in order:
    piclist = sorted(glob.glob(os.path.join(subdir, "*")))

    for pic in piclist:

        try:  # Assume the file is a valid image
            c.drawImage(pic, 0, 0, width, height)
            c.showPage()  # Finish this page; the next image gets a new one
            print(os.path.basename(pic), end=" ")
        except Exception:  # Didn't seem to be an image
            ignorelist.append(pic)

    c.save()  # Write the finished PDF to disk
    print("]")

    if len(ignorelist) > 0:
        print("Ignored non-image file(s)", end=" ")
        for name in ignorelist:
            print(name, end=" ")
        print("")
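
One thing to be aware of: the drawImage call above stretches every image to the full A4 page, which is fine for A4 scans but distorts anything with a different shape. A minimal sketch of the arithmetic for fitting an image onto the page while keeping its proportions could look like this. The fit_box helper and the A4_POINTS constant are assumptions made for illustration (the constant approximates ReportLab's A4 size in points), not something the script uses.

```python
# Approximate A4 size in PDF points (1/72 inch); stands in for
# reportlab.lib.pagesizes.A4 so the sketch runs without ReportLab.
A4_POINTS = (595.28, 841.89)

def fit_box(img_w, img_h, page_w, page_h):
    """Return (x, y, w, h) that centres the image on the page, scaled to fit."""
    # Use the smaller of the two scale factors so both dimensions fit.
    scale = min(page_w / img_w, page_h / img_h)
    w, h = img_w * scale, img_h * scale
    # Split the leftover space evenly on each side.
    x, y = (page_w - w) / 2, (page_h - h) / 2
    return x, y, w, h

# A hypothetical square 100x100 image on a 200x400 page
box = fit_box(100, 100, 200, 400)
print(box)

# The same calculation for a hypothetical landscape scan on A4
print(fit_box(3000, 2000, *A4_POINTS))
```

The four values would go into drawImage in place of 0, 0, width, height. As far as I know, drawImage also accepts preserveAspectRatio=True, which may make the manual calculation unnecessary.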
