Save PDF as TIFF image with CCITT G4 compression

Sep 25, 2014 at 12:42 PM
Is it possible to convert the loaded PDF file to TIFF format using CCITT Group 4 (G4) compression? Or is there an alternative way to run the following gswin32c command?
gs9.14\bin\gswin32c -Igs9.14\lib;fonts -dSAFER -dNumRenderingThreads=2 -dBATCH -dUseCropBox -dNOPAUSE -sPDFPassword="" -sDEVICE=tiffg4 -r300 -sOutputFile="output"
Sep 25, 2014 at 1:03 PM
Well, I got my answer a few minutes later from Josip Habjan on the Ghostscript IRC channel. Here is a helpful link:
https://github.com/jhabjan/Ghostscript.NET/blob/master/Ghostscript.NET.Samples/Samples/ProcessorSample.cs
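For anyone who can't open the link, here is a minimal sketch of that approach. It assumes the GhostscriptProcessor class from Ghostscript.NET and a Ghostscript installation the library can find. The class name TiffG4Sketch, the two method names, and the inputPath/outputPath parameters are made up for illustration; the switches mirror the command above, minus the -I search path and the empty password.

```csharp
using System.Collections.Generic;
using Ghostscript.NET;
using Ghostscript.NET.Processor;

public static class TiffG4Sketch
{
    // Hypothetical helper: builds the argument list for the tiffg4 device.
    // inputPath and outputPath are illustrative parameters, not from the thread.
    public static string[] BuildSwitches(string inputPath, string outputPath)
    {
        var switches = new List<string>
        {
            "-empty",                      // placeholder: Ghostscript skips the first argument
            "-dSAFER",
            "-dBATCH",
            "-dNOPAUSE",
            "-dUseCropBox",
            "-dNumRenderingThreads=2",
            "-sDEVICE=tiffg4",             // bitonal TIFF with CCITT Group 4 compression
            "-r300",                       // 300 DPI, as in the command line
            "-sOutputFile=" + outputPath,
            "-f",
            inputPath
        };
        return switches.ToArray();
    }

    // Hypothetical wrapper: runs Ghostscript in-process with the switches above.
    public static void ConvertPdfToTiffG4(string inputPath, string outputPath)
    {
        // Assumption: at least one Ghostscript version is installed and discoverable.
        GhostscriptVersionInfo version = GhostscriptVersionInfo.GetLastInstalledVersion();

        using (var processor = new GhostscriptProcessor(version, true))
        {
            // Second argument is an optional stdio callback; null means none.
            processor.StartProcessing(BuildSwitches(inputPath, outputPath), null);
        }
    }
}
```

With a single output file name, the tiffg4 device writes all pages into one multi-page TIFF; putting %d in the name gives one file per page instead.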
Coordinator
Oct 3, 2014 at 7:48 AM
I'm glad I could help.

Cheers,
Josip
Jan 28, 2015 at 6:03 AM
Edited Jan 28, 2015 at 6:03 AM
var desiredDpi = 150;

var dtStartAnalysis = DateTime.Now;
using (var ms = new MemoryStream(request.DocumentBytes))
using (var rasterizer = new GhostscriptRasterizer())
{
    rasterizer.Open(ms);

    // Pick the TIFF encoder and ask it for CCITT Group 4 compression.
    var codec = ImageCodecInfo.GetImageEncoders().First(ice => ice.MimeType == "image/tiff");
    var encoderParams = new EncoderParameters(1);
    encoderParams.Param[0] = new EncoderParameter(Encoder.Compression, (long)EncoderValue.CompressionCCITT4);

    for (var pageNumber = 1; pageNumber <= rasterizer.PageCount; pageNumber++)
    {
        using (var img = rasterizer.GetPage(desiredDpi, desiredDpi, pageNumber))
        using (var newImg = new Bitmap(img.Width, img.Height))
        {
            // Convert the page to grayscale and threshold it to black/white while drawing it.
            using (var g = Graphics.FromImage(newImg))
            using (var attributes = new ImageAttributes())
            {
                var colorMatrix = new ColorMatrix(
                    new[]
                    {
                        new[] {.3f, .3f, .3f, 0, 0},
                        new[] {.59f, .59f, .59f, 0, 0},
                        new[] {.11f, .11f, .11f, 0, 0},
                        new[] {0f, 0, 0, 1, 0},
                        new[] {0f, 0, 0, 0, 1}
                    });

                attributes.SetColorMatrix(colorMatrix);
                attributes.SetThreshold(0.8f); // threshold for switching gray to black or white
                g.DrawImage(img, new Rectangle(0, 0, img.Width, img.Height), 0, 0, img.Width, img.Height, GraphicsUnit.Pixel, attributes);
            }

            using (var targetStream = new MemoryStream())
            {
                // targetStream receives the CCITT4-compressed TIFF for this page.
                newImg.Save(targetStream, codec, encoderParams);
            }
        }
    }
}

Trace.WriteLine(string.Format("Converted PDF to TIFF images in {0} ms", (DateTime.Now - dtStartAnalysis).TotalMilliseconds));
Here's how I'm currently converting PDF to TIFF. For TIFF specifically: if you're using CCITT4, make sure each rendered page is converted to bitonal (pure black and white) first, otherwise anything lightly drawn on the page will be lost.

The ColorMatrix and SetThreshold() settings work in my specific case, but anyone trying this on their own PDF documents should be aware that they may have to adjust those settings (at least the threshold).

-DM
Jan 28, 2015 at 6:55 AM
The only concern I have is the speed at which this line is executed:
var img = rasterizer.GetPage(desiredDpi, desiredDpi, pageNumber);
On a PDF with a page size of ~1 MB at 150 DPI, that call takes ~1.5-2 seconds per page, and I have some PDFs with 35 or more pages. It takes me between 45 and 60 seconds to convert a 35-page PDF.

I'm actually using this to downconvert the PDFs into something more manageable. Unfortunately, changing the size of the source PDFs is not going to happen at this point.

Is there anything I can do to optimize the GetPage() call? The DPI has to stay at 150; any lower and the PDF becomes unreadable.

Thanks.