Re: [LAU] How to handle large amount of pdf files (music sheets) for collaborative usage

From: Daniel Appelt <daniel.appelt@email-addr-hidden>
Date: Sun Apr 04 2010 - 01:37:39 EEST

On the topic of OCR, you could also check out specialized Optical
Music Recognition systems
(http://en.wikipedia.org/wiki/Optical_music_recognition). In the
university department where I wrote my diploma thesis, Gamera
(http://gamera.informatik.hsnr.de/) was used for this task.

Cheers, Daniel

2010/4/3 Luke Peterson <luke.peterson@email-addr-hidden>:
> PDFsam -- PDF Split-And-Merge is a handy open-source tool.
> (http://www.pdfsam.org/)
>
> But for the most part, its name is its feature set. It lets you reorder
> PDFs, pull pages out, add pages in, rotate pages by 90, 180 or 270 degrees,
> etc. It's command-line driven, but there's also a GUI console.
>
> It's got a Windows installer, but it should run anywhere Java is available.
>
> It sounds like, on top of the scanning and organizing solution, you need to
> find an OCR application that can extract metadata from each of the PDFs at
> scale.
>
> If you're planning to put these out for public consumption, you can use
> Google to assist you in your scanning and indexing:
>
> http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/
>
> Alternatively, the open-source OCR world is improving fast. Check out
> OCRopus (http://en.wikipedia.org/wiki/OCRopus), a Linux-based
> command-line OCR tool. You should be able to incorporate it into a
> workflow: it outputs what it thinks your PDF says in an HTML-like format
> (specified here: http://docs.google.com/View?docid=dfxcv4vc_67g844kf).
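Once you have that HTML-like output, pulling plain text out of it for indexing only needs Python's standard library. A rough sketch, assuming the OCR output is already sitting in a string (OCRopus's own command-line options aren't covered in this thread, so the sketch doesn't call it); `ocr_html_to_text` is a hypothetical name:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text content of an HTML-like document."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def ocr_html_to_text(markup):
    """Return the visible text of OCR markup with whitespace collapsed."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    # Join chunks with spaces so text from adjacent elements doesn't fuse,
    # then collapse runs of whitespace so the result is ready for indexing.
    return " ".join(" ".join(parser.chunks).split())


sample = ("<html><body><div class='page'><span>Sonata in C</span>\n"
          "<span>W. A. Mozart</span></div></body></html>")
print(ocr_html_to_text(sample))  # prints "Sonata in C W. A. Mozart"
```

Joining with spaces assumes tags don't split words in the middle, which seems likely for line- or word-level OCR markup.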
>
> I could see a workflow on your end that creates four rotations of each
> scanned page, OCRs each rotation with OCRopus, compares the results, and
> stores in your datastore the one with the highest combination of
> recognized characters and recognition score. I suppose this is only really
> helpful if a) your PDFs often get scanned upside-down or sideways, and
> b) all your PDFs have some amount of digital typography on them.
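The rotation-picking step could look something like this in Python. It's only a sketch under several assumptions: `ocr` stands in for whatever actually rotates the page and calls OCRopus, it is assumed to return the recognized text plus a confidence between 0 and 1, and "combination" is taken to mean the number of recognized alphanumeric characters multiplied by that confidence. `best_rotation`, `combined_score` and `fake_ocr` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

ROTATIONS = (0, 90, 180, 270)


@dataclass
class OcrResult:
    angle: int
    text: str
    confidence: float  # assumed to be in 0.0..1.0


def combined_score(result: OcrResult) -> float:
    # Assumption: weight the count of recognized characters by confidence.
    recognized = sum(ch.isalnum() for ch in result.text)
    return recognized * result.confidence


def best_rotation(page, ocr: Callable[[object, int], tuple[str, float]],
                  angles: Iterable[int] = ROTATIONS) -> OcrResult:
    """OCR the page at each angle and keep the highest-scoring result."""
    results = []
    for angle in angles:
        text, confidence = ocr(page, angle)  # hypothetical OCR hook
        results.append(OcrResult(angle, text, confidence))
    return max(results, key=combined_score)


def fake_ocr(page, angle):
    # Stand-in for a real OCR call: pretends the upright scan reads best.
    samples = {0: ("Allegro moderato", 0.9), 90: ("|||", 0.2),
               180: ("0lle9rA", 0.3), 270: ("", 0.0)}
    return samples[angle]


print(best_rotation("page-1", fake_ocr).angle)  # prints 0
```

Passing the OCR call in as a function keeps the selection logic testable without OCRopus installed; swapping in a different scoring rule only means changing `combined_score`.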
>
> Anyway, a couple of ideas.
>
> -----
> Luke Peterson
>
>
_______________________________________________
Linux-audio-user mailing list
Linux-audio-user@email-addr-hidden
http://lists.linuxaudio.org/listinfo/linux-audio-user
Received on Sun Apr 4 04:15:02 2010