
clasqm · 09-23-2016, 12:40 PM

Just a little CLI app to grab the book title and author from a Project Gutenberg UTF-8 text file and write them to attributes of the file. By itself this won't be of much interest, but it is part of an ebook reader project of mine.

Usage:
pgextract_en path/to/file.txt
Asks for confirmation in the Terminal before writing the attributes.

pgextract_en --noconfirm path/to/file.txt
Skips confirmation. For use in batch conversions. This app will only accept a single filename, but you can use it inside a for loop in a shell script.

pgextract_en --confirmGUI path/to/file.txt
Puts the confirmation process in a graphical Alert. Haven't quite figured out what that would be good for yet, but who knows?

pgextract_en --help OR pgextract_en -h
Shows the help text.

path/to/file.txt cannot contain spaces. Maybe in the next version, but Project Gutenberg files have names like pg12345.txt anyway.
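For batch use, the for loop mentioned above could look roughly like this. It is only a sketch: the function name batch_extract and the PGEXTRACT override variable are made up for illustration and are not part of pgextract_en. It skips any path containing a space, since the app cannot handle those.

```shell
#!/bin/sh
# Hypothetical batch wrapper around pgextract_en (not part of the app itself).
# Runs "pgextract_en --noconfirm" once per .txt file in the given directory.
batch_extract() {
    dir=${1:-.}
    for f in "$dir"/*.txt; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        case $f in
            *" "*)
                # pgextract_en does not accept paths with spaces.
                echo "Skipping (space in path): $f" >&2
                continue
                ;;
        esac
        # PGEXTRACT is an assumed override, handy for a dry run with echo.
        "${PGEXTRACT:-pgextract_en}" --noconfirm "$f"
    done
}
```

Calling `batch_extract ~/ebooks` would process every .txt file in that folder; `PGEXTRACT=echo batch_extract ~/ebooks` only prints the commands it would have run.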

I discovered that PG files have some nasty embedded codes at the beginning of the file, otherwise more straightforward approaches would have been possible. This code does require some clean-up: too many exit points, for one thing.

Code:
#!/bin/env yab

doc pgextract_en v0.1
doc Extract author and title data from a Project Gutenberg text file,
doc and write these to attributes.
doc (c) Michel Clasquin-Johnson, 2016, Public Domain
doc
doc Usage:
doc   pgextract_en <--noconfirm> <--confirmGUI> <path/to/file>
doc
doc The default behaviour is to ask for confirmation in text mode before
doc writing the attributes. The --noconfirm switch skips this step. The
doc --confirmGUI switch puts the confirmation in a Haiku two-button alert.
doc These switches are INCOMPATIBLE! All switches are case-insensitive.
doc
doc Pathnames should NOT contain spaces. One file at a time, please!
doc
doc This will only work with English-language files, since it searches for
doc the strings "The Project Gutenberg EBook of " and ", by". I may write
doc versions for other languages if necessary.
doc

fulltitle$ = ""
title$ = ""
author$ = ""
noconfirm = 0
thefile$ = peek$("argument")
if lower$(thefile$) = "--help" or lower$(thefile$) = "-h" showhelp()
if lower$(thefile$) = "--noconfirm" then
    noconfirm = 1
    thefile$ = peek$("argument")
elseif lower$(thefile$) = "--confirmgui" then
    noconfirm = -1
    thefile$ = peek$("argument")
endif
if thefile$ = "" exit
firstline$ = system$("head -n 1 " + thefile$)
print "Processing " + thefile$
print "First line: "
print firstline$
print "Parsing ..."
parse()
switch noconfirm
    case -1    //GUI confirmation
        a$ = "Full Title: " + fulltitle$ + ".\n"
        a$ = a$ + "Title: " + title$ + ".\n"
        a$ = a$ + "Author: " + author$ + ".\n\n"
        a = ALERT a$ + "Write these attributes to " + thefile$ + "?", "Yes", "", "No", "warning"
        if a = 1 writeattribs()
    break
    case 0    //CLI confirmation
        print "Full entry: " + fulltitle$
        print "Title: " + title$
        print "Author: " + author$
        input "Write these attributes to the file? (y/n) " a$
        if lower$(left$(a$,1)) = "y" writeattribs()
    break
    case 1    //no confirmation - for automated bulk operations
              //requires the --noconfirm switch
        writeattribs()
    break
    default
    break
end switch
exit

sub writeattribs()
    print
    print "Setting attribute ebook:full_title to " + fulltitle$ + "."
    attribute set "String", "ebook:full_title", fulltitle$, thefile$
    print "Setting attribute ebook:title to " + title$ + "."
    attribute set "String", "ebook:title", title$, thefile$
    print "Setting attribute ebook:author to " + author$ + "."
    attribute set "String", "ebook:author", author$, thefile$
end sub

sub showhelp()
    for a=1 to arraysize(docu$(),1)
        print docu$(a)
    next a
    exit
end sub

sub parse()
    local without_asterixes$, character$, postitle, posauthor, search1$, search2$
    //change the following 2 lines for books in other languages
    search1$ = "The Project Gutenberg EBook of "
    search2$ = ", by "
    //some PG files have asterisks in them. Replace these with spaces
    //then remove them later with trim$
    for f = 1 to len(firstline$)
        character$ = mid$(firstline$, f,1)
        if character$ = "*" or character$ = chr$(20) character$ = " "
        without_asterixes$ = without_asterixes$ + character$
    next f
    firstline$ = without_asterixes$
    firstline$ = trim$(firstline$)
    print "Cleaned up the first line:"
    print firstline$
    postitle = instr(lower$(firstline$), lower$(search1$)) + len(search1$)
    fulltitle$ = trim$(mid$(firstline$, postitle))
    posauthor = instr(lower$(fulltitle$), lower$(search2$)) + len(search2$)
    title$ = trim$(left$(fulltitle$, posauthor - (len(search2$)+1)))
    author$ = trim$(mid$(fulltitle$, posauthor))
end sub

clasqm · 09-26-2016, 01:37 PM

Yechh, just when you think you have a solution ...

it turns out that Project Gutenberg files are not quite standardized. The first line can read "The Project Gutenberg Ebook of ..." or "Project Gutenberg's Etext of ..." or any combination of the above. Back to the drawing board. But I'll leave this here, someone might be able to use the code for something else.
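One possible direction, sketched here in shell rather than yab, is to match the variable prefix with a single pattern instead of one fixed string. Everything in it is an assumption: the function name pg_split_header is invented, the pattern covers only the variants named above (optional "The", optional "'s", "EBook" or "Etext" in any capitalisation), and the trailing I flag for case-insensitive matching is a GNU sed extension. Unlike the yab code it splits on the last ", by " rather than the first, on the guess that a title is more likely than an author name to contain that string.

```shell
#!/bin/sh
# Hypothetical helper: split a Project Gutenberg first line into title/author.
# Prints "title=..." and "author=..."; returns 1 if the prefix is not found.
pg_split_header() {
    # Drop carriage returns and turn asterisks into spaces, as the yab code does.
    cleaned=$(printf '%s\n' "$1" | tr -d '\r' | tr '*' ' ')
    # Keep only what follows the prefix, minus trailing spaces.
    # Nothing is printed when the prefix is absent, so rest ends up empty.
    rest=$(printf '%s\n' "$cleaned" |
        sed -nE "s/^.*Project Gutenberg('s)? e-?(book|text) of +(.*[^ ]) *$/\3/Ip")
    [ -n "$rest" ] || return 1
    case $rest in
        *", by "*)
            title=${rest%, by *}     # everything before the last ", by "
            author=${rest##*, by }   # everything after the last ", by "
            ;;
        *)
            title=$rest              # no author given on the first line
            author=
            ;;
    esac
    printf 'title=%s\nauthor=%s\n' "$title" "$author"
}
```

It could be fed the first line of a file with something like `pg_split_header "$(head -n 1 pg12345.txt)"`. Files whose first line follows some other wording would still need their own handling.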

Also, I've put a few ebooks on the repo to test the waters, but that's not the way to go. You'd end up with literally thousands of packages called ebookXXXXXXX.hpkg and nobody would be able to find my actual apps. Either set up a separate repo for ebooks or think of something different.
