clasqm · 09-23-2016, 12:40 PM

Just a little CLI app to grab the book title and author from a Project Gutenberg UTF-8 text file and write them to attributes of the file. By itself this won't be of much interest, but it is part of an ebook reader project of mine.

Usage:
pgextract_en path/to/file.txt
Asks for confirmation in the Terminal before writing the attributes.

pgextract_en --noconfirm path/to/file.txt
Skips confirmation. For use in batch conversions. This app will only accept a single filename, but you can use it inside a for loop in a shell script.

pgextract_en --confirmGUI path/to/file.txt
Puts the confirmation process in a graphical Alert. Haven't quite figured out what that would be good for yet, but who knows?

pgextract_en --help OR pgextract_en -h
Shows help.

path/to/file.txt cannot contain spaces. Maybe in the next version, but Project Gutenberg files have names like pg12345.txt anyway.
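For batch use, the --noconfirm switch can be wrapped in a shell loop. The following is only a sketch: the function name batch_extract, the PGEXTRACT override variable and the pg*.txt pattern are assumptions made for illustration, not part of the tool itself. Directory names still must not contain spaces, since the tool cannot handle them.

```shell
#!/bin/sh
# Hypothetical helper: run pgextract_en --noconfirm on every pg*.txt file
# in one directory. PGEXTRACT is an illustrative override so that another
# command can be substituted for a dry run.
batch_extract() {
    dir=${1:-.}
    for f in "$dir"/pg*.txt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        "${PGEXTRACT:-pgextract_en}" --noconfirm "$f"
    done
}
```

Calling `batch_extract /path/to/books` processes the files one at a time, which matches the single-filename limit of the app.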

I discovered that PG files have some nasty embedded codes at the beginning of the file; otherwise, more straightforward approaches would have been possible. This code does require some clean-up: too many exit points, for one thing.
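The post does not say exactly what those embedded codes are. One plausible candidate in a UTF-8 text file is a byte-order mark (the bytes EF BB BF), but that is an assumption. A quick way to check a given file from the Terminal might be the following sketch, where has_bom is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file in $1 starts with the UTF-8
# byte-order mark (EF BB BF). Assumes head -c and od are available.
has_bom() {
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}
```

For example, `has_bom pg12345.txt && echo "starts with a BOM"` prints the message only when the mark is present.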

Code:
#!/bin/env yab

doc pgextract_en v0.1
doc Extract author and title data from a Project Gutenberg text file,
doc and write these to attributes.
doc (c) Michel Clasquin-Johnson, 2016, Public Domain
doc
doc Usage:
doc   pgextract_en <--noconfirm> <--confirmGUI> <path/to/file>
doc
doc The default behaviour is to ask for confirmation in text mode before
doc writing the attributes. The --noconfirm switch skips this step. The
doc --confirmGUI switch puts the confirmation in a Haiku two-button alert.
doc These switches are INCOMPATIBLE! All switches are case-insensitive.
doc
doc Pathnames should NOT contain spaces. One file at a time, please!
doc
doc This will only work with English-language files, since it searches for
doc the strings "The Project Gutenberg EBook of " and ", by". I may write
doc versions for other languages if necessary.
doc

fulltitle$=""
title$=""
author$=""
noconfirm =0

thefile$ = peek$("argument")
if lower$(thefile$) = "--help" or lower$(thefile$) = "-h" showhelp()
if lower$(thefile$) = "--noconfirm" then
    noconfirm =1
    thefile$ = peek$("argument")
elseif lower$(thefile$) = "--confirmgui" then
    noconfirm =-1
    thefile$ = peek$("argument")
endif
if thefile$ = "" exit

firstline$ = system$("head -n 1 " + thefile$)
print "Processing " + thefile$
print "First line: "
print firstline$
print "Parsing ..."
parse()

switch noconfirm
    case -1    //GUI confirmation
        a$ = "Full Title: " + fulltitle$ + ".\n"
        a$ = a$ + "Title: " + title$ + ".\n"
        a$ = a$ + "Author: " + author$ + ".\n\n"
        a = ALERT a$ + "Write these attributes to " + thefile$ + "?", "Yes", "", "No", "warning"
        if a = 1 writeattribs()
    break
    case 0    //CLI confirmation
        print "Full entry: " + fulltitle$
        print "Title: " + title$
        print "Author: " + author$
        input "Write these attributes to the file? (y/n) " a$
        if lower$(left$(a$,1)) = "y" writeattribs()
    break
    case 1    // no confirmation - for automated bulk operations
        //requires the --noconfirm switch
        writeattribs()
    break
    default
    break
end switch
exit

sub writeattribs()
    print
    print "Setting attribute ebook:full_title to " + fulltitle$ + "."
    attribute set "String", "ebook:full_title", fulltitle$, thefile$
    print "Setting attribute ebook:title to " + title$ + "."
    attribute set "String", "ebook:title", title$, thefile$
    print "Setting attribute ebook:author to " + author$ + "."
    attribute set "String", "ebook:author", author$, thefile$
end sub

sub showhelp()
    for a=1 to arraysize(docu$(),1)
        print docu$(a)
    next a
    exit
end sub

sub parse()
    local without_asterixes$, character$, postitle, posauthor, search1$, search2$
    //change the following 2 lines for books in other languages
    search1$ = "The Project Gutenberg EBook of "
    search2$ = ", by "
    //some PG files have asterisks in them. Replace these with spaces
    //then remove them later with trim$
    for f = 1 to len(firstline$)
        character$ = mid$(firstline$, f,1)
        if character$ = "*" or character$ = chr$(20) character$ = " "
        without_asterixes$ = without_asterixes$ + character$
    next f
    firstline$ = without_asterixes$
    firstline$ = trim$(firstline$)
    print "Cleaned up the first line:"
    print firstline$
    postitle = instr(lower$(firstline$), lower$(search1$)) + len(search1$)
    fulltitle$ = trim$(mid$(firstline$, postitle))
    posauthor = instr(lower$(fulltitle$), lower$(search2$)) + len(search2$)
    title$ = trim$(left$(fulltitle$, posauthor - (len(search2$)+1)))
    author$ = trim$(mid$(fulltitle$, posauthor))
end sub
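
To see what parse() does without running yab, the same split can be tried in a shell. This is only a rough sketch, not a replacement for the script: parse_pg_line, pg_title and pg_author are made-up names. Unlike the yab code, the match here is case-sensitive, and the chr$(20) replacement is left out.

```shell
#!/bin/sh
# Hypothetical helper: split a Project Gutenberg first line into
# pg_title and pg_author, using the same two search strings as parse().
parse_pg_line() {
    # Replace asterisks with spaces, as the yab loop does.
    line=$(printf '%s' "$1" | tr '*' ' ')
    # Drop everything up to and including the first search string.
    rest=${line#*The Project Gutenberg EBook of }
    # The title is the text before the first ", by "; the author follows it.
    pg_title=${rest%%, by *}
    pg_author=${rest#*, by }
    # Trim trailing whitespace, including a carriage return if present.
    pg_author=$(printf '%s' "$pg_author" | sed 's/[[:space:]]*$//')
}
```

With a real file this could be called as `parse_pg_line "$(head -n 1 pg12345.txt)"`. If either search string is missing, the variables simply hold the unsplit text, so the result still needs a look before it is written to attributes.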
