pgextract - new app
Just a little CLI app to grab the book title and author from a Project Gutenberg UTF-8 text file and write them to attributes of the file. By itself this won't be of much interest, but it is part of an ebook reader project of mine.

Usage:
pgextract_en path/to/file.txt
Asks for confirmation in the Terminal before writing the attributes.

pgextract_en --noconfirm path/to/file.txt
Skips confirmation, for use in batch conversions. The app accepts only a single filename, but you can call it inside a for loop in a shell script.
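
For example, a minimal POSIX shell function along these lines would do it. This is a sketch, not part of the app: batch_extract and the PGEXTRACT override variable are hypothetical names, and pgextract_en is assumed to be installed somewhere on your PATH.

```shell
# batch_extract DIR: run pgextract_en --noconfirm on every pg*.txt file in DIR.
# batch_extract and PGEXTRACT are hypothetical names for this sketch.
# PGEXTRACT defaults to pgextract_en (assumed to be on the PATH); set it
# to "echo" for a dry run that only prints what would be executed.
batch_extract() {
    dir=${1:-.}
    for f in "$dir"/pg*.txt; do
        [ -e "$f" ] || continue    # no match: the glob is left unexpanded
        "${PGEXTRACT:-pgextract_en}" --noconfirm "$f"
    done
}
```

Typical use would be batch_extract ~/books, or PGEXTRACT=echo batch_extract ~/books to preview the commands first. Quoting "$f" keeps each filename as a single argument, but it does not lift the app's own no-spaces restriction.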

pgextract_en --confirmGUI path/to/file.txt
Puts the confirmation in a graphical Alert. I haven't quite figured out what that would be good for yet, but who knows?

pgextract_en --help OR pgextract_en -h
Shows the help text.

path/to/file.txt cannot contain spaces, because the path is handed unquoted to the head command through system$(). Maybe in the next version, but Project Gutenberg files have names like pg12345.txt anyway.

I discovered that PG files have some nasty embedded codes at the beginning of the file; otherwise, more straightforward approaches would have been possible. The code still needs some clean-up: too many exit points, for one thing.
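
If the codes in question are the usual suspects, a UTF-8 byte-order mark (the bytes EF BB BF) at the very start of the file plus DOS-style carriage returns, a short shell pipeline can strip them before any parsing happens. This is a sketch under that assumption, not a description of what pgextract_en does internally; clean_first_line is a hypothetical name, and running od -c on one of your files will show what your copies actually contain.

```shell
# clean_first_line FILE: print the first line of FILE with a leading
# UTF-8 byte-order mark and any carriage returns removed, asterisks turned
# into spaces, and leading/trailing whitespace trimmed.
# clean_first_line is a hypothetical helper name for this sketch.
clean_first_line() {
    bom=$(printf '\357\273\277')    # UTF-8 byte-order mark: EF BB BF
    head -n 1 -- "$1" |
        tr -d '\r' |                # drop DOS carriage returns
        tr '*' ' ' |                # asterisks become spaces
        sed -e "s/^$bom//" -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

For example, clean_first_line pg12345.txt prints the tidied title line and nothing else.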

Code:
#!/bin/env yab

doc pgextract_en v0.1
doc Extract author and title data from a Project Gutenberg text file,
doc and write these to attributes.
doc (c) Michel Clasquin-Johnson, 2016, Public Domain
doc
doc Usage:
doc   pgextract_en [--noconfirm | --confirmGUI] path/to/file
doc   pgextract_en --help | -h
doc
doc The default behaviour is to ask for confirmation in text mode before
doc writing the attributes. The --noconfirm switch skips this step. The
doc --confirmGUI switch puts the confirmation in a Haiku two-button alert.
doc These switches are INCOMPATIBLE! All switches are case-insensitive.
doc
doc Pathnames should NOT contain spaces. One file at a time, please!
doc
doc This will only work with English-language files, since it searches for
doc the strings "The Project Gutenberg EBook of " and ", by ". I may write
doc versions for other languages if necessary.
doc

fulltitle$=""
title$=""
author$=""
noconfirm =0
thefile$ = peek$("argument")
if lower$(thefile$) = "--help" or lower$(thefile$) = "-h" showhelp()
if lower$(thefile$) = "--noconfirm" then
    noconfirm =1
    thefile$ = peek$("argument")
elseif lower$(thefile$) = "--confirmgui" then
    noconfirm =-1
    thefile$ = peek$("argument")
endif
if thefile$ = "" exit
firstline$ = system$("head -n 1 " + thefile$)
print "Processing " + thefile$
print "First line: "
print firstline$
print "Parsing ..."
parse()
switch noconfirm
    case -1    //GUI confirmation
        a$ = "Full Title: " + fulltitle$ + ".\n"
        a$ = a$ + "Title: " + title$ + ".\n"
        a$ = a$ + "Author: " + author$ + ".\n\n"
        a = ALERT a$ + "Write these attributes to " + thefile$ + "?", "Yes", "", "No", "warning"
        if a = 1 writeattribs()
    break
    case 0    //CLI confirmation
        print "Full entry: " + fulltitle$
        print "Title: " + title$
        print "Author: " + author$
        input "Write these attributes to the file? (y/n) " a$
        if lower$(left$(a$,1)) = "y" writeattribs()
    break
    case 1    // no confirmation - for automated bulk operations
            //requires the --noconfirm switch
        writeattribs()
    break
    default
    break
end switch

exit

sub writeattribs()
    print
    print "Setting attribute ebook:full_title to " + fulltitle$ + "."
    attribute set "String", "ebook:full_title", fulltitle$, thefile$
    print "Setting attribute ebook:title to " + title$ + "."
    attribute set "String", "ebook:title", title$, thefile$
    print "Setting attribute ebook:author to " + author$ + "."
    attribute set "String", "ebook:author", author$, thefile$
end sub

sub showhelp()
    for a=1 to arraysize(docu$(),1)
        print docu$(a)
    next a
    exit
end sub

sub parse()
    local without_asterixes$, character$, postitle, posauthor, search1$, search2$
    //change the following 2 lines for books in other languages
    search1$ = "The Project Gutenberg EBook of "
    search2$ = ", by "
    //some PG files have asterisks in them. Replace these (and any chr$(20)
    //characters) with spaces; trim$ then removes any left at either end
    for f = 1 to len(firstline$)
        character$ = mid$(firstline$, f,1)
        if character$ = "*" or character$ = chr$(20) character$ = " "
        without_asterixes$ = without_asterixes$ + character$
    next f
    firstline$ = without_asterixes$
    firstline$ = trim$(firstline$)
    print "Cleaned up the first line:"
    print firstline$
    postitle = instr(lower$(firstline$), lower$(search1$)) + len(search1$)
    fulltitle$ = trim$(mid$(firstline$, postitle))
    posauthor = instr(lower$(fulltitle$), lower$(search2$)) + len(search2$)
    title$ = trim$(left$(fulltitle$, posauthor - (len(search2$)+1)))
    author$ = trim$(mid$(fulltitle$, posauthor))
end sub
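
For anyone who wants to script this outside yab, here is a rough shell equivalent of parse() and writeattribs(). It is a sketch under a few assumptions: the line has already been cleaned up the way parse() does it, the markers are matched case-sensitively (parse() lowercases both sides), and the attributes are written with Haiku's addattr command-line tool in an assumed addattr -t string NAME VALUE FILE form, so check addattr's usage message on your system. pg_parse, pg_write and ADDATTR are hypothetical names. Unlike parse(), pg_parse gives up when either marker is missing instead of carrying on.

```shell
# Rough shell equivalent of parse() and writeattribs().
# pg_parse, pg_write and ADDATTR are hypothetical names for this sketch.
SEARCH1='The Project Gutenberg EBook of '
SEARCH2=', by '

# pg_parse LINE: set fulltitle, title and author from an already
# cleaned-up first line. Matching is case-sensitive, unlike parse().
# Returns 1 without touching the variables if a marker is missing.
pg_parse() {
    case $1 in
        *"$SEARCH1"*"$SEARCH2"*) ;;
        *) return 1 ;;
    esac
    fulltitle=${1#*"$SEARCH1"}        # text after the first SEARCH1
    title=${fulltitle%%"$SEARCH2"*}   # text before the first SEARCH2
    author=${fulltitle#*"$SEARCH2"}   # text after the first SEARCH2
}

# pg_write FILE: write the three values as string attributes, assuming
# Haiku's addattr accepts: addattr -t string NAME VALUE FILE.
# Set ADDATTR=echo to print the commands instead of running them.
pg_write() {
    cmd=${ADDATTR:-addattr}
    "$cmd" -t string ebook:full_title "$fulltitle" "$1" &&
        "$cmd" -t string ebook:title "$title" "$1" &&
        "$cmd" -t string ebook:author "$author" "$1"
}
```

Usage would be pg_parse "$line" && pg_write pg12345.txt, where line holds the cleaned first line. Like parse(), it splits at the first ", by ", so a title that itself contains ", by " will be cut short.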


Messages In This Thread
pgextract - new app - by clasqm - 09-23-2016, 12:40 PM
RE: pgextract - new app - by clasqm - 09-26-2016, 01:37 PM
