09-23-2016, 12:40 PM
Just a little CLI app to grab the book title and author from a Project Gutenberg UTF-8 text file and write them to attributes of the file. By itself this won't be of much interest, but it is part of an ebook reader project of mine.
pgextract_en path/to/file.txt
Asks for confirmation in the Terminal before writing the attributes.
pgextract_en --noconfirm path/to/file.txt
Skips confirmation. For use in batch conversions. This app will only accept a single filename, but you can use it inside a for loop in a shell script.
pgextract_en --confirmGUI path/to/file.txt
Puts the confirmation process in a graphical Alert. Haven't quite figured out what that would be good for yet, but who knows?
pgextract_en --help OR pgextract_en -h
Shows help.
path/to/file.txt cannot contain spaces. Maybe in the next version, but Project Gutenberg files have names like pg12345.txt anyway.
I discovered that PG files have some nasty embedded codes at the beginning of the file; otherwise a more straightforward approach would have been possible. The code still needs some clean-up: too many exit points, for one thing.
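For batch runs, the for loop mentioned above might look something like this. It is only a sketch: the tag_books function name and the PGEXTRACT variable are invented for illustration, and it assumes pgextract_en is somewhere on the PATH. Changing into the directory first means only the bare pg12345.txt name is passed on, so a space elsewhere in the path should do no harm.

```shell
#!/bin/sh
# Hypothetical batch wrapper around pgextract_en (names are illustrative only).
# PGEXTRACT may be overridden, e.g. PGEXTRACT=echo for a dry run.
PGEXTRACT="${PGEXTRACT:-pgextract_en}"

tag_books() {
    # Work in a subshell so the caller's working directory is left alone.
    (
        cd "$1" || exit 1
        for f in pg*.txt; do
            # If nothing matches, the pattern stays unexpanded; skip it.
            [ -e "$f" ] || continue
            $PGEXTRACT --noconfirm "$f"
        done
    )
}
```

Call it as tag_books /path/to/books once the function is defined.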
#!/bin/env yab
doc pgextract_en v0.1
doc Extract author and title data from a Project Gutenberg text file,
doc and write these to attributes.
doc (c) Michel Clasquin-Johnson, 2016, Public Domain
doc Usage:
doc pgextract_en <--noconfirm> <--confirmGUI> <path/to/file>
doc The default behaviour is to ask for confirmation in text mode before
doc writing the attributes. The --noconfirm switch skips this step. The
doc --confirmGUI switch puts the confirmation in a Haiku two-button alert.
doc These switches are INCOMPATIBLE! All switches are case-insensitive.
doc Pathnames should NOT contain spaces. One file at a time, please!
doc This will only work with English-language files, since it searches for
doc the strings "The Project Gutenberg EBook of " and ", by". I may write
doc versions for other languages if necessary.
// 0 = confirm in the Terminal (default), 1 = no confirmation, -1 = confirm in a GUI alert
noconfirm = 0
thefile$ = peek$("argument")
if lower$(thefile$) = "--help" or lower$(thefile$) = "-h" then
    showhelp()
    exit
endif
if lower$(thefile$) = "--noconfirm" then
    noconfirm = 1
    thefile$ = peek$("argument")
elseif lower$(thefile$) = "--confirmgui" then
    noconfirm = -1
    thefile$ = peek$("argument")
endif
if thefile$ = "" exit
firstline$ = system$("head -n 1 " + thefile$)
print "Processing " + thefile$
print "First line: "
print firstline$
print "Parsing ..."
// fill fulltitle$, title$ and author$ before they are shown or written
parse()
switch noconfirm
    case -1 // GUI confirmation
        a$ = "Full Title: " + fulltitle$ + ".\n"
        a$ = a$ + "Title: " + title$ + ".\n"
        a$ = a$ + "Author: " + author$ + ".\n\n"
        a = ALERT a$ + "Write these attributes to " + thefile$ + "?", "Yes", "", "No", "warning"
        if a = 1 writeattribs()
        break
    case 0 // CLI confirmation
        print "Full entry: " + fulltitle$
        print "Title: " + title$
        print "Author: " + author$
        input "Write these attributes to the file? (y/n) " a$
        if lower$(left$(a$,1)) = "y" writeattribs()
        break
    case 1 // no confirmation, for automated bulk operations
        // requires the --noconfirm switch
        writeattribs()
        break
end switch
sub writeattribs()
    print "Setting attribute ebook:full_title to " + fulltitle$ + "."
    attribute set "String", "ebook:full_title", fulltitle$, thefile$
    print "Setting attribute ebook:title to " + title$ + "."
    attribute set "String", "ebook:title", title$, thefile$
    print "Setting attribute ebook:author to " + author$ + "."
    attribute set "String", "ebook:author", author$, thefile$
end sub

sub showhelp()
    for a=1 to arraysize(docu$(),1)
        print docu$(a)
    next a
end sub

sub parse()
    local without_asterixes$, character$, postitle, posauthor, search1$, search2$
    // change the following 2 lines for books in other languages
    search1$ = "The Project Gutenberg EBook of "
    search2$ = ", by "
    // some PG files have asterisks in them. Replace these with spaces,
    // then remove them later with trim$
    for f = 1 to len(firstline$)
        character$ = mid$(firstline$, f, 1)
        if character$ = "*" or character$ = chr$(20) character$ = " "
        without_asterixes$ = without_asterixes$ + character$
    next f
    firstline$ = without_asterixes$
    firstline$ = trim$(firstline$)
    print "Cleaned up the first line:"
    print firstline$
    // the title starts right after search1$; the author starts right after search2$
    postitle = instr(lower$(firstline$), lower$(search1$)) + len(search1$)
    fulltitle$ = trim$(mid$(firstline$, postitle))
    posauthor = instr(lower$(fulltitle$), lower$(search2$)) + len(search2$)
    title$ = trim$(left$(fulltitle$, posauthor - (len(search2$)+1)))
    author$ = trim$(mid$(fulltitle$, posauthor))
end sub
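
For anyone who wants to try the parsing step outside yab, here is a rough shell rendering of what parse() does. It is not the code used by pgextract_en: parse_pg_line is a made-up name, the work is handed to awk, only the asterisk replacement is reproduced, and unlike the yab version it gives up with a non-zero status when either search string is missing.

```shell
#!/bin/sh
# Rough shell sketch of parse(): split a PG header line into title and author.
# parse_pg_line is a hypothetical helper, not part of pgextract_en.
parse_pg_line() {
    printf '%s\n' "$1" | awk '
    {
        gsub(/\*/, " ")                        # asterisks become spaces
        s1 = "the project gutenberg ebook of " # lower-cased search strings
        s2 = ", by "
        p = index(tolower($0), s1)
        if (p == 0) exit 1                     # not a recognised header
        full = substr($0, p + length(s1))
        sub(/^ +/, "", full)                   # trim both ends
        sub(/ +$/, "", full)
        q = index(tolower(full), s2)
        if (q == 0) exit 1                     # no author part found
        print "Title: " substr(full, 1, q - 1)
        print "Author: " substr(full, q + length(s2))
    }'
}
```

With the line "The Project Gutenberg EBook of Moby Dick, by Herman Melville" as its argument, this prints "Title: Moby Dick" and "Author: Herman Melville" on separate lines.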