Friday, October 3, 2008

Unabridged

Work on a version of the book that includes the comments is coming along swimmingly. For the first version, I mostly copied and pasted the posts and formatted them by hand. This time I've automated most of the process. I wrote the Perl script below, which reads the post links from a saved file (bloglinks.txt), downloads each post, strips most of the HTML formatting, and writes the text to blogcontent.txt.


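#!/usr/bin/perl
# Download each blog post listed in bloglinks.txt and strip most of the HTML,
# leaving plain text with simple (-B-) and (-I-) markers for later formatting.
use strict;
use warnings;
use LWP::Simple;

# Collect the post URLs. unshift reverses the order the links appear in,
# so a newest-first list of links comes out oldest-first.
open my $in, '<', 'bloglinks.txt' or die "Can't read bloglinks.txt: $!";
my @urls;
while (my $line = <$in>)
{
    # [^"]+ stops at the closing quote, so two links on one line don't run together
    if ($line =~ /(http[^"]+html)"/) { unshift @urls, $1; }
}
close $in;

open my $out, '>', 'blogcontent.txt' or die "Can't write blogcontent.txt: $!";

foreach my $url (@urls)
{
    my $content = get($url);
    unless (defined $content) { warn "Couldn't fetch $url\n"; next; }

    my $on = 0;    # 1 while inside the post body and its comments
    foreach my $text (split /\n/, $content)
    {
        # The comment form at the bottom of the page marks the end.
        if ($text =~ /Post a Comment/) { $on = 0 }
        if ($on)
        {
            $text =~ s{<br\s*/>}{\n}gi;                    # <br />, <br/>, <BR />, <BR/>
            $text =~ s{</h3>}{(-/B-)};                     # end of the bold post title
            $text =~ s/<dd class='comment-footer'>/\n\n/;  # must run before tags are stripped
            $text =~ s/<p.*?>/\n\n/g;                      # paragraphs become blank lines
            $text =~ s/<span style="font-style:italic;">/(-I-)/;
            $text =~ s/<.*?>//g;                           # strip all remaining tags
            $text =~ s/.+, 200.+M/\n\n/;                   # date/time stamps ("..., 2008 ... PM")
            $text =~ s/(comments:)/$1\n\n/;                # break after "N comments:"
            $text =~ s/^ +$//;                             # blank out lines of only spaces
            print $out $text;
        }
        # The post title heading marks where the content starts.
        if ($text =~ /<h3 class='post-title entry-title'>/)
        {
            $on = 1;
            print $out "(-B-)";
        }
    }
}
close $out;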
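To check what the cleanup does to a single line without downloading anything, the substitutions can be pulled into a subroutine and run on a sample string. This is a minimal sketch: `clean_line` is a hypothetical name, and the sample HTML is made up for illustration, not copied from a real Blogger page. In this sketch the comment-footer substitution runs before the catch-all tag stripper, because after `s/<.*?>//g` there is no `<dd ...>` tag left to match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: applies the script's tag cleanup to one line of HTML
# and returns the plain text with (-B-)/(-I-) style markers.
sub clean_line
{
    my ($text) = @_;
    $text =~ s{<br\s*/>}{\n}gi;                     # line breaks become newlines
    $text =~ s{</h3>}{(-/B-)};                      # end-of-title marker
    $text =~ s/<dd class='comment-footer'>/\n\n/;   # before the tags are stripped
    $text =~ s/<p.*?>/\n\n/g;                       # paragraphs become blank lines
    $text =~ s/<span style="font-style:italic;">/(-I-)/;
    $text =~ s/<.*?>//g;                            # drop every remaining tag
    return $text;
}

# Made-up sample line, for illustration only.
my $sample = q{Sailed to the <span style="font-style:italic;">next</span> anchorage.<br />Calm night.};
print clean_line($sample), "\n";
```

Running it prints "Sailed to the (-I-)next anchorage." and "Calm night." on two lines, which makes it easy to try a new substitution on a tricky line before rerunning the whole download.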


I've also gotten reacquainted with regular expressions, which are a very powerful tool for parsing text. I use them often when programming, but they would be just as useful to anyone who has to do large-scale editing and formatting of text. For example, the single substitution s/<.*?>//g in the script removes every HTML tag from a line.
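One detail that comes up constantly when cleaning HTML is greedy versus non-greedy matching. The tag stripper in the script uses the non-greedy `*?`, so each tag is removed on its own; a greedy `<.*>` would swallow everything from the first `<` to the last `>` on the line, words included. A small demonstration (the sample sentence is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = 'Fair <b>winds</b> and <i>following</i> seas';

# Greedy: .* runs to the LAST '>' on the line, eating the words in between.
(my $greedy = $html) =~ s/<.*>//g;

# Non-greedy: .*? stops at the FIRST '>', so only the tags are removed.
(my $lazy = $html) =~ s/<.*?>//g;

print "greedy:     $greedy\n";   # greedy:     Fair  seas
print "non-greedy: $lazy\n";     # non-greedy: Fair winds and following seas
```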

I'm not sure the Perl script would be of use to anyone else. If there were demand from people who want to download their blogs to their own computers, I could probably put the script on my server and hook it up to a web page to automate the process for others.

I should have a PDF of the unabridged book done sometime next week. So far, with text only, no pictures, and little formatting, the book is about 300 pages. When it's finished, I'm guessing it will be 500-some pages. If so, it should cost about $20, including shipping.

So far, eight copies of the book have been sold. At first I assumed Grampa had bought a bunch of copies to give to friends, but more copies have been ordered since, so Grampa and I may not account for all the orders. If you ordered a book, thanks! Please let me know what you think of it (including any corrections or typos).

And wish the Admiral and me luck: we're running in a 5K tomorrow. I haven't run a road race in years, but I've been working out daily at the college gym. Hopefully I'll have a good time.

8 comments:

Rose said...

Hi Scott,
Fred & I bought a copy, even though I printed out the blog with comments each day for Fred (since he rarely uses the computer) and we kept every day's printout in a binder. We're looking forward to receiving it, and to getting the unabridged version with the comments.


Good job!

Rose

Ross said...

I love the cover shot on the book.

I can't believe all that Perl script stuff - it looks quite confusing to the uninitiated.

I too am amazed that a person could 'publish' their own book at such a low cost. How is it possible?!

NautiG said...

Thanks Rose!

The cover shot is an old picture of me and my old Pearson sailboat.

The Perl script may be harder to read because I tend to "hack" my programs rather than code them properly. Then again, a lot of people take pride in writing very succinct, indecipherable Perl scripts.

I'm amazed at how cheap the printing costs of the book are, too. I wonder if it's somehow a loss leader, and Lulu makes its money from additional services it sells?

SV-Footprint said...

Congratulations on your book - what a great way to immortalize the memories. The Cover shot is perfect!

I would very much appreciate more info on how to extract the text and pictures in a similar way for my blog (as a backup and an easy point of access). Can you give more details? If I wanted to use your Perl script, exactly what would I do? Easier still (for me): when are you going to publish a working copy so I can just type in my blog name and get the result file? You did suggest you might!!!

Have fun

NautiG said...

Maryanne, I'll see what I can do about getting the script online. I'm up in Maryland this week, so I don't know how much time I'll have to work on it.

I want to give you and Peter Y copies of the book. Hopefully I can get you a copy before you set off on your adventure. Wish I was going with you.

SV-Footprint said...

Cool! Thanks, Scott. I'll try to send you an email offline with our addresses (but I remember that last time you never received my direct emails!)

Kyle and I are so excited to be headed south. For years we were working and tied to the dock, watching all the cruisers go by (including you!). Those years were painful, but worth the wait now that we can finally go :-)

SV-Footprint said...

Oh, and there's no rush to post your online version. But when you do, I'll be your first customer :-)

NautiG said...

Maryanne,

Send me a message to "blog", then the at sign, then "nautig", a dot and finally "com". I don't check that email unless I'm expecting a message about the blog. I'll check a few times the next couple days for a message from you.

Scott