Amazon Web Services How-To
by Jackie Hamilton (kira@cgi101.com)

The CGI101 books pages are autogenerated using several Perl scripts that fetch XML data from Amazon.com. Amazon provides access to its data via the Amazon E-Commerce Services (formerly Amazon Web Services). For more information about ECS:

I had originally been using Net::Amazon to retrieve the data (using the older SOAP feed), but for this version I opted to work directly with the XML data. I use three scripts to do the work:

fetchbooks.pl fetches the XML data and stores it in XML files here on the cgi101 server. Typically I run this script once per night via a cron job. By saving the data once per day, I'm not adding any significant load to Amazon's own servers, and I can re-run the parse.pl script (which formats the data) as many times as I want without hitting Amazon every time.

(Amazon's made it a bit harder to find the browsenode numbers; use Gwaanin's browsenode lookup to find them.)

parse.pl formats the XML data into the individual book pages. It uses the XML::Simple module to parse the XML, which stuffs the data into nested hash references. So, for example, to read the actual XML, I do:
my $xml = XMLin($xmlfile);
For the Amazon ItemSearch feeds, I'm only interested in the "Item" nodes, and ItemSearch returns 10 of them by default, so the data for the individual items is stored in an array reference:
my @items = @{$xml->{Items}->{Item}};
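One way to see how XMLin lays the data out is to dump the parsed structure with Data::Dumper. The following is only a minimal sketch: the inline XML is a hypothetical, heavily trimmed stand-in for an ItemSearch response (the ASINs and titles are made up), not real Amazon output.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical, heavily trimmed stand-in for an ItemSearch response.
my $sample = <<'XML';
<ItemSearchResponse>
  <Items>
    <Item>
      <ASIN>0000000001</ASIN>
      <ItemAttributes><Title>First Sample Book</Title></ItemAttributes>
    </Item>
    <Item>
      <ASIN>0000000002</ASIN>
      <ItemAttributes><Title>Second Sample Book</Title></ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>
XML

# XMLin accepts a string of XML as well as a filename.
# By default the root element is dropped from the result.
my $xml = XMLin($sample);

# Dump the nested hash/array references to see where each node ended up.
$Data::Dumper::Indent = 1;
print Dumper($xml);

# Repeated <Item> elements come back as an array reference.
my @items = @{ $xml->{Items}{Item} };
for my $item (@items) {
    print "$item->{ASIN}: $item->{ItemAttributes}{Title}\n";
}
```

Reading the Dumper output alongside the raw XML makes it fairly quick to work out the hash keys needed to reach a given node.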
It took me a bit of playing around to figure out just where the data was in the $xml reference. Also, some nodes could be either a single hash reference or an array reference, which was a bit of a nuisance; parse.pl has this bit of code for handling reviews:
my $reviews = $item->{EditorialReviews}{EditorialReview};
if (UNIVERSAL::isa($reviews, "ARRAY")) {
    # Several reviews: pick the one whose source is "Book Description"
    foreach my $review (@$reviews) {
        if ($review->{Source} eq "Book Description") {
            $desc = $review->{Content};
        }
    }
} else {
    # A single review: use its content directly
    $desc = $reviews->{Content};
}
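As an aside, XML::Simple has a ForceArray option that can remove the need for this kind of check by always returning an array reference for the named elements. This is a hedged sketch of that alternative, not what parse.pl does; the sample XML is hypothetical.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical item with a single editorial review.
my $sample = <<'XML';
<Item>
  <ASIN>0000000001</ASIN>
  <EditorialReviews>
    <EditorialReview>
      <Source>Book Description</Source>
      <Content>A sample description.</Content>
    </EditorialReview>
  </EditorialReviews>
</Item>
XML

# ForceArray makes EditorialReview an array reference even when
# only one such element is present.
my $item = XMLin($sample, ForceArray => ['EditorialReview']);

# With the shape guaranteed, one loop covers both cases.
my $desc;
for my $review (@{ $item->{EditorialReviews}{EditorialReview} }) {
    $desc = $review->{Content} if $review->{Source} eq 'Book Description';
}
print "$desc\n";
```

Note that this version only accepts a review whose Source is "Book Description", whereas the check above falls back to whatever single review is present. parse.pl does not use this option, which is why it needs the explicit test.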
Thus if $reviews is an array (i.e., there's more than one review node for that item), I have to loop through and look for the "Book Description" source. If $reviews is a single hash reference, then I can just go ahead and use its Content node.

truncate.pm is a module I wrote a while back to handle truncation of HTML strings. It uses HTML::Parser to scan through the string, and invokes different functions depending on whether it's looking at an open-tag, close-tag, or plain text. It also keeps a running count of the number of text characters (it doesn't count characters within a tag), and when it reaches the specified max, it closes all of the open tags and returns the truncated string.

I needed to truncate many of the book descriptions as they were insanely long. I wanted the text on each books page to be concise, so I've truncated them at 500 characters, and added a "more" link to Amazon's own site, where viewers can read more info and reviews (and hopefully order the book).

More Info
If you're fairly new to Perl, and syntax like |