Level: Intermediate
Nathan Harrington, Programmer, IBM
26 Aug 2008
Time-availability maps provide a listing of who is most likely to be available
for a certain hour in a certain location. Find out how to use Google Earth and a log of
your communications to map and identify the time and place when availabilities match.
Nationwide and international teams, flexible work hours, and four-day workweeks all
change when and where teams work together. This article presents
tools and code to help you find the best times to reach various members of your team
across geographical areas using Google Earth. Using a common method of message
time-tracking (e-mail headers) and a program to generate Keyhole Markup Language (KML) with appropriate settings,
this article demonstrates useful visualization techniques using Google Earth's TimeSpan feature and the "time slider."
Requirements
Hardware
Any hardware with 3-D acceleration capable of running Google Earth has sufficient
processing power for the code in this article. The KML described herein uses tens of
thousands of polygon vertices for the United States alone, so faster processors may be
required for global rendering or precision beyond the state level.
Software
Google Earth V4 or later is required to support the TimeSpan feature critical to the
visualizations described. Perl is required for KML building, as well
as extracting e-mail header information. You'll need the Mail::IMAPClient and
IO::Socket::SSL modules from CPAN (see Resources).
Note that the Perl code presented here should run on any platform that supports Google Earth and Perl. The shell commands used to run it (cat, sort, head) assume a UNIX-like environment.
Description of general approach taken to building time-availability maps
Time-availability maps provide a listing of who is most likely to be available for a
certain hour in a certain location. For example, Figures 1 and 2 show users in
different portions of the country who are likely to receive messages during a
particular time window. Instant messaging logs, phone-usage records, group calendars,
badge-reader access, and any number of other time-related data records are suitable for
creating these time-availability maps.
This article focuses on the extraction of perhaps the most common form of availability
data: e-mail headers. A person is most likely to be available around the times they are
most active sending messages. Each user is assigned the appropriate geographical area,
and KML is created with designator fade depths based on message count per hour. Using
Google Earth's time-slider feature, including animation and total time window selection,
helps visualize the resulting availability map for users throughout a geographic area.
Consider Figure 1, which demonstrates an example visualization early in the day.
Figure 1. Example visualization -- start time
Figure 2 displays a broader time window, later in the day, which helps identify regions
of availability for users throughout the United States.
Figure 2. Example visualization -- later time
Extracting state outline information
One of Google's many excellent KML documentation pages lists an example with a rough
outline of U.S. states in KML. Comprising about 13,000 points, these rough outlines
provide an excellent basis for highlighting states. Grab the Google U.S. states
example file and read on for details on how the extractStates.pl program separates the state information.
Listing 1. extractStates.pl full program
#!/usr/bin/perl -w
# extractStates.pl write each state coordinates into a separate file
use strict;
my $str = ""; # built output string
my $fname = ""; # current filename
# create the output directory with Perl's built-in mkdir (no shell needed)
mkdir "states" unless -d "states";
while( my $line = <STDIN> )
{
if( $line =~ /<name>/ )
{
# extract the state for file name
$fname = substr( $line, index($line,"CDATA[")+6 );
$fname = substr( $fname, 0, index($fname," (") );
}elsif( $line =~ /<TimeSpan>/ )
{
# change the TimeSpan designator for later processing
$line = <STDIN>; # begin tags
$line = <STDIN>; # close TimeSpan
$line = qq{ <TimeSpan></TimeSpan>\n};
}elsif( $line =~ /<\/Placemark/ )
{
# write out the file, reset variables after closing tag
open( OUTFILE, ">", "states/$fname" ) or die "Can't write state file: $!";
print OUTFILE $str;
print OUTFILE $line;
close(OUTFILE);
$str = "";
}#if closing tag
# add a line if it's at the start or the built string is not blank
if( $line =~ /<Placemark/ || $str ne "" ){ $str .= $line }
}#line in
After declaring variables and creating the states directory, the extractStates.pl program reads every line from STDIN. The us_states.kml file contains each state's rough border
geometry, and the extractStates.pl program writes each
of these geometries into its own file. The TimeSpan entries specific to the us_states.kml file are replaced with a more easily modifiable
placeholder, and the complete information for each state is written out to the states directory.
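The file-name step in Listing 1 can be tried in isolation. The following is a minimal sketch, not part of the original program: it assumes each name line wraps the state name in CDATA and follows it with " (" and further text, which is what the substr and index calls in Listing 1 expect. The helper name state_name_from_line and the sample line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the state name out of a KML <name> line.
# Assumes the format <name><![CDATA[State Name (extra text)]]></name>.
sub state_name_from_line {
    my ($line) = @_;
    return $1 if $line =~ /CDATA\[(.+?) \(/;
    return;    # no match: the caller decides what to do
}

# Made-up sample line, for illustration only
my $sample = '<name><![CDATA[West Virginia (sample)]]></name>';
my $name   = state_name_from_line($sample);
print defined $name ? "$name\n" : "no state name found\n";
```

Returning nothing on a failed match makes it easy to skip name lines that do not follow the assumed format instead of writing a file with an empty name.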
Save the code in Listing 1 above into a file named extractStates.pl and run the program
with the command cat us_states.kml | perl extractStates.pl.
Check the states directory to see a list of files like that shown below.
Listing 2. List of states directory
ls -la states/* | head
-rw-r--r-- 1 nathan nathan 6708 2008-07-08 17:11 states/Alabama
-rw-r--r-- 1 nathan nathan 85426 2008-07-08 17:11 states/Alaska
...
-rw-r--r-- 1 nathan nathan 15804 2008-07-08 17:11 states/West Virginia
-rw-r--r-- 1 nathan nathan 11536 2008-07-08 17:11 states/Wisconsin
-rw-r--r-- 1 nathan nathan 3298 2008-07-08 17:11 states/Wyoming
Extracting e-mail time information
Extracting e-mail headers and processing the entries for sent time is relatively easy
with the Mail::IMAPClient and IO::Socket::SSL modules from CPAN. The example below uses
the Google Internet Message Access Protocol (IMAP) interface, but the code presented here should work with most IMAP
servers. You may have to eliminate the SSL connection, depending on your server setup.
Listing 3. extractEmails.pl modules, connection setup
#!/usr/bin/perl -w
# extractEmails.pl get all e-mails, print listing of from at what hour
use strict;
use Mail::IMAPClient;
use IO::Socket::SSL;
my %timeHash = (); # data structure for whom at what time
# create a SSL socket to the imap server
my $socket = IO::Socket::SSL->new( PeerAddr => 'imap.gmail.com',
PeerPort => 993
) or die "can't create socket";
# create an imap connection through the ssl socket
my $imap = Mail::IMAPClient->new( Socket => $socket,
User => 'yourEmailID@gmail.com',
Password => 'yourPassword'
) or die "can't connect imap";
$imap->select("INBOX");
my @messages = $imap->search('ALL');
Creating the socket and IMAP connection is straightforward. Listing 4 creates and
prints the e-mail and time-data structure.
Listing 4. Hour extraction, data-structure printing
my $msgCount = 0;
for my $msg ( @messages )
{
my $from = $imap->get_header($msg,"From");
my $date = $imap->get_header($msg,"Date");
# set date to main hour
$date = substr($date, index($date,":")-2,2);
# increment the hour's count for that id
$timeHash{$from}{$date}++;
$msgCount++;
if( $msgCount % 10 == 0 ){ print STDERR "$msgCount\n" }
}#for each message
$imap->logout();
# print all of the hour/from combinations for later processing
for my $from( keys %timeHash )
{
for my $time( keys %{ $timeHash{$from} } )
{
print "$from TIME $time $timeHash{$from}{$time}\n";
}#for time
}#for id
Note that in this example, the time selected is the hour of sending for each e-mail.
You may find it useful to select hours, minutes, days, or weeks for your particular
availability scenario. Run the extractEmails.pl program with the command
perl extractEmails.pl > emailHours. After printing a progress indicator every 10
headers to STDERR, the command above will produce an emailHours file like that shown below.
Listing 5. Example emailHours file
Dave <dave@ibmdevworks.com> TIME 11 6
Dave <dave@ibmdevworks.com> TIME 21 8
...
Bob <bob@ibmdevworks.com> TIME 07 6
Bob <bob@ibmdevworks.com> TIME 11 36
The format of the emailHours file is name (if available),
e-mail address, TIME delimiter, hour, and the number of messages sent during that hour.
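If a different granularity than the hour suits your scenario, the Date header can be split into more than one component. The sketch below is one possible approach, assuming headers in the common "Tue, 8 Jul 2008 17:11:02 -0400" form. The helper name day_and_hour and the sample dates are hypothetical. Keep in mind that the hour in a Date header is expressed in the sender's own UTC offset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return ( weekday, hour ) from a Date header.
# The weekday is optional in the header, so it may come back undefined.
sub day_and_hour {
    my ($date) = @_;
    if ( $date =~ /^\s*(?:([A-Za-z]{3}),\s*)?\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+(\d{2}):\d{2}/ ) {
        return ( $1, $2 );
    }
    return;    # unrecognized format
}

# Made-up sample headers, for illustration only
my @sampleDates = (
    'Tue, 8 Jul 2008 07:45:10 -0400',
    'Tue, 8 Jul 2008 07:59:00 -0400',
    'Wed, 9 Jul 2008 21:05:33 +0100',
);

# count messages per weekday/hour combination
my %dayHourCount;
for my $date (@sampleDates) {
    my ( $day, $hour ) = day_and_hour($date) or next;
    $day = '???' unless defined $day;
    $dayHourCount{"$day $hour"}++;
}
print "$_ $dayHourCount{$_}\n" for sort keys %dayHourCount;
```

A key such as "Tue 07" changes the layout of the emailHours file, so any later processing step that expects a bare hour would need a matching change.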
Assigning e-mail addresses a geographical location
The emailHours file now contains a list of all of the e-mail
addresses and the number of times they sent a message for a given hour. You may want
to process a select few of your contacts, or create a list of the top senders of
e-mail. Consider the following one-liner to create a list of the top 50 e-mail senders
in the emailHours file.
Listing 6. Command to produce top 50 e-mail senders
cat emailHours | \
perl -lane '@a=split "TIME";$h{$a[0]}+=$F[@F-1]; \
END{for(keys %h){print "$h{$_} $_"}}' | sort -nr | head -n50 > top50emails
Note that the \ characters in Listing 6 are for formatting
only and should not be included when the command is run. Running the above command
produces a list like that shown below.
Listing 7. Example top50emails file
44 Bob <bob@ibmdevworks.com>
38 Dave <dave@ibmdevworks.com>
34 Tom <tom@ibmdevworks.com>
30 Mike <mike@ibmdevworks.com>
...
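If the one-liner in Listing 6 is hard to follow, the same totaling can be written out as a short script. This is a sketch rather than a replacement: the helper name total_by_sender is made up, and the sample lines follow the format of Listing 5. Like the one-liner, it splits on TIME, so each sender key keeps the space that precedes the delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: total the per-hour counts for each sender.
sub total_by_sender {
    my %total;
    for my $line (@_) {
        next unless $line =~ /TIME/;
        my ($sender) = split /TIME/, $line;    # keeps the trailing space
        my @fields = split ' ', $line;
        next unless @fields && $fields[-1] =~ /^\d+$/;
        $total{$sender} += $fields[-1];        # last field is the count
    }
    return %total;
}

# Sample lines in the format of the emailHours file
my @sample = (
    'Dave <dave@ibmdevworks.com> TIME 11 6',
    'Dave <dave@ibmdevworks.com> TIME 21 8',
    'Bob <bob@ibmdevworks.com> TIME 07 6',
    'Bob <bob@ibmdevworks.com> TIME 11 36',
);

my %total  = total_by_sender(@sample);
my @ranked = sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total;
my $limit  = 50;
$#ranked = $limit - 1 if @ranked > $limit;    # keep only the top senders
print "$total{$_} $_\n" for @ranked;
```

Sorting inside Perl removes the dependency on the sort and head commands, which may help on systems where those tools are not available.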
Modify the top50emails file by inserting the state name,
then a STATE delimiter, at the beginning of each line. You
can do this manually or link the state designator with a geo-ip locator, employee
address database, or other source of geo-locating data. Save the modified file as
stateMapping, as shown below.
Listing 8. Example stateMapping file
New York STATE 44 Bob <bob@ibmdevworks.com>
North Carolina STATE 38 Dave <dave@ibmdevworks.com>
Virginia STATE 34 Tom <tom@ibmdevworks.com>
Georgia STATE 30 Mike <mike@ibmdevworks.com>
...
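When the state for each address is already known, the stateMapping file can be generated instead of edited by hand. The sketch below assumes a hypothetical lookup table, %stateFor, that stands in for whatever geo-locating source you have. The helper name add_state is also made up. It only prefixes each line, leaving the rest untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: e-mail address => state name.
my %stateFor = (
    'bob@ibmdevworks.com'  => 'New York',
    'dave@ibmdevworks.com' => 'North Carolina',
);

# Prefix one top50emails line with "<state> STATE ". Returns nothing if
# the line has no <address> part or the address is not in the table.
sub add_state {
    my ( $line, $lookup ) = @_;
    my ($addr) = $line =~ /<([^<>\s]+\@[^<>\s]+)>/ or return;
    my $state = $lookup->{ lc $addr } or return;
    return "$state STATE $line";
}

# Sample lines in the format of the top50emails file
my @top = (
    '44 Bob <bob@ibmdevworks.com>',
    '34 Tom <tom@ibmdevworks.com>',
);

for my $line (@top) {
    my $mapped = add_state( $line, \%stateFor );
    if   ( defined $mapped ) { print "$mapped\n" }
    else                     { warn "no state known for: $line\n" }
}
```

Senders without a known state are reported on STDERR and left out, which matches the way unmapped addresses are skipped later in the process.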
Generating KML markup with createKml.pl
With the state coordinates extracted, the full e-mail headers and times counted, and
each relevant e-mail ID associated with a state name, a KML file can be generated to
produce the desired visualization. Listing 9 shows the beginning of the
createKml.pl program.
Listing 9. createKml.pl program header, main loop
#!/usr/bin/perl -w
#createKml.pl build google earth kml, fade states based on entries per hour
use strict;
die "specify state mapping file, maximum, intervals " unless ( @ARGV == 3 );
my( $inFile, $max, $interval ) = @ARGV;
my %state = ();
loadStateMapping();
kmlHeader();
kmlStyles();
while( my $line = <STDIN> )
{
# for bogus entry elimination
next unless length( $line) > 20 ;
chomp($line);
#change person@ibm.com TIME 11 2 into components
my( $mail, $time ) = split "TIME ", $line;
my( $stHour, $countVal ) = split " ", $time;
# continue if a state defined for that mail
next unless exists($state{$mail});
open( INFILE,"states/$state{$mail}") or die "no state input file";
while( my $line = <INFILE> )
{
if ( $line =~ /<name>/ ){ print "<name><![CDATA[$mail]]></name>\n" }
elsif( $line =~ /<TimeSpan>/ ){ getTimes( $stHour ) }
elsif( $line =~ /Style_/ ) { getStyle( $countVal ) }
else { print $line }
}#while line in
close(INFILE);
}#line in
print qq{</Document>\n</kml>\n};
After checking for proper usage and declaring variables, the program calls the loadStateMapping subroutine, which reads the assigned state
for each e-mail address. The kmlHeader and kmlStyles subroutines are then called to print the appropriate KML markup for the specified maximum and intervals.
The main loop is entered to read every line on STDIN and
extract relevant information. For each line, the e-mail address is written as the placemark
name, the TimeSpan start and end points are computed, and the appropriate style is
selected based on the maximum and intervals specified.
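The lookup in the main loop compares sender strings exactly, so stray whitespace around a delimiter (for example, a trailing space removed by a text editor) can cause every entry to be skipped without any error. The following sketch shows one way to parse both file formats with whitespace trimmed on each side. The helper names trim, parse_hours_line, and parse_mapping_line are hypothetical and do not appear in createKml.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove leading and trailing whitespace from a string.
sub trim { my ($s) = @_; $s =~ s/^\s+|\s+$//g; return $s }

# "Bob <bob@x> TIME 11 36" -> ( sender, hour, count )
sub parse_hours_line {
    my ($line) = @_;
    my ( $mail, $time ) = split /TIME /, $line, 2;
    return unless defined $time;
    my ( $hour, $count ) = split ' ', $time;
    return unless defined $count;
    return ( trim($mail), $hour, $count );
}

# "New York STATE 44 Bob <bob@x>" -> ( state, sender )
sub parse_mapping_line {
    my ($line) = @_;
    my ( $state, $rest ) = split /STATE /, $line, 2;
    return unless defined $rest;
    $rest = trim($rest);
    $rest =~ s/^\d+\s+//;    # drop the total count
    return ( trim($state), $rest );
}

my ( $state, $who ) = parse_mapping_line('New York STATE 44 Bob <bob@ibmdevworks.com>');
my ( $mail, $hour, $count ) = parse_hours_line('Bob <bob@ibmdevworks.com> TIME 11 36');
print "match: $mail is in $state, hour $hour, count $count\n" if $mail eq $who;
```

Because both sides are normalized the same way, the comparison no longer depends on whether either file happens to carry a space before the line ending.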
Listing 10 shows the first of these subroutines, loadStateMapping , in detail.
Listing 10. loadStateMapping , kmlHeader subroutines
sub loadStateMapping
{
# create a hash storing which mail corresponds to which state
open( INFILE,"$inFile" ) or die "no in state file";
while( my $line = <INFILE> )
{
chomp($line);
my( $sname, $mail ) = split "STATE ", $line;
# skip the total count
$mail = substr($mail, index($mail," ")+1);
$state{$mail} = $sname;
}#stateMapping lines
close(INFILE);
}#loadStateMapping
sub kmlHeader
{
print qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print qq{<kml xmlns="http://earth.google.com/kml/2.2">\n};
print qq{<Document>\n};
print qq{ <name><![CDATA[Time Availability]]></name>\n};
print qq{ <open>1</open>\n};
}#kmlHeader
For faster processing, the loadStateMapping subroutine reads
the state assignment file once and creates a hash, keyed on e-mail address, that stores each state
name for lookup in the main program loop. This allows entries to be
skipped if a state has not been assigned. The kmlHeader subroutine prints out the main header markup for the KML
document. Listing 11 shows the getTimes subroutine.
Listing 11. getTimes subroutine
sub getTimes
{
# hour passed in by the caller, for example "07"
my $inHour = $_[0];
my $endHour = $inHour + 1;
if( length($endHour) == 1 ){ $endHour = "0$endHour" }
print qq{ <TimeSpan>\n};
print qq{ <begin>2008-07-01T$inHour:00Z</begin>\n};
print qq{ <end>2008-07-01T$endHour:00Z</end>\n};
print qq{ </TimeSpan>\n};
}#getTimes
Specifying the correct TimeSpan markup is performed by the code listed above in the
getTimes subroutine. Listing 12 below shows the more-complex
getStyle subroutine.
Listing 12. getStyle subroutine
sub getStyle
{
# find the appropriate style based on the input value
my $inputVal = $_[0];
my $decInc = $max / $interval;
my $count = $decInc;
my $styleCount = 0;
# move through each interval, exit when the input value no longer fits
while( $count <= $max )
{
if( $count > $inputVal ){ last }
$styleCount++;
$count += $decInc;
}#While count less than max
# default to the last style if interval is outside the boundary
if( $styleCount >= $interval ){ $styleCount-- }
print qq{ <styleUrl>#style} . $styleCount . qq{</styleUrl>\n};
}#getStyle
Like the kmlStyles subroutine shown below, the getStyle subroutine first creates an incrementer that is the maximum
specified value divided by the number of intervals. For example, a maximum value of 100 and an
interval of 5 will produce a series of "buckets" 20 units apart that each input value
will fall into. An input value of 40, for example, will correspond to a medium
fade style, whereas a value of 80 or greater will correspond to the most opaque fade
setting. Listing 13 shows the kmlStyles subroutine with a similar function.
Listing 13. kmlStyles
sub kmlStyles
{
# create an incremented "fade range" according to the number of intervals
my $hexInc = 255/$interval;
my $count = $hexInc;
my $styleNum = 0;
while( $count <= 255 )
{
# zero-pad so the alpha component is always two hex digits
my $fade = sprintf("%02X", $count );
print qq{ <Style id="style} . $styleNum . qq{">\n};
print qq{ <IconStyle>\n};
print qq{ <scale>0.4</scale>\n};
print qq{ <Icon>\n};
print qq{ <href>http://maps.google.com/mapfiles/kml/};
print qq{shapes/star.png</href>\n};
print qq{ </Icon>\n};
print qq{ </IconStyle>\n};
print qq{ <LabelStyle>\n};
print qq{ <color>9900ffff</color>\n};
print qq{ <scale>1</scale>\n};
print qq{ </LabelStyle>\n};
print qq{ <LineStyle>\n};
print qq{ <color>99FFFF99</color>\n};
print qq{ <width>2</width>\n};
print qq{ </LineStyle>\n};
print qq{ <PolyStyle>\n};
print qq{ <color>} . $fade . qq{FF9933</color>\n};
print qq{ <fill>1</fill>\n};
print qq{ <outline>1</outline>\n};
print qq{ </PolyStyle>\n};
print qq{ </Style>\n};
$styleNum++;
$count += $hexInc;
}#while count
}#kmlStyles
Regardless of the maximum or number of intervals specified, each defined style needs
an alpha (opacity) value in the range 00-ff as the first component of its polygon color. The conversion from decimal intervals
to a hexadecimal alpha value is performed using the %X
conversion in the sprintf call. Save the above code in a
file called createKml.pl and read on for usage information.
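To see how a message count ends up as a fade value, the arithmetic from the getStyle and kmlStyles subroutines can be written as two small functions. This is a sketch with hypothetical names (style_index and style_alpha), using division in place of the loops in the listings. It is intended to give the same results for the values shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index of the style bucket a count falls into (0 .. $interval - 1).
sub style_index {
    my ( $value, $max, $interval ) = @_;
    my $step  = $max / $interval;
    my $index = int( $value / $step );
    $index = $interval - 1 if $index >= $interval;    # clamp high values
    $index = 0 if $index < 0;
    return $index;
}

# Two-digit hex alpha for a style index: style 0 is the faintest,
# and the last style is fully opaque (FF).
sub style_alpha {
    my ( $index, $interval ) = @_;
    return sprintf( "%02X", 255 * ( $index + 1 ) / $interval );
}

my ( $max, $interval ) = ( 100, 5 );
for my $count ( 10, 40, 80, 250 ) {
    my $i = style_index( $count, $max, $interval );
    printf "count %3d -> style%d, alpha %s\n", $count, $i, style_alpha( $i, $interval );
}
```

With a maximum of 100 and 5 intervals, a count of 40 maps to style2 with an alpha of 99, and any count of 80 or more maps to style4 with an alpha of FF.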
Usage
Selection of appropriate maximum and interval variables is largely dependent on your
specific data. Try a maximum value 20 to 40 percent less than the maximum value
recorded in top50emails. Choosing an interval is largely a
trade-off between presenting too much information with a high number of intervals, or
showing too little change between the intervals when a low number is selected. Try a
straightforward example with the command
cat emailHours | perl createKml.pl stateMapping 20 2 > timeMap.kml.
After opening Google Earth and loading the timeMap.kml file, look for the "time slider"
in the upper center of the screen. Move through the visualization in time by dragging
the slider, or press the play button to show an animation. Also try widening the
visible time window to extend the range of hours during which states remain visible.
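The rule of thumb for choosing a maximum can be turned into a quick calculation. The sketch below uses a hypothetical helper, suggest_max_range, that reads the leading counts from lines in the top50emails format and returns values 40 and 20 percent below the largest one. Treat the result as a starting point for experimentation rather than a recommended setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return ( low, high ), the values 40 and 20 percent
# below the largest count found at the start of the given lines.
sub suggest_max_range {
    my @counts = map { /^\s*(\d+)/ ? $1 : () } @_;
    return unless @counts;
    my $top = max(@counts);
    return ( int( $top * 60 / 100 ), int( $top * 80 / 100 ) );
}

# Sample lines in the format of the top50emails file
my @top50 = (
    '44 Bob <bob@ibmdevworks.com>',
    '38 Dave <dave@ibmdevworks.com>',
    '34 Tom <tom@ibmdevworks.com>',
    '30 Mike <mike@ibmdevworks.com>',
);

if ( my ( $low, $high ) = suggest_max_range(@top50) ) {
    print "try a maximum between $low and $high\n";
}
```

For the sample lines, the largest count is 44, so the suggested range is 26 to 35.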
Conclusion, further modifications
With the incredible Google Earth interface, and the custom code above for creating KML,
you can build your own time-availability maps for a wide variety of applications.
Consider extracting login and activity times from your instant messaging logs, phone
records, or other sources to build additional and overlapping data sets. Expand the
time windows, or focus more on a minute-to-minute spread of information through your
company or customer set. Extract your Web-server visitor information and build a
municipality-specific set of place marks and designators to explore where and when
your Web-site visitors see your content.
Download
Description | Name | Size | Download method
Sample code | os-google-earth-perl.timeAvailabilityMaps.zip | 4KB | HTTP
Resources
Learn
- Google offers excellent KML documentation.
- This article makes use of the us_states.kml example KML file from Brian Flood.
- To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.
- Stay current with developerWorks' Technical events and webcasts.
- Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.
- Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.
- Watch and learn about IBM and open source technologies and product functions with the no-cost developerWorks On demand demos.
About the author
Nathan Harrington is a programmer at IBM currently working with Linux and resource-locating technologies.