This is an archived cached-text copy of the developerWorks article. Please consider viewing the original article at: IBM developerWorks




Create time-availability maps with Perl and Google Earth

Visualize when your team members, customers, or systems are available by extracting message data and displaying it in Google Earth



Level: Intermediate

Nathan Harrington, Programmer, IBM 

26 Aug 2008

Time-availability maps provide a listing of who is most likely to be available for a certain hour in a certain location. Find out how to use Google Earth and a log of your communications to map and identify the time and place when availabilities match.

Nationwide and international teams, flexible work hours, and four-day workweeks all change when and where teams work together. This article presents tools and code to help you find the best times to reach various members of your team across geographical areas using Google Earth. Using a common source of message timestamps (e-mail headers) and a program that generates Keyhole Markup Language (KML) with appropriate settings, this article demonstrates useful visualization techniques built on Google Earth's TimeSpan feature and its "time slider."

Requirements

Hardware

Any hardware with 3-D acceleration capable of running Google Earth has sufficient processing power for the code in this article. The KML described herein uses tens of thousands of polygon vertices for the United States alone, so faster processors may be required for global rendering or precision beyond the state level.

Software

Google Earth V4 or later is required to support the TimeSpan feature critical to the visualizations described. Perl is required for building the KML and for extracting e-mail header information. You'll need the Mail::IMAPClient and IO::Socket::SSL modules from CPAN (see Resources).

Note that the code presented here is cross-platform and should run on any platform that supports both Google Earth and Perl.

Description of general approach taken to building time-availability maps

Time-availability maps provide a listing of who is most likely to be available for a certain hour in a certain location. For example, Figures 1 and 2 show users in different parts of the country who are likely to receive messages during a particular time window. Instant messaging logs, phone-usage records, group calendars, badge-reader access, and any number of other time-related data records are suitable for creating these time-availability maps.

This article focuses on extracting perhaps the most common form of availability data: e-mail headers. A person is most likely to be available around the times they are most active sending messages. Each user is assigned the appropriate geographical area, and the generated KML shades that area with a fade depth based on the user's message count for each hour. Google Earth's time-slider feature, including animation and selection of the overall time window, helps visualize the resulting availability map for users throughout a geographic area. Consider Figure 1, which demonstrates an example visualization early in the day.


Figure 1. Example visualization -- start time

Figure 2 displays a broader time window, later in the day, which helps identify regions of availability for users throughout the United States.


Figure 2. Example visualization -- later time





Extracting state outline information

One of Google's many excellent KML documentation pages lists an example with a rough outline of U.S. states in KML. Comprising about 13,000 points, these rough outlines provide an excellent basis for highlighting states. Grab the Google U.S. states example file and read on for details on how the extractStates.pl program separates the state information.


Listing 1. extractStates.pl full program

#!/usr/bin/perl -w
# extractStates.pl write each state coordinates into a separate file
use strict;
my $str   = "";  # built output string
my $fname = "";  # current filename

# create the output directory unless it already exists
-d "states" or mkdir "states" or die "Can't create states/: $!";

while( my $line = <STDIN> )
{
  if( $line =~ /<name>/ )
  {
    # extract the state for file name
    $fname = substr( $line, index($line,"CDATA[")+6 );
    $fname = substr( $fname, 0, index($fname," (")  );

  }elsif( $line =~ /<TimeSpan>/ )
  {
    # change the TimeSpan designator for later processing
    $line = <STDIN>;  # begin tags
    $line = <STDIN>;  # close TimeSpan
    $line = qq{      <TimeSpan></TimeSpan>\n};

  }elsif( $line =~ /<\/Placemark/ )
  {
    # write out the file, reset variables after closing tag
    open( my $out, '>', "states/$fname" ) or die "Can't write states/$fname: $!";
      print $out $str;
      print $out $line;
    close($out);

    $str = "";

  }#if closing tag

  # add a line if it's at the start or the built string is not blank
  if( $line =~ /<Placemark/ || $str ne "" ){ $str .= $line }

}#line in

After declaring variables and creating the states directory, extractStates.pl reads every line from STDIN. The us_states.kml file contains rough border geometries for each state, and extractStates.pl writes each geometry into its own file, named after the state. The TimeSpan entries in us_states.kml are replaced with an empty <TimeSpan></TimeSpan> placeholder that is easier to substitute later, and each state's complete Placemark is written to the states directory.
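The substr calls in Listing 1 assume that each <name> line holds the state name between "CDATA[" and " (". As a hedged alternative, the sketch below performs the same extraction with one regular expression and returns nothing when a line does not fit that layout, which makes unexpected input easier to spot. The helper name state_name_from_line and the sample line are illustrative assumptions, not part of the original program or of us_states.kml.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the state name out of a <name> line.
# Assumes the layout extractStates.pl relies on: the name follows
# "CDATA[" and ends just before the first " (".
sub state_name_from_line {
    my ($line) = @_;
    return $1 if $line =~ /CDATA\[(.+?) \(/;
    return;    # line does not match the expected layout
}

# Illustrative input only; the real file's text after " (" may differ.
my $sample = "  <name><![CDATA[West Virginia (example)]]></name>\n";
print state_name_from_line($sample), "\n";    # prints "West Virginia"
```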

Save the code in Listing 1 above into a file named extractStates.pl and run the program with the command cat us_states.kml | perl extractStates.pl. Check the states directory to see a list of files like that shown below.


Listing 2. List of states directory

ls -la states/* | head
-rw-r--r-- 1 nathan nathan  6708 2008-07-08 17:11 states/Alabama
-rw-r--r-- 1 nathan nathan 85426 2008-07-08 17:11 states/Alaska
...
-rw-r--r-- 1 nathan nathan 15804 2008-07-08 17:11 states/West Virginia
-rw-r--r-- 1 nathan nathan 11536 2008-07-08 17:11 states/Wisconsin
-rw-r--r-- 1 nathan nathan  3298 2008-07-08 17:11 states/Wyoming





Extracting e-mail time information

Extracting e-mail headers and processing the entries for sent time is relatively easy with the Mail::IMAPClient and IO::Socket::SSL modules from CPAN. The example below uses Gmail's Internet Message Access Protocol (IMAP) interface, but the code presented here should work with other IMAP mail servers. Depending on your server setup, you may have to drop the SSL connection.


Listing 3. extractEmails.pl modules, connection setup

#!/usr/bin/perl -w 
# extractEmails.pl get all e-mails, print listing of from at what hour
use strict;
use Mail::IMAPClient;
use IO::Socket::SSL;
my %timeHash = ();    # data structure for whom at what time

# create a SSL socket to the imap server
my $socket = IO::Socket::SSL->new( PeerAddr => 'imap.gmail.com',
                                   PeerPort => 993
                                 ) or die "can't create socket";

# create an imap connection through the ssl socket
my $imap = Mail::IMAPClient->new( Socket   => $socket,
                                  User     => 'yourEmailID@gmail.com',
                                  Password => 'yourPassword'
                                ) or die "can't connect imap";
$imap->select("INBOX");
my @messages = $imap->search('ALL');

Creating the socket and IMAP connection is straightforward. Listing 4 creates and prints the e-mail and time-data structure.


Listing 4. Hour extraction, data-structure printing

my $msgCount = 0;
for my $msg ( @messages )
{
  my $from = $imap->get_header($msg,"From");
  my $date = $imap->get_header($msg,"Date");

  # set date to main hour
  $date = substr($date, index($date,":")-2,2);

  # increment the hour's count for that id
  $timeHash{$from}{$date}++;

  $msgCount++;
  if( $msgCount % 10 == 0 ){ print STDERR "$msgCount\n" }

}#for each message

$imap->logout();

# print all of the hour/from combinations for later processing
for my $from( keys %timeHash )
{
  for my $time( keys %{ $timeHash{$from} } )
  {
    print "$from TIME $time $timeHash{$from}{$time}\n";
  }#for time
}#for id


Note that in this example, the time selected is the hour in which each e-mail was sent. You may find it useful to select hours, minutes, days, or weeks for your particular availability scenario. Run extractEmails.pl with the command perl extractEmails.pl > emailHours. The program prints a progress count to STDERR every 10 messages and produces an emailHours file like the one shown below.


Listing 5. Example emailHours file

Dave <dave@ibmdevworks.com> TIME 11 6
Dave <dave@ibmdevworks.com> TIME 21 8
...
Bob <bob@ibmdevworks.com> TIME 07 6
Bob <bob@ibmdevworks.com> TIME 11 36

Each line of the emailHours file holds the sender's name (if available) and e-mail address, the TIME delimiter, the hour, and the number of messages that sender sent during that hour.
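The substr call in Listing 4 takes the two characters before the first colon in the Date header, which assumes a two-digit hour. A minimal sketch of a more defensive version, assuming RFC 2822-style Date values such as "Tue, 1 Jul 2008 7:55:10 -0400", matches the first H:MM or HH:MM time with a regular expression and zero-pads the hour. The hour_from_date helper and the hard-coded headers, which stand in for the IMAP get_header calls, are hypothetical; the output follows the emailHours layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the zero-padded hour from a Date header.
# Like Listing 4, this is the hour as written in the header (the sender's
# local time), not converted to UTC.
sub hour_from_date {
    my ($date) = @_;
    return unless defined $date;
    return sprintf( "%02d", $1 ) if $date =~ /\b(\d{1,2}):\d{2}/;
    return;    # no recognizable time of day
}

my %timeHash;    # sender => { hour => message count }

# Hypothetical (From, Date) pairs standing in for IMAP header lookups
my @headers = (
    [ 'Bob <bob@ibmdevworks.com>',   'Tue, 1 Jul 2008 07:15:00 -0400' ],
    [ 'Bob <bob@ibmdevworks.com>',   'Tue, 1 Jul 2008 7:55:10 -0400' ],
    [ 'Dave <dave@ibmdevworks.com>', 'Wed, 2 Jul 2008 21:03:44 -0400' ],
);

for my $h (@headers) {
    my ( $from, $date ) = @$h;
    my $hour = hour_from_date($date);
    next unless defined $hour;
    $timeHash{$from}{$hour}++;
}

# Same "sender TIME hour count" layout as the emailHours file
for my $from ( sort keys %timeHash ) {
    for my $hour ( sort keys %{ $timeHash{$from} } ) {
        print "$from TIME $hour $timeHash{$from}{$hour}\n";
    }
}
```

Both of Bob's messages land in hour 07, even though one header writes the hour as a single digit.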





Assigning e-mail addresses a geographical location

The emailHours file now contains a list of all of the e-mail addresses and the number of times they sent a message for a given hour. You may want to process a select few of your contacts, or create a list of the top senders of e-mail. Consider the following one-liner to create a list of the top 50 e-mail senders in the emailHours file.


Listing 6. Command to produce top 50 e-mail senders

cat emailHours | \
  perl -lane '@a=split "TIME";$h{$a[0]}+=$F[@F-1]; \
    END{for(keys %h){print "$h{$_} $_"}}' | sort -nr | head -n50  > top50emails

Note that the \ characters in Listing 6 are for formatting only and should not be included when the command is run. Running the above command produces a list like that shown below.


Listing 7. Example top50emails file

44 Bob <bob@ibmdevworks.com>
38 Dave <dave@ibmdevworks.com>
34 Tom <tom@ibmdevworks.com>
30 Mike <mike@ibmdevworks.com>
...
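For readers who prefer a script to the one-liner in Listing 6, the following sketch performs the same aggregation: it sums each sender's per-hour counts and keeps the largest totals. top_senders is a hypothetical name, and the sample input is only the four lines shown in Listing 5, so the totals differ from Listing 7, which reflects the full file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Script form of the Listing 6 one-liner (top_senders is a hypothetical name).
# Sums the per-hour counts for each sender in emailHours-format lines and
# returns "total sender" strings, largest total first, at most $top of them.
# Like the one-liner, each sender string keeps the space before "TIME".
sub top_senders {
    my ( $top, @lines ) = @_;
    my %total;
    for my $line (@lines) {
        chomp $line;
        my ( $sender, $rest ) = split /TIME/, $line, 2;
        next unless defined $rest;
        my @fields = split ' ', $rest;    # hour, count
        $total{$sender} += $fields[-1];
    }
    my @ranked = sort { $total{$b} <=> $total{$a} } keys %total;
    splice( @ranked, $top ) if @ranked > $top;
    return map { "$total{$_} $_" } @ranked;
}

# The sample lines shown in Listing 5
my @emailHours = (
    'Dave <dave@ibmdevworks.com> TIME 11 6',
    'Dave <dave@ibmdevworks.com> TIME 21 8',
    'Bob <bob@ibmdevworks.com> TIME 07 6',
    'Bob <bob@ibmdevworks.com> TIME 11 36',
);

print "$_\n" for top_senders( 50, @emailHours );
```

With this input, Bob (42 messages) is listed before Dave (14). Keeping the trailing space on each sender matches the way createKml.pl later splits emailHours lines on "TIME ".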

Modify the top50emails file by inserting the state name, then a STATE delimiter, at the beginning of each line. You can do this manually or link the state designator with a geo-IP locator, employee address database, or other source of geo-locating data. Save the modified file as stateMapping, as shown below.


Listing 8. Example stateMapping file

New York STATE 44 Bob <bob@ibmdevworks.com>
North Carolina STATE 38 Dave <dave@ibmdevworks.com>
Virginia STATE 34 Tom <tom@ibmdevworks.com>
Georgia STATE 30 Mike <mike@ibmdevworks.com>
...
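Because the stateMapping file is edited by hand, stray spaces are easy to introduce. The following hedged sketch parses one stateMapping line, assuming the "state STATE total sender" layout of Listing 8, and returns the three fields with surrounding whitespace removed. parse_state_mapping is a hypothetical helper; createKml.pl as listed compares untrimmed strings, so adopting trimmed keys would mean trimming the address in its main loop as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parser for one stateMapping line, e.g.
#   "New York STATE 44 Bob <bob@ibmdevworks.com>"
# Returns (state, total, sender) with surrounding whitespace removed,
# or an empty list if the line lacks the STATE delimiter.
sub parse_state_mapping {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^\s*(.+?)\s+STATE\s+(\d+)\s+(.+?)\s*$/;
    return ( $1, $2, $3 );
}

my ( $state, $total, $sender ) =
  parse_state_mapping('New York STATE 44 Bob <bob@ibmdevworks.com> ');
print "$state | $total | $sender\n";
# prints "New York | 44 | Bob <bob@ibmdevworks.com>"
```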





Generating KML markup with createKml.pl

With the state coordinates extracted, the full e-mail headers and times counted, and each relevant e-mail ID associated with a state name, a KML file can be generated to produce the desired visualization. Listing 9 shows the beginning of the createKml.pl program.


Listing 9. createKml.pl program header, main loop

#!/usr/bin/perl -w
#createKml.pl build google earth kml, fade states based on entries per hour
use strict;
die "specify state mapping file, maximum, intervals " unless ( @ARGV == 3 );
my( $inFile, $max, $interval ) = @ARGV;
my %state = ();

loadStateMapping();
kmlHeader();
kmlStyles();

while( my $line = <STDIN> )
{
  # for bogus entry elimination
  next unless length( $line) > 20 ;
  chomp($line);

  #change person@ibm.com TIME 11 2 into components
  my( $mail, $time ) = split "TIME ", $line;
  my( $stHour, $countVal ) = split " ", $time;

  # continue if a state defined for that mail
  next unless exists($state{$mail});

  open( INFILE,"states/$state{$mail}") or die "no state input file";
    while( my $line = <INFILE> )
    {
      if   ( $line =~ /<name>/     ){ print "<name><![CDATA[$mail]]></name>\n" }
      elsif( $line =~ /<TimeSpan>/ ){ getTimes( $stHour )   }
      elsif( $line =~ /Style_/ )    { getStyle( $countVal ) }
      else                          { print $line }
    }#while line in

  close(INFILE);

}#line in

print qq{</Document>\n</kml>\n};

After checking for the three required arguments and declaring variables, the program calls loadStateMapping to read the state assigned to each e-mail address, then calls kmlHeader and kmlStyles to print the document header and one style per interval for the specified maximum and interval count.

The main loop then reads every line on STDIN and extracts the e-mail address, hour, and message count. For each address that has an assigned state, it copies that state's outline file, substituting the e-mail address as the placemark name, a TimeSpan computed from the hour, and a style chosen from the count, maximum, and interval.
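The substitution performed by the main loop can be isolated into a small function. In this sketch, render_placemark (a hypothetical name) copies template lines, replacing the <name> line, the empty <TimeSpan></TimeSpan> placeholder written by extractStates.pl, and any line containing Style_. The trimmed template, including its #Style_1 reference, is an assumption for illustration; real state files carry full coordinate lists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the per-line substitution in createKml.pl's main loop.
# render_placemark is a hypothetical helper, not part of the original program.
sub render_placemark {
    my ( $template, $name, $timespan, $styleUrl ) = @_;
    my $out = '';
    for my $line (@$template) {
        if    ( $line =~ /<name>/ )     { $out .= "<name><![CDATA[$name]]></name>\n" }
        elsif ( $line =~ /<TimeSpan>/ ) { $out .= $timespan }    # placeholder line
        elsif ( $line =~ /Style_/ )     { $out .= $styleUrl }    # original style ref
        else                            { $out .= $line }        # copied unchanged
    }
    return $out;
}

# Hypothetical, heavily trimmed state file
my @template = (
    "<Placemark>\n",
    "  <name><![CDATA[Wyoming (example)]]></name>\n",
    "      <TimeSpan></TimeSpan>\n",
    "  <styleUrl>#Style_1</styleUrl>\n",
    "</Placemark>\n",
);

print render_placemark(
    \@template,
    'Bob <bob@ibmdevworks.com>',
    "    <TimeSpan><begin>2008-07-01T11:00Z</begin><end>2008-07-01T12:00Z</end></TimeSpan>\n",
    "        <styleUrl>#style1</styleUrl>\n",
);
```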

Listing 10 shows the first of these subroutines, loadStateMapping, in detail.


Listing 10. loadStateMapping, kmlHeader subroutines

sub loadStateMapping
{
  # create a hash storing which mail corresponds to which state

  open( INFILE,"$inFile" ) or die "no in state file";
    while( my $line = <INFILE> )
    {
      chomp($line);
      my( $sname, $mail )  = split "STATE ", $line;

      # skip the total count
      $mail = substr($mail, index($mail," ")+1);

      $state{$mail} = $sname;

    }#stateMapping lines

  close(INFILE);

}#loadStateMapping

sub kmlHeader
{
  print qq{<?xml version="1.0" encoding="UTF-8"?>\n};
  print qq{<kml xmlns="http://earth.google.com/kml/2.2">\n};
  print qq{<Document>\n};
  print qq{  <name><![CDATA[Time Availability]]></name>\n};
  print qq{  <open>1</open>\n};

}#kmlHeader

For faster processing, the loadStateMapping subroutine reads the state assignment file once and builds a hash keyed on e-mail address, with the state name as the value. The main loop checks this hash and skips any address that has no assigned state. The kmlHeader subroutine prints the header markup for the KML document. Listing 11 shows the getTimes subroutine.


Listing 11. getTimes subroutine

sub getTimes
{
  # one-hour TimeSpan that starts at the hour passed in (e.g. "07")
  my $inHour  = $_[0];
  my $endHour = sprintf( "%02d", $inHour + 1 );

  print  qq{    <TimeSpan>\n};
  print  qq{        <begin>2008-07-01T$inHour:00Z</begin>\n};
  print  qq{        <end>2008-07-01T$endHour:00Z</end>\n};
  print  qq{    </TimeSpan>\n};

}#getTimes

The getTimes subroutine prints the TimeSpan markup: each placemark begins at the message hour and ends one hour later, on a fixed reference date of 2008-07-01. Listing 12 below shows the more complex getStyle subroutine.


Listing 12. getStyle subroutine

sub getStyle
{
  # find the appropriate style based on the input value
  my $inputVal = $_[0];
  my $decInc = $max / $interval;

  my $count = $decInc;
  my $styleCount = 0;

  # move through each interval, exit when the input value no longer fits
  while( $count <= $max )
  {
    if( $count > $inputVal ){ last }
    $styleCount++;
    $count += $decInc;

  }#While count less than max

  # default to the last style if interval is outside the boundary 
  if( $styleCount >= $interval ){ $styleCount-- }

  print qq{        <styleUrl>#style} . $styleCount . qq{</styleUrl>\n};

}#getStyle

Like the kmlStyles subroutine shown below, the getStyle subroutine first computes an increment equal to the specified maximum divided by the number of intervals. For example, a maximum of 100 and an interval count of 5 produce five "buckets" 20 units wide, and each input value falls into one of them. An input value of 40 maps to the middle fade style (style2), whereas a value of 80 or greater maps to the most opaque style (style4). Listing 13 shows the kmlStyles subroutine, which uses a similar calculation.
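The loop in getStyle can also be written in closed form. The sketch below, using the hypothetical name bucket_for, multiplies the count by interval/max, truncates, and clamps to the last style; for the worked example above it picks the same styles as the loop. Multiplying before dividing also avoids accumulating fractional steps the way the loop does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Closed-form sketch of getStyle's bucket choice (bucket_for is a hypothetical
# name): scale the count by interval/max, truncate, clamp to the last style.
sub bucket_for {
    my ( $value, $max, $interval ) = @_;
    my $bucket = int( $value * $interval / $max );
    return $bucket >= $interval ? $interval - 1 : $bucket;
}

# The worked example from the text: maximum 100, five intervals
for my $value ( 10, 40, 80, 120 ) {
    printf "%3d -> style%d\n", $value, bucket_for( $value, 100, 5 );
}
```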


Listing 13. kmlStyles

sub kmlStyles
{
  # one style per interval; alpha steps evenly from 255/interval up to 255
  for my $styleNum ( 0 .. $interval - 1 )
  {
    # %02X keeps two hex digits so the color stays in 8-digit aabbggrr form
    my $fade = sprintf("%02X", int( 255 * ($styleNum + 1) / $interval ) );

    print qq{  <Style id="style} . $styleNum . qq{">\n};
    print qq{    <IconStyle>\n};
    print qq{      <scale>0.4</scale>\n};
    print qq{      <Icon>\n};
    print qq{        <href>http://maps.google.com/mapfiles/kml/};
    print qq{shapes/star.png</href>\n};
    print qq{      </Icon>\n};
    print qq{    </IconStyle>\n};
    print qq{    <LabelStyle>\n};
    print qq{      <color>9900ffff</color>\n};
    print qq{      <scale>1</scale>\n};
    print qq{    </LabelStyle>\n};
    print qq{    <LineStyle>\n};
    print qq{      <color>99FFFF99</color>\n};
    print qq{      <width>2</width>\n};
    print qq{    </LineStyle>\n};
    print qq{    <PolyStyle>\n};
    print qq{      <color>} . $fade . qq{FF9933</color>\n};
    print qq{      <fill>1</fill>\n};
    print qq{      <outline>1</outline>\n};
    print qq{    </PolyStyle>\n};
    print qq{  </Style>\n};

  }#for each style

}#kmlStyles

Regardless of the maximum or number of intervals specified, the styles' alpha (fade) values must span the range 00 to FF. The sprintf %X conversion turns each decimal step into the hexadecimal alpha value that prefixes the PolyStyle color. Save the above code in a file called createKml.pl and read on for usage information.
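KML colors are written as eight hexadecimal digits in aabbggrr order, with alpha first, so each fade value must be exactly two digits. With 16 or more intervals the first step falls below 16, and a plain %X conversion yields a single digit. A minimal sketch using %02X and integer steps, with fade_values as a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: one alpha value per style, stepping evenly up to 255 (fade_values is
# a hypothetical name). %02X zero-pads, so a small step such as 12 becomes "0C"
# and the PolyStyle color keeps its eight-digit aabbggrr form.
sub fade_values {
    my ($interval) = @_;
    return map { sprintf( "%02X", int( 255 * $_ / $interval ) ) } 1 .. $interval;
}

print join( ' ', fade_values(5) ), "\n";    # prints "33 66 99 CC FF"
print join( ' ', map { $_ . 'FF9933' } fade_values(2) ), "\n";
```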





Usage

Selecting appropriate maximum and interval values depends largely on your specific data. Try a maximum value 20 to 40 percent lower than the maximum value recorded in top50emails. Choosing an interval count is a trade-off between presenting too much information with many intervals and showing too little change between intervals with few. Try a straightforward example with the command cat emailHours | perl createKml.pl stateMapping 20 2 > timeMap.kml.

After opening Google Earth and loading the timeMap.kml file, look for the "time slider" in the upper center of the screen. Drag the slider to move through the visualization in time, or press the play button to show an animation. Also try widening the slider's time window so that states remain visible across a longer range of hours.





Conclusion, further modifications

With the incredible Google Earth interface, and the custom code above for creating KML, you can build your own time-availability maps for a wide variety of applications. Consider extracting login and activity times from your instant messaging logs, phone records, or other sources to build additional and overlapping data sets. Expand the time windows, or focus more on a minute-to-minute spread of information through your company or customer set. Extract your Web-server visitor information and build a municipality-specific set of place marks and designators to explore where and when your Web-site visitors see your content.






Download

Sample code: os-google-earth-perl.timeAvailabilityMaps.zip (4KB, HTTP)


Resources

Learn
  • Google offers excellent KML documentation.

  • This article makes use of the us_states.kml example KML file from Brian Flood.

  • To listen to interesting interviews and discussions for software developers, check out developerWorks podcasts.

  • Stay current with developerWorks' Technical events and webcasts.

  • Check out upcoming conferences, trade shows, webcasts, and other Events around the world that are of interest to IBM open source developers.

  • Visit the developerWorks Open source zone for extensive how-to information, tools, and project updates to help you develop with open source technologies and use them with IBM's products.

  • Watch and learn about IBM and open source technologies and product functions with the no-cost developerWorks On demand demos.




About the author

Nathan Harrington

Nathan Harrington is a programmer at IBM currently working with Linux and resource-locating technologies.



