Level: Intermediate
Nathan Harrington, Programmer, IBM
06 Jan 2009
Social-networking data analysis can help you understand content, connections, and opportunities for your personal and business associations. This article presents tools and code to extract key components of your social network using the Twitter API to chart, geolocate, and visualize your social-networking data.
This article is a proof-of-concept that shows how to build applications to visualize your interconnections and influence. Graph common subject-matter keywords in your discussions and create geographical maps of your friends' locations. The code presented here relies on Perl, Graphviz, the Cooperative Association for Internet Data Analysis (CAIDA) plot-latlong, and the Google Chart API to create helpful visualizations to analyze your social networks.
Hardware and software requirements
Any PC manufactured after 2000 should provide plenty of horsepower for compiling and running the code here. As of this writing, CAIDA's plot-latlong tool requires a UNIX®-like operating system for geographical map creation. The other visualizations are made using curl and Graphviz, which are available for a wider variety of platforms.
You need Perl and the XML::Simple, Geo::Coder::Yahoo, and GD Perl modules, which process the social-networking data. A good image viewer, such as feh, is also recommended. To manipulate user images into a standard PNG format, the "convert" component of ImageMagick is required. See Resources for information on where to find these programs.
To install these applications on a Debian-based distribution of Linux®, such as Ubuntu, enter the following command in a terminal window: sudo apt-get install perl feh imagemagick curl graphviz. You need to download plot-latlong manually. After unpacking the plot-latlong archive, copy the .mapimages directory and the .mapinfo file to your ${HOME} directory.
Although this article demonstrates the code on Linux, the data gathering and processing code can be adapted easily to work on any platform that supports Perl, such as Microsoft® Windows®.
Extracting social-network data using the Twitter API
Twitter's RESTful interface and clear API documentation provide excellent methods for you to access social-networking attributes. See Resources for more information about the Twitter API. Listing 1 shows the initial buildViz.pl program setup.
Listing 1. buildViz.pl, Part 1
#!/usr/bin/perl -w
# buildViz.pl create social networking visualizations
use strict;
use XML::Simple;
die "specify searchUser, username, password, mode " unless @ARGV == 4;
my( $search, $user, $pass, $mode ) = @ARGV;
my $cmd = "mkdir xml/; mkdir img/";
system( $cmd ) unless( -d "xml" && -d "img" );
# get user's profile data
$cmd = qq{ curl -u $user:$pass "http://twitter.com/users/show/$user.xml" };
$cmd .= qq{ > xml/$user.xml };
system( $cmd ) unless( -e "xml/$user.xml" );
# get profile image
my $xmlImg = XMLin( "xml/$user.xml" );
my $imgUrl = $xmlImg->{profile_image_url};
$cmd = qq{ curl "$imgUrl" > img/$user.png ; };
$cmd .= qq{ convert -format png img/$user.png img/$user.png };
system( $cmd ) unless( -e "img/$user.png" );
# get users' friends (people that user is following)
$cmd = qq{ curl -u $user:$pass "http://twitter.com/statuses/friends.xml" };
$cmd .= qq{ > xml/$user.friends.xml };
system( $cmd ) unless( -e "xml/$user.friends.xml" );
After the required modules are loaded and the Twitter API credentials are read from the command line, the directories are created and the XML for the specified user is retrieved. Note that you can create visualizations for any Twitter user who does not protect their updates. Good form requires that the XML files be retrieved only once, so each XML file is retrieved only if it does not already exist on the local filesystem. You'll need to delete these files manually if the most recent data is required.
Next, the image for the specified user is downloaded, along with a list of that user's friends. In keeping with the Twitter API documentation, this article uses the terms "friends" and "people you are following" interchangeably. Listing 2 continues by retrieving the friends of each of the specified user's friends.
Listing 2. buildViz.pl, Part 2
my $xmlFriend = XMLin( "xml/$user.friends.xml" );
for my $name ( keys %{ $xmlFriend->{user} } )
{
  my $userFr = $xmlFriend->{user}->{$name}->{screen_name};
  # get friends' friends
  $cmd = qq{ curl -u $user:$pass "http://twitter.com/statuses/friends/};
  $cmd .= qq{$userFr.xml?page=1" > xml/$userFr.friends.xml};
  system( $cmd ) unless( -e "xml/$userFr.friends.xml" );
  # get friends most recent 200 tweets
  $cmd = qq{ curl -u $user:$pass "http://twitter.com/statuses/user_timeline/};
  $cmd .= qq{$userFr.xml?count=200" > xml/$userFr.user_timeline.xml};
  system( $cmd ) unless( -e "xml/$userFr.user_timeline.xml" );
  # get friends image (requires imagemagick convert)
  my $imgUrl = $xmlFriend->{user}->{$name}->{profile_image_url};
  $cmd = qq{ curl "$imgUrl" > img/$userFr.png ; };
  $cmd .= qq{ convert -format png img/$userFr.png img/$userFr.png };
  system( $cmd ) unless( -e "img/$userFr.png" );
}#for each friend
As you review your social-networking connections, you may find that your friends share many friends. The unless( -e ... ) checks help reduce the burden on Twitter's servers by retrieving each XML file only once.
In addition to the "friends of friends" list, each friend's timeline is retrieved, along with that friend's profile image. Save the contents of Listings 1 and 2 as the file buildViz.pl and type the command perl buildViz.pl searchUser yourUserName yourPassword retrieve. In this case, searchUser is the username of the Twitter user whose social-networking data you want to retrieve, yourUserName and yourPassword are your authentication credentials, and retrieve is a placeholder mode that specifies XML downloads only. Note that the listings as printed read searchUser into $search but build every URL and file name from $user, so they retrieve data for the authenticating user.
The buildViz.pl program will create the img and xml subdirectories, and fill them with files like those shown below.
Listing 3. Example img/ xml/ directories
87953 2008-11-26 08:21 xml/agberg.friends.xml
187263 2008-11-26 08:21 xml/agberg.user_timeline.xml
85451 2008-11-26 08:23 xml/alphaworks.friends.xml
50967 2008-11-26 08:23 xml/alphaworks.user_timeline.xml
85854 2008-11-26 08:21 xml/andysc.friends.xml
163570 2008-11-26 08:21 xml/andysc.user_timeline.xml
83236 2008-11-26 08:23 xml/BillHiggins.friends.xml
177740 2008-11-26 08:23 xml/BillHiggins.user_timeline.xml
...
5626 2008-11-26 08:21 img/agberg.png
5753 2008-11-26 08:23 img/alphaworks.png
2080 2008-11-26 08:21 img/andysc.png
4527 2008-11-26 08:23 img/BillHiggins.png
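The retrieve-once check that Listings 1 and 2 repeat before every download could be factored into a small helper. The following is a minimal sketch under stated assumptions, not code from the article's download: fetchOnce, its $maxAge parameter, and the $fakeFetch callback are hypothetical names introduced here, and the callback only writes a placeholder file where the listings would run curl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of buildViz.pl): run the fetch callback only
# when the cached file is missing or older than $maxAge seconds.
# Passing undef for $maxAge means "never refresh an existing file".
sub fetchOnce
{
  my( $file, $maxAge, $fetch ) = @_;
  if( -e $file )
  {
    my $age = time() - ( stat($file) )[9];   # seconds since last modification
    return 0 if( !defined( $maxAge ) || $age <= $maxAge );
  }
  $fetch->( $file );
  return 1;
}#fetchOnce

# Stand-in for the curl call in the listings: writes a placeholder document.
my $fakeFetch = sub
{
  my( $file ) = @_;
  open( my $out, ">", $file ) or die "cannot write $file: $!\n";
  print $out "<users/>\n";
  close( $out );
};

my $cacheFile = "fetchOnce_demo.xml";
unlink( $cacheFile );
my $first  = fetchOnce( $cacheFile, undef, $fakeFetch );  # file missing: fetch
my $second = fetchOnce( $cacheFile, undef, $fakeFetch );  # file cached: skip
print "first=$first second=$second\n";
unlink( $cacheFile );
```

Passing a number of seconds as $maxAge would let stale files be refreshed automatically instead of being deleted by hand.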
Developing interconnections data and visualization using Graphviz
One method to gauge how much influence a user can have on a particular friend is to count the number of friends that friend has. In theory, users with fewer friends have more time to follow social-networking updates and respond to questions. Add the contents of Listing 4 at line 53 in buildViz.pl.
Listing 4. visualizeInfluence subroutine
visualizeInfluence() if( $mode eq "influence" );
### begin subroutines
sub visualizeInfluence
{
  my %frHash = ();
  my $xmlFriend = XMLin( "xml/$user.friends.xml" );
  for my $name ( keys %{ $xmlFriend->{user} } )
  {
    my $userFr = $xmlFriend->{user}->{$name}->{screen_name};
    my $xmlSec = XMLin( "xml/$userFr.friends.xml" );
    $frHash{ $userFr } = 0;
    for my $linkUser( keys %{ $xmlSec->{user} } ){ $frHash{$userFr}++ }
  }#for each friend
  my $infList = "1 $user\n";
  for my $name ( sort {$frHash{$a} <=> $frHash{$b}} keys %frHash )
  {
    $infList .= "$frHash{$name} $name\n";
    last if( ($infList =~ tr/\n//) == 15 ); # exit after fifteen lines
  }# for each key sorted
  chomp($infList); # remove last newline
  $cmd = qq{ echo "$infList" | perl twitdot.pl $user img > influence.fdp ; };
  $cmd .= qq{ fdp influence.fdp -Tpng -o graphviz_influence.png };
  system($cmd);
}#visualizeInfluence
Each friend's list of friends is counted, and the most "influence-able" friends (those with the fewest friends) are added to the $infList variable until it holds 15 lines, including the line for the user. These count and friend-name combinations are passed as input to the twitdot.pl program. Based on code from the "Explore relationships among Web pages visually" article, the twitdot.pl program generates fdp graph-generation syntax for Graphviz. Consult the article and the code Download section for more information about the modifications necessary for this particular visualization.
Next, fdp is called with the fdp graph syntax file to generate the visualization. Run the program with the command perl buildViz.pl searchUser yourUserName yourPassword influence and view the output file (graphviz_influence.png) in your favorite image viewer. Figure 1 shows an example of what this can look like.
Figure 1. Example graphviz_influence.png
The width and color of the arrows indicate the "influence-ability" of each of the friends, based on the number of friends they have.
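The ranking step in Listing 4 can also be written without counting newlines inside the loop. The sketch below is one possible restatement, assuming a made-up %frHash and a $limit of 3 to keep the output short; the friend names and counts are illustrative data, not results from Twitter.

```perl
use strict;
use warnings;

# Illustrative data only: friend name => number of people that friend follows.
my %frHash = ( alice => 120, bob => 15, carol => 48, dave => 7 );
my $limit  = 3;   # Listing 4 stops after 15 lines; 3 keeps this demo short

# Sort ascending by count (ties broken by name), then keep the first $limit
my @sorted = sort { $frHash{$a} <=> $frHash{$b} or $a cmp $b } keys %frHash;
my @top    = @sorted > $limit ? @sorted[ 0 .. $limit - 1 ] : @sorted;

# Build the same "count name" lines that Listing 4 pipes to twitdot.pl
my $infList = join( "\n", map { "$frHash{$_} $_" } @top );
print "$infList\n";
```

Sorting first and slicing afterward keeps the limit in one place, which makes it easier to change how many friends appear in the graph.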
Developing keyword data and visualization using the Google chart API
Influence has been measured, but what about content? Add the code shown in Listing 5 at line 87 in buildViz.pl to create a chart showing the most commonly used words in your friends' message histories.
Listing 5. visualizeKeywords subroutine
sub visualizeKeywords
{
  my %wordHash = ();
  my $xmlFriend = XMLin( "xml/$user.friends.xml" );
  for my $name ( keys %{ $xmlFriend->{user} } )
  {
    my $userFr = $xmlFriend->{user}->{$name}->{screen_name};
    my $xmlSec = XMLin( "xml/$userFr.user_timeline.xml" );
    for my $linkUser( keys %{ $xmlSec->{status} } )
    {
      my $msgText = $xmlSec->{status}->{$linkUser}->{text};
      for my $key( split " ", lc($msgText) ){ $wordHash{$key}++ }
    }#for each text update
  }#for each friend
  my $tStr = "";
  my $chlStr = "";
  for my $word ( sort {$wordHash{$b} <=> $wordHash{$a}} keys %wordHash )
  {
    next unless( length($word) > 10 ); # only print 'long' entries
    $tStr .= "$wordHash{$word},"; # append url data
    $chlStr .= "$word|"; # append url labels
    last if( ($tStr =~ tr/,//) == 10 ); # exit loop after first ten words
  }#for the top words
  chop($tStr); chop($chlStr); # remove trailing delimiters
  $cmd = qq{ curl "http://chart.apis.google.com/chart?cht=p&chd=t:$tStr};
  $cmd .= qq{&chs=1000x300&chl=$chlStr" > chart_keywords.png };
  system($cmd);
}#visualizeKeywords
Each word from each of your friends' timelines is recorded in the %wordHash variable. To measure some of the more significant verbiage, only words longer than 10 characters are graphed. The top 10 words meeting these requirements and their frequency counts are then packed into a URL for generation using the Google Chart API. Check the Resources section for more information about the URL formats and the options available with Google Charts.
Add the subroutine call shown below to buildViz.pl at line 54.
Listing 6. visualizeKeywords logic call
visualizeKeywords() if( $mode eq "keywords" );
Run the keyword visualization with the command perl buildViz.pl searchUser yourUserName yourPassword keywords. View the output chart_keywords.png file with your image viewer. Figure 2 demonstrates what this can look like.
Figure 2. Example chart_keywords.png
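Words taken from tweets can contain characters such as & that have special meaning inside a URL, so it may help to escape the labels before building the chart request. The following sketch counts long words and assembles the same style of chart URL as Listing 5. It rests on assumptions that are not in the original code: urlEscape is a hypothetical helper, the tweets are invented sample text, and trimming surrounding punctuation from each word is an extra step that Listing 5 does not perform.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except the RFC 3986
# unreserved characters. Assumes plain ASCII input.
sub urlEscape
{
  my( $str ) = @_;
  $str =~ s/([^A-Za-z0-9\-_.~])/sprintf( "%%%02X", ord($1) )/ge;
  return $str;
}#urlEscape

# Invented sample text standing in for the friends' timelines
my @tweets = (
  "Social-networking visualization with Graphviz",
  "New visualization: social-networking charts & maps",
  "Reading R&D-whitepapers on visualization",
);

my %wordHash;
for my $text ( @tweets )
{
  for my $word ( split " ", lc($text) )
  {
    $word =~ s/^\W+|\W+$//g;                       # trim surrounding punctuation
    $wordHash{$word}++ if( length($word) > 10 );   # same length rule as Listing 5
  }
}

# Most frequent first (ties broken alphabetically), at most ten entries
my @top = sort { $wordHash{$b} <=> $wordHash{$a} or $a cmp $b } keys %wordHash;
splice( @top, 10 ) if( @top > 10 );

my $chd = join( ",", map { $wordHash{$_} } @top );   # chart data values
my $chl = join( "|", map { urlEscape($_) } @top );   # escaped chart labels
my $url = "http://chart.apis.google.com/chart?cht=p&chs=1000x300&chd=t:$chd&chl=$chl";
print "$url\n";
```

Building the lists with join also removes the need to chop trailing delimiters afterward.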
Developing geolocated data and visualization using plot-latlong
After charting who can be influenced and what is being said, we can move on to visualizing where in the world these people are. Add the code shown in Listing 7 at line 125 in buildViz.pl.
Listing 7. visualizeLocations subroutine
sub visualizeLocations
{
  use Geo::Coder::Yahoo;
  my $geocoder = Geo::Coder::Yahoo->new(appid => 'my_app' );
  open( LOCOUT, ">locationNames" ) or die "no locationNames out\n";
  open( COORDS, ">cityCoords" ) or die "no cityCoords out \n";
  # record all friends geographical locations
  my $xmlFriend = XMLin( "xml/$user.friends.xml" );
  for my $name ( keys %{ $xmlFriend->{user} } )
  {
    my $userLoc = $xmlFriend->{user}->{$name}->{location};
    my $imgName = $xmlFriend->{user}->{$name}->{screen_name};
    my $location = $geocoder->geocode( location => "$userLoc" );
    for my $coords( @{$location} )
    {
      my %hashRef = %{ $coords };
      print "$hashRef{latitude} $hashRef{longitude} # $userLoc\n";
      print COORDS "$hashRef{latitude} $hashRef{longitude} # $userLoc\n";
      print LOCOUT "$userLoc ##$imgName.png\n";
    }#for coordinates returned
  }#for each friend
  close( COORDS ); close( LOCOUT );
  # draw the map
  $cmd = qq{ cat cityCoords | perl plot-latlong -s 5 -c };
  $cmd .= qq{ > cityMap.png 2>cityPixels };
  system( $cmd );
  # Annotate the map with the first 7 friends information
  $cmd = qq{ head -n7 locationNames > 7.locationNames ; };
  $cmd .= qq{ head -n7 cityPixels > 7.cityPixels ; };
  $cmd .= qq{ perl worldCompositeMap.pl 7.cityPixels 7.locationNames };
  $cmd .= qq{ cityMap.png worldCityMap_annotated.png };
  system($cmd);
}#visualizeLocations
This section again reuses code previously published on developerWorks: the worldCompositeMap.pl program is detailed in "Create geographical plots of your data using Perl, GD, and plot-latlong." Using the excellent Geo::Coder::Yahoo module, it's relatively easy to record the city coordinates for your friends' locations in the cityCoords file, and the associated name and image data in the locationNames file.
The first seven friends' locations and identifiers are then passed to worldCompositeMap.pl for rendering. Consult the article named above or the Download section for more information about the worldCompositeMap.pl program.
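To make the two intermediate file formats concrete, the sketch below builds the cityCoords and locationNames lines in memory from hand-typed records instead of calling a geocoder. Everything in @friends is an assumption for illustration: the screen names are placeholders and the coordinates are approximate values entered by hand, not output from Geo::Coder::Yahoo.

```perl
use strict;
use warnings;

# Illustrative records only: approximate, hand-typed coordinates standing in
# for what a geocoder would return. The third friend has no location.
my @friends = (
  { screen_name => "exampleUserA", location => "Raleigh, NC", lat => 35.78, lon => -78.64 },
  { screen_name => "exampleUserB", location => "London, UK",  lat => 51.51, lon => -0.13 },
  { screen_name => "exampleUserC", location => "",            lat => undef, lon => undef },
);

my( $coords, $names ) = ( "", "" );
for my $fr ( @friends )
{
  # Skip friends without coordinates so both outputs gain lines together
  next unless( defined $fr->{lat} && defined $fr->{lon} );
  $coords .= "$fr->{lat} $fr->{lon} # $fr->{location}\n";       # cityCoords line
  $names  .= "$fr->{location} ##$fr->{screen_name}.png\n";      # locationNames line
}
print "cityCoords:\n$coords", "locationNames:\n$names";
```

Writing one line to each output per geocoded friend keeps line N of one file describing the same friend as line N of the other, which matters because Listing 7 later takes the first seven lines of each.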
Add the subroutine call shown in Listing 8 at line 55 in buildViz.pl.
Listing 8. visualizeLocations logic call
visualizeLocations() if( $mode eq "locations" );
Run the command perl buildViz.pl searchUser yourUserName yourPassword locations to build the worldCityMap_annotated.png file, and open that file in your image viewer. Figure 3 is an example of what this can look like.
Figure 3. Example worldCityMap_annotated.png
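With three visualization modes plus retrieve, the separate if( $mode eq ... ) calls from Listings 4, 6, and 8 could be gathered into a dispatch table. This is a minimal sketch of that alternative, not the article's approach: %modes and runMode are hypothetical names, and the three subroutines here are stubs that return their own names rather than drawing anything.

```perl
use strict;
use warnings;

# Stubs standing in for the subroutines from Listings 4, 5, and 7
sub visualizeInfluence { return "influence" }
sub visualizeKeywords  { return "keywords" }
sub visualizeLocations { return "locations" }

# Hypothetical dispatch table: mode name => code reference
my %modes = (
  influence => \&visualizeInfluence,
  keywords  => \&visualizeKeywords,
  locations => \&visualizeLocations,
  retrieve  => sub { return "retrieve" },   # downloads only, nothing to draw
);

sub runMode
{
  my( $mode ) = @_;
  my $handler = $modes{$mode}
    or die "unknown mode '$mode', expected one of: "
         . join( ", ", sort keys %modes ) . "\n";
  return $handler->();
}#runMode

print runMode( "keywords" ), "\n";
```

A table like this reports a mistyped mode immediately, whereas the chained if statements would exit without producing any output.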
Conclusion, further examples
With the code and tools presented here, you can create a variety of visualizations to help analyze attributes of your social network. Use these tools to track keywords as they spread through your network of friends. Visualize the paths of particular links as they travel to different areas of activity around the world. Help create charts and analysis for your employers to help them see the deep value of social networking.
Download
Description | Name | Size | Download method
---|---|---|---
Sample code | os-socialtoolstwitterVisualizations.0.1.zip | | HTTP
Resources
Get products and technologies
- UNIX and Linux users: If you're new to installing Perl modules, Andreas J. Konig's CPAN module automates the installation of other modules.
- You need to add these Perl modules, and any dependencies, from CPAN: XML::Simple, Geo::Coder::Yahoo, and GD.
- AT&T Research created the Graphviz graph visualization software.
- Download Tom Gilbert's image viewer, feh.
- If you need the best in image manipulation software, get The Gimp.
- Cooperative Association for Internet Data Analysis (CAIDA) built and hosts the plot-latlong program among other great tools.
- ImageMagick is a software suite to create, edit, and compose bitmap images.
- curl is a command-line tool for transferring files using a URL syntax.
- Innovate your next open source development project with IBM trial software, available for download or on DVD.
- Download IBM product evaluation versions, and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Download Perl, or read more about Perl at Perl.org.
About the author
Nathan Harrington is a programmer working with Linux at IBM. You can find more information about him at nathanharrington.info.