Scripts-R-Us

Repo Speed Test

by Don Crissti

A few months ago, one of our forum members asked for a basic script that would test the download speed of various PCLinuxOS repository mirrors. This article is based on the script posted on the forum, and assumes you have very basic knowledge of programming concepts like variables, loops, and conditional blocks. These concepts are mostly the same across all platforms and programming/scripting languages, so reading any tutorial for beginners should get you started.

So, how could we test the mirrors' speed? One way to do it would be to download the same file from each mirror and then compare the results (since the file is the same everywhere, the time needed to complete the download tells us how fast each mirror is at that particular time of day).

Our test file should be small, but not so small that the results become meaningless. Also, it has to be in the repos regardless of the software that comes and goes. One of the best candidates is srclist.main.bz2, located in the base section of the repo, as this file has a convenient size (around 400K) and is always there. Since some mirrors might be offline for maintenance or really slow to respond, we will set a timeout so that an unresponsive mirror is skipped instead of holding up the whole test. After running the speed test we will process the results and output a sorted list (starting with the fastest mirror). We should also inform the user if any of those mirrors timed out, or if the URL could not be found.

Any scripting language would be OK for this task, though Perl was my language of choice here. With Perl, there is a thing called CPAN, and it has literally thousands of modules of which many will make your life so much easier.

Let's move on...

All Perl scripts start with the famous she-bang line:

#!/usr/bin/perl

Linux takes the presence of the first two characters as an indication that the file is a script, and tries to execute that script using the interpreter specified by the rest of the line. Note that in Perl, every simple statement must end in a semicolon, unless it is the final statement in a block (put the semicolon in anyway, because you might eventually add another line).

The next two lines will force you to write better code. You should use them every time you write a Perl script as they will help you a lot:

use strict;
use warnings;

The 'use strict' line turns on the 'strict' pragma which forces you to declare your variables with the my keyword and also spits out errors if the code contains any barewords that can't be interpreted in their current context. The 'use warnings' line means all possible warnings are enabled and it's the cheapest way to find bugs.
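To see what strict actually buys you, here is a small sketch that compiles a snippet containing a misspelled variable name. The snippet is wrapped in a string eval only so that the surrounding program keeps running and can show the error; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Compile and run a tiny snippet that contains a typo in a variable name.
my $ok = eval q{
    my $mirror = "http://mirror.example/apt/";
    $miror = "typo";   # misspelled, and never declared with my
    1;
};
my $error = $@;   # compile error message, if any

print "strict refused to compile the snippet:\n$error" unless $ok;
```

Without strict, Perl would quietly create a new global variable for the misspelled name and the bug would go unnoticed.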

Also, we're going to use two Perl modules in our script: Time::HiRes and LWP::UserAgent. The first one helps us calculate the download time and the second one is our web user agent:

use Time::HiRes qw(gettimeofday);
use LWP::UserAgent;

Now, what is that qw thing? The 'quote word' operator qw() builds a list of words without the need for quotes and commas. In the above case, it passes the one-element list ('gettimeofday') to use, which has the effect of importing the gettimeofday function from the Time::HiRes module.
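A quick sketch of what qw() expands to may help. The two lists below are built in the two equivalent ways; tv_interval is another function that Time::HiRes offers, used here only as a second word for the list.

```perl
use strict;
use warnings;

# The same two-element list, written with and without qw().
my @quoted = ('gettimeofday', 'tv_interval');
my @words  = qw(gettimeofday tv_interval);

print "the two lists are identical\n" if "@quoted" eq "@words";
```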

Next, we're going to define some of our variables. Perl has three types of variables and uses different symbols as prefixes for each type of variable:

  • a scalar stores a single value and it is prefixed with a dollar sign ($)
  • an array stores a list of values and it is prefixed with an at-sign (@)
  • a hash is an associative array (a paired group of elements) and it is prefixed with a percent sign (%)
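As a minimal illustration of the three types, assuming made-up mirror names and timings (none of these values come from the real script):

```perl
use strict;
use warnings;

my $testfile = "srclist.main.bz2";                 # scalar: one value
my @mirrors  = ("http://mirror-a.example/apt/",    # array: an ordered list
                "http://mirror-b.example/apt/");
my %seconds  = ("http://mirror-a.example" => 1.25, # hash: key => value pairs
                "http://mirror-b.example" => 0.80);

print "file: $testfile\n";
print "first mirror: $mirrors[0]\n";               # one element is a scalar, hence $
print "mirror count: ", scalar(@mirrors), "\n";
print "time for mirror-a: $seconds{'http://mirror-a.example'}\n";
```

Note that a single element of an array or a hash is itself a scalar, which is why it is accessed with a dollar sign.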

We will only use scalars and arrays in our script. If you study the URLs for our test file you can see they all have a similar format:

*tp://***/...../pclinuxos/2007/base/srclist.main.bz2

We're going to split the links right before that common part and recreate them later when needed. There are mainly two reasons for that:

  • we want our code lines length under 80 chars (it's not a must though... but it's nicer)
  • second one might sound weird but believe me it's true: code guys are lazy... :-)

Also, we need to define three arrays here: the one that will hold the test results, the one that will hold the list of mirrors that timed out or are not valid (if any), and the one that holds the list of our mirrors. The first two arrays are empty at this stage of the script.

my $testfile = "pclinuxos/2007/base/srclist.main.bz2"; # a scalar
my @timedList = (); # an empty array
my @badList = (); # another empty array
my @reposList = (
  "http://ftp.riken.go.jp/pub/Linux/pclinuxos/apt/",
  "http://ftp.kddlabs.co.jp/Linux/packages/pclinuxos/apt/",
  "http://ftp.jaist.ac.jp/pub/Linux/PCLinuxOS/apt/",
  "http://mirror.pclinuxclub.com/pclinuxos/apt/",
  "http://www2.mirror.in.th./osarchive/pclinuxos/pclinuxos/apt/",
  "http://ftp.twaren.net/Linux/PCLinuxOS/apt/",
  "http://gnupg.cdpa.nsysu.edu.tw/Linux/PCLinuxOS/apt/",
  "http://mirror.internode.on.net/pub/pclinuxos/apt/",
  "ftp://mirror.aarnet.edu.au/pub/pclinuxos/apt/",
  "http://na.mirror.garr.it/mirrors/pclinuxos/apt/",
  "http://ftp.ch.debian.org/mirror/pclinuxos/apt/",
  "http://gnustep.ethz.ch/mirror/pclinuxos/apt/",
  "http://debian.ethz.ch/mirror/pclinuxos/apt/",
  "ftp://ftp.pbone.net/pub/pclinuxos/apt/",
  "http://ftp.klid.dk/ftp/pclinuxos/apt/",
  "http://mirrors.lastdot.org:1280/pclos/apt/",
  "http://mirrors.xservers.ro/pclinuxos/apt/",
  "http://ftp.heanet.ie/pub/pclinuxos/apt/",
  "http://ftp.belnet.be/mirror/pclinuxonline.com/apt/",
  "http://ftp.nl.freebsd.org/os/Linux/distr/texstar/pclinuxos/apt/",
  "http://ftp.sh.cvut.cz/MIRRORS/pclinuxos/apt/",
  "ftp://cesium.di.uminho.pt/pub/pclinuxos/apt/",
  "http://distrib-coffee.ipsl.jussieu.fr/pub/linux/pclinuxos/apt/",
  "http://ftp.cc.uoc.gr/mirrors/linux/pclinuxos/apt/",
  "http://ftp.leg.uct.ac.za/pub/linux/pclinuxos/apt/",
  "http://spout.ussg.indiana.edu/linux/pclinuxos/pclinuxos/apt/",
  "http://ftp.uwsg.indiana.edu/linux/pclinuxos/pclinuxos/apt/",
  "http://ftp.ussg.iu.edu/linux/pclinuxos/pclinuxos/apt/",
  "http://pclosusers.com/pclosfiles/",
  "http://distro.ibiblio.org/pub/linux/distributions/texstar/pclinuxos/apt/"
  ); # array containing our mirrors URLs

Now, we print to the terminal that we are downloading our file from each mirror and end the message with a newline, so that whatever is printed next starts on a fresh line:

print "Downloading srclist.main.bz2 from each mirror... \n";

The backslash "escapes" the n, giving it a special meaning to Perl inside a double-quoted string, i.e. start a new line.
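The escape only works inside double quotes. A short sketch, using a made-up string, shows the difference by comparing lengths:

```perl
use strict;
use warnings;

my $double = "mirror\n";   # backslash-n becomes a single newline character
my $single = 'mirror\n';   # backslash and n stay as two literal characters

print length($double), "\n";   # 7
print length($single), "\n";   # 8
```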

Let's run our test for each and every mirror in our @reposList. For that we use a foreach loop — the name says it all, it runs the same code block for each element in our array:

foreach my $url (@reposList) {
  # foreach loop: for each element execute code block

In our code block, we will recreate the complete URL for our test file (by concatenating each element of @reposList and $testfile) and fire up a web user agent:

my $link = "$url$testfile";
   # concatenate: recreate original link
my $ua = LWP::UserAgent->new();
   # fire up new web user agent

and we instruct it to give up on a mirror that stays silent for more than 3 seconds, show progress during the download, and time the download of the test file:

$ua->timeout(3);
   # set timeout interval
$ua->show_progress('TRUE');
   # turn on progress indicator
my $t0 = gettimeofday();
   # get time right before download
my $response = $ua->get($link);
   # download test file
my $t1 = gettimeofday();
   # get time right after download
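The timing technique can be tried on its own, without any network access. In this sketch a quarter-second sleep stands in for the download; that stand-in is an assumption made purely for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday sleep);

my $t0 = gettimeofday();   # scalar context: seconds since the epoch, as a float
sleep(0.25);               # stand-in for the download (fractional sleep from Time::HiRes)
my $t1 = gettimeofday();

my $delta = $t1 - $t0;     # elapsed wall-clock time in seconds
printf "elapsed: %.3f s\n", $delta;
```

Called in scalar context, as above, gettimeofday returns a single floating-point number, so a plain subtraction gives the elapsed time.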

Now that we have our result, we're going to use some programming magic called "regex":

$url =~ s|(\w+://.*?)(:\d+)?/(.*)|$1|;
   # use regex to format URL

We used a regular expression to alter the URL string. We want to show the user something readable and easy to figure out; therefore we keep only the protocol and the server name, and discard the port number (if any) and the rest of the path. I won't get into details as explaining regexes is beyond the scope of this article.
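To see the effect of that substitution, here is a small sketch that applies it to two of the mirrors from the list. Unlike the script, it works on a copy (the hypothetical variable $name) so that the original URL stays intact; in the script the substitution is applied to $url itself, and since a foreach loop variable is an alias of the array element, the entry in @reposList is shortened as well.

```perl
use strict;
use warnings;

my @urls = (
    "http://mirrors.lastdot.org:1280/pclos/apt/",
    "ftp://ftp.pbone.net/pub/pclinuxos/apt/",
);

my @short;
foreach my $url (@urls) {
    # Copy $url into $name, then run the substitution on the copy only.
    (my $name = $url) =~ s|(\w+://.*?)(:\d+)?/(.*)|$1|;
    push @short, $name;
    print "$url -> $name\n";
}
# prints:
# http://mirrors.lastdot.org:1280/pclos/apt/ -> http://mirrors.lastdot.org
# ftp://ftp.pbone.net/pub/pclinuxos/apt/ -> ftp://ftp.pbone.net
```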

If the response is successful, we'll add the mirror and the time result to our @timedList. If not, we'll add it to our @badList. For that we use a conditional block and Perl's push() function — as the name says, it pushes a value onto the end of an array.

if ($response->is_success) {
       # conditional block: if condition...
    my $delta = ($t1 - $t0);
       # calculate download time
    my $line = $url.' '.$delta;
       # line up URL and corresponding time
    push(@timedList, $line);
       # add line to the timed mirrors list
} else {
       # end of if block, start of else block: if not condition...
    push(@badList, $url);
       # add URL to the bad list
}      # end of else block
}      # end of foreach loop

As you can see, we declared some of our variables inside the foreach loop — the reason is that they are only used inside that loop.
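The script reports elapsed time only. If you would rather see an actual speed figure, the conversion is a simple division. The sketch below uses a hypothetical helper, kb_per_second, and made-up numbers; in the script itself the byte count could be taken from length($response->content), which is an assumption about how you might wire it in, not part of the original.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a byte count and an elapsed time into KB/s.
sub kb_per_second {
    my ($bytes, $seconds) = @_;
    return 0 if $seconds <= 0;            # guard against division by zero
    return ($bytes / 1024) / $seconds;
}

my $bytes = 409_600;   # made-up size: exactly 400 KB
my $delta = 2;         # made-up download time, in seconds

printf "%.1f KB/s\n", kb_per_second($bytes, $delta);   # prints "200.0 KB/s"
```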

After the foreach block executes for all mirrors, it's time to sort the results and present them to the user. We'll use a special "technique" to sort the results — the Schwartzian Transform (ST) — a method for sorting a data structure on an arbitrary key efficiently and with a minimum amount of Perl code. We map the original list into a list of references to lists containing the original and transformed values. The list of references will be sorted and then mapped back into a plain list containing the original values. Clear as mud?

my @transformList = map { [ $_, (split)[1] ] } @timedList;
   # transform: value, sortkey
my @tempSort = sort { $a->[1] <=> $b->[1] } @transformList;
   # sort
my @sortedList = map { $_->[0] } @tempSort;
   # restore original values

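Because map and sort each take a list as their last argument and return a list, the three steps can be chained into a single statement. The chain is read right-to-left (here, bottom-to-top), so we can make that even shorter: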

my @sortedList =
   # remember, we read this right-to-left...
map { $_->[0] }
   # restore original values
sort { $a->[1] <=> $b->[1] }
   # sort
map { [$_, (split)[1] ] } @timedList;
   # transform: value, sortkey
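If the transform still looks like mud, it may help to run it on a tiny list. The three entries below are made up, but they have the same "URL time" shape the script builds. They also show why the numeric comparison <=> matters: a plain string sort would put 10.1 before 2.5.

```perl
use strict;
use warnings;

# Made-up sample data in the script's "URL seconds" format.
my @timedList = (
    "http://a.example 2.5",
    "http://b.example 0.75",
    "http://c.example 10.1",
);

my @sortedList =
    map  { $_->[0] }                  # 3. keep only the original string
    sort { $a->[1] <=> $b->[1] }      # 2. sort numerically on the extracted time
    map  { [ $_, (split)[1] ] }       # 1. pair each string with its time
    @timedList;

print "$_\n" for @sortedList;
# prints:
# http://b.example 0.75
# http://a.example 2.5
# http://c.example 10.1
```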

OK, we're almost there. We print out another message:

print "\nMirrors speed (time to get 400 KB):\n";

then we print the list with our results:

foreach (@sortedList) {
   # loop: for each element...
   print "$_", "s\n";
}  # end of foreach loop

Wait a second ... what is that $_? That is the Default Input and Pattern Searching Variable. If you don't specify a variable to put each element into, $_ is used instead. We used the above instead of something like:

foreach my $item (@sortedList) {
   # loop: for each element...
   print "$item", "s\n";
}  # end of foreach loop

which would have yielded the same result.
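Since the times are raw floating-point differences, the printed list can look a bit noisy. One possible refinement, sketched here with a hypothetical helper called format_result and made-up data, is to split each line again and let sprintf align and round the values:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "URL seconds" into an aligned, rounded line.
sub format_result {
    my ($line) = @_;
    my ($host, $seconds) = split ' ', $line;
    return sprintf("%-30s %6.2f s", $host, $seconds);
}

# Made-up sample results, already sorted.
my @sortedList = ("http://b.example 0.753194", "http://a.example 2.501877");

print format_result($_), "\n" for @sortedList;
```

The %-30s pads the URL to 30 characters, left-aligned, and %6.2f rounds the time to two decimal places.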

We also want to tell the user whether there were any mirrors that, for some reason, didn't respond:

if (@badList) {
   # conditional block
   print "\nThe following mirrors timed out or are not valid :\n";
   foreach (@badList) {
   # loop: for each element...
      print "$_", "\n";
   }  # end of foreach loop
}     # end of conditional block

That's it, folks! You can download the whole script here: http://linuxgator.org/forums/viewtopic.php?f=15&t=1719. Or, you can enter the entire script (below) and save it to your computer. To do that, create a new empty file and name it as you wish, let's say repotest.pl. The extension is not mandatory, but it's a good habit to use one, as it gives you an idea of what the file deals with. You can copy/paste my code inside, no problem, as Perl doesn't care much about indentation. It should be pretty straightforward (you can always consult the one on the forum if anything goes wrong). When you're done, save the file and make it executable. You can then open a terminal in the containing directory and run:

./repotest.pl

to see what it does. If you're planning on using it often, consider removing the extension and placing the script in your ~/bin (that is, the folder bin inside your home), assuming that folder is in your PATH. Then you can use it by simply opening a terminal (from any location) and running

repotest

I hope this article whets your appetite for learning Perl... you will find out that it truly is "the Swiss Army chainsaw of scripting languages."

Yours,

Don

The entire script:

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);
use LWP::UserAgent;
my $testfile = "pclinuxos/2007/base/srclist.main.bz2";
my @timedList = ();
my @badList = ();
my @reposList = (
   "http://ftp.riken.go.jp/pub/Linux/pclinuxos/apt/",
   "http://ftp.kddlabs.co.jp/Linux/packages/pclinuxos/apt/",
   "http://ftp.jaist.ac.jp/pub/Linux/PCLinuxOS/apt/",
   "http://mirror.pclinuxclub.com/pclinuxos/apt/",
   "http://www2.mirror.in.th./osarchive/pclinuxos/pclinuxos/apt/",
   "http://ftp.twaren.net/Linux/PCLinuxOS/apt/",
   "http://gnupg.cdpa.nsysu.edu.tw/Linux/PCLinuxOS/apt/",
   "http://mirror.internode.on.net/pub/pclinuxos/apt/",
   "ftp://mirror.aarnet.edu.au/pub/pclinuxos/apt/",
   "http://na.mirror.garr.it/mirrors/pclinuxos/apt/",
   "http://ftp.ch.debian.org/mirror/pclinuxos/apt/",
   "http://gnustep.ethz.ch/mirror/pclinuxos/apt/",
   "http://debian.ethz.ch/mirror/pclinuxos/apt/",
   "ftp://ftp.pbone.net/pub/pclinuxos/apt/",
   "http://ftp.klid.dk/ftp/pclinuxos/apt/",
   "http://mirrors.lastdot.org:1280/pclos/apt/",
   "http://mirrors.xservers.ro/pclinuxos/apt/",
   "http://ftp.heanet.ie/pub/pclinuxos/apt/",
   "http://ftp.belnet.be/mirror/pclinuxonline.com/apt/",
   "http://ftp.nl.freebsd.org/os/Linux/distr/texstar/pclinuxos/apt/",
   "http://ftp.sh.cvut.cz/MIRRORS/pclinuxos/apt/",
   "ftp://cesium.di.uminho.pt/pub/pclinuxos/apt/",
   "http://distrib-coffee.ipsl.jussieu.fr/pub/linux/pclinuxos/apt/",
   "http://ftp.cc.uoc.gr/mirrors/linux/pclinuxos/apt/",
   "http://ftp.leg.uct.ac.za/pub/linux/pclinuxos/apt/",
   "http://spout.ussg.indiana.edu/linux/pclinuxos/pclinuxos/apt/",
   "http://ftp.uwsg.indiana.edu/linux/pclinuxos/pclinuxos/apt/",
   "http://ftp.ussg.iu.edu/linux/pclinuxos/pclinuxos/apt/",
   "http://pclosusers.com/pclosfiles/",
   "http://distro.ibiblio.org/pub/linux/distributions/texstar/pclinuxos/apt/"
);
print "Downloading srclist.main.bz2 from each mirror...\n";
foreach my $url (@reposList) {
   my $link = "$url$testfile";
   my $ua = LWP::UserAgent->new();
   $ua->timeout(3);
   $ua->show_progress('TRUE');
   my $t0 = gettimeofday();
   my $response = $ua->get($link);
   my $t1 = gettimeofday();
   $url =~ s|(\w+://.*?)(:\d+)?/(.*)|$1|;
   if ($response->is_success) {
       my $delta = ($t1 - $t0);
       my $line = $url.' '.$delta;
       push(@timedList, $line);
   } else {
       push(@badList, $url);
   }
}
my @sortedList =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, (split)[1] ] } @timedList;
print "\nMirrors speed (time to get 400 KB):\n";
foreach (@sortedList) {
   print "$_", "s\n";
}
if (@badList) {
   print "\nThe following mirrors timed out or are not valid :\n";
   foreach (@badList) {
      print "$_", "\n";
   }
}