ISFDB Download Script

Purpose

If you don't want to check the download page for newer files every now and then, and don't want to click through all the links there manually, you can use the shell script shown below to download the ISFDB files on a UNIX machine such as Linux.

It is especially useful for the huge cover files because it uses wget to download them, which means it can resume interrupted downloads. If a download takes too long, just press CTRL-C and restart the script later; the downloads will continue where they were interrupted without fetching everything again. Moreover, once the script has finished downloading, it can detect whether newer files are available the next time it is executed (by comparing the timestamp and size of your local copy with the one on the server). To sum it up: the script reduces traffic because it only starts downloads when necessary.
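
For reference, the resuming and freshness checks come from two wget options used by the script (a minimal sketch; URL stands for one of the download links):

wget -c -N URL

Here -c continues a partially downloaded file instead of starting over, and -N (timestamping) skips the download entirely if the local copy is already up to date.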

This script might not work on all UNIX flavours. It was developed on Xubuntu and should work on all similar platforms. It was also tested on Mac OS X; see the Mac OS X notes below.

Examples

Assuming you saved the script to a file called "isfdb_download.sh", here are some examples of how to call it.

Simplest case: download the covers, the latest database dump, and the latest source code:

isfdb_download.sh /home/username/backups/isfdb

If you're not interested in the source code or the huge cover files, skip them with the corresponding options:

isfdb_download.sh -s -c /home/username/backups/isfdb

Print all available options:

isfdb_download.sh -h
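
Because the script only downloads what has actually changed, it is also well suited to being run periodically. A purely hypothetical crontab entry (path and schedule are just examples) could look like this:

0 4 * * 0 /home/username/bin/isfdb_download.sh /home/username/backups/isfdb

This would run the script every Sunday at 04:00. Note that the very first run should be done interactively, because the initial CVS checkout prompts for a (blank) password.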

Mac OS X notes

The script should also work fine on Mac OS X. It was tested on Mac OS X 10.9, aka Mavericks.

Be aware, though, that newer versions of Mac OS X no longer come with CVS pre-installed (this is true at least for Mac OS X 10.9), and CVS is no longer included with Xcode 5 either. Therefore, if you want to get the source code, you need to install CVS from an external source. The easiest way is to use one of the three major package management systems: I recommend Homebrew, but MacPorts and Fink also provide a CVS package. Patrick -- Herzbube Talk 18:03, 16 August 2014 (UTC)
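
For example, with Homebrew installed, something like the following should give you a working cvs binary (the package name is an assumption and may differ between package managers):

brew install cvs
cvs --version

MacPorts and Fink users would install the corresponding cvs package with their respective tools instead.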

The download script

Copy and paste the code into a text editor, save it (e.g. as isfdb_download.sh, as in the examples above) and make the file executable.
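
Making it executable is a one-liner, assuming that file name:

chmod +x isfdb_download.sh

The script itself: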

#!/bin/sh

# This script downloads the latest database backup file and all covers listed on the ISFDB
# downloads page as well as the latest source code from the source code repository. Subsequent
# calls of this script will only download newer files. The script can be interrupted by pressing
# CTRL-C and is capable of continuing partial downloads when called the next time.

# You can optionally ignore certain downloads, see code below or call this script using the
# "-h" option for more info.

# The available cover and database files are identified by examining the ISFDB downloads page 
# and extracting the file links from it.

# The latest database file is simply identified by sorting all database file URLs, assuming
# that after sorting the first URL is the latest one.

# These variables define the location of the download page in the ISFDB wiki and
# what this script expects the links to the backup files on that page to look like:
download_page_url="http://www.isfdb.org/wiki/index.php/ISFDB_Downloads"
backup_server_url="http://isfdb.s3.amazonaws.com"
mysql_file_pattern="backups\/backup-MySQL-55-[^\"]*"
cover_file_pattern="images/images-[^\"]*"

usage() 
{
  echo "$(basename "$0") [OPTIONS] DOWNLOAD_DIRECTORY"
  echo "Valid options are:"
  echo "  -c | --ignore-covers : ignore cover files"
  echo "  -d | --ignore-database : ignore database file"
  echo "  -s | --ignore-sources : ignore source code"
  echo "  -h | --help : this message"
}

ignore_sources=
ignore_database=
ignore_covers=

while [ "$1" != "" ]; do
    case $1 in
        -s | --ignore-sources )    ignore_sources=true;;
        -d | --ignore-database )   ignore_database=true;;
        -c | --ignore-covers )     ignore_covers=true;;
        -h | --help )              usage
                                   exit;;
        -* )                       echo "Unknown option $1"
                                   usage
                                   exit 1;;
        *)                         download_dir="$1";;                                   
    esac
    shift
done

if [ -n "$download_dir" ]; then
  mkdir -p "$download_dir"
  if [ ! -w "$download_dir" ]; then
    echo "ERROR: Backup directory '$download_dir' couldn't be created or is not writeable!"
    usage
    exit 1
  fi
else
  echo "ERROR: No backup directory provided!"
  usage
  exit 1
fi

sources_dir="$download_dir/sources"
database_dir="$download_dir/database"
covers_dir="$download_dir/covers"

mkdir -p "$sources_dir"
mkdir -p "$database_dir"
mkdir -p "$covers_dir"

download_page="$download_dir/isfdb_download_page.html"

# Escape special characters in the URL so it can be used as a pattern for regular expressions:
backup_server_url_safe_regexp_pattern=$(printf '%s' "$backup_server_url" | sed 's/[[\.*/]/\\&/g; s/$$/\\&/; s/^^/\\&/')

errors=

echo
echo "******************************************"
echo "        Get and check download page"
echo "******************************************"
echo
if [ -e "$download_page" ]; then
  # Download the page only if it has been changed since the last download (using timestamp
  # comparison):
  curl_cmd="curl -z $download_page -o $download_page $download_page_url"
else
  curl_cmd="curl -o $download_page $download_page_url"
fi
if ! $curl_cmd ; then
  echo "ISFDB download page $download_page_url could not"
  echo "be downloaded. Has the URL changed, perhaps?"
  exit 1
fi
backup_server_url_found=$(grep -oE "$backup_server_url_safe_regexp_pattern" "$download_page" | head -n 1)
if [ -z "$backup_server_url_found" ]; then
  echo "Server URL $backup_server_url not found"
  echo "in ISFDB download page. Did the download page change probably?"
  exit 1
fi

if [ -z "$ignore_sources" ]; then
  echo
  echo "******************************************"
  echo "     Check out or update source code"
  echo "******************************************"
  echo
  sources_module_name="isfdb2"
  if [ -e "$sources_dir/$sources_module_name/CVS/" ]; then
    cd  "$sources_dir/$sources_module_name"
    if ! cvs update -d -P ; then
      errors="${errors}\nCould not update sources from CVS"
    fi
  else
    cd "$sources_dir"
    echo "No working copy found, checking out a new one. Press RETURN at login prompt:"
    if ! cvs -d:pserver:anonymous@isfdb.cvs.sourceforge.net:/cvsroot/isfdb login ; then
      errors="${errors}\nCould not login to CVS server."
    else
      if ! cvs -z3 -d:pserver:anonymous@isfdb.cvs.sourceforge.net:/cvsroot/isfdb co -P "$sources_module_name" ; then
        errors="${errors}\nCould not check out sources from CVS."
      fi
    fi
  fi
else
  echo "Ignoring source code"
fi

if [ -z "$ignore_database" ]; then
  echo
  echo "******************************************"
  echo "           Get latest database"
  echo "******************************************"
  echo
  cd "$database_dir"
  database_url=$(grep -oE "$backup_server_url_safe_regexp_pattern\/$mysql_file_pattern" "$download_page" | sort -r -u | head -n 1)
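  # wget -c resumes a partial download, -N skips the file if the local copy is already up to date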
  if ! wget -c -N "$database_url" ; then
    errors="${errors}\nCould not download database backup '$database_url'"
  fi
else
  echo "Ignoring database"
fi

if [ -z "$ignore_covers" ]; then
  echo
  echo "******************************************"
  echo "            Get latest covers"
  echo "******************************************"
  echo
  cd "$covers_dir"
  covers_file=/tmp/isfdb_download_covers
  grep -oE "$backup_server_url_safe_regexp_pattern\/$cover_file_pattern" "$download_page" | sort -u > "$covers_file"
  while read -r covers_url
  do
    if ! wget -c -N "${covers_url}" ; then
      errors="${errors}\nCould not download covers '$covers_url'"
    fi
  done < "$covers_file"
  rm "$covers_file"
else
  echo "Ignoring covers"
fi

if [ -n "$errors" ]; then
  echo
  echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
  echo "     THERE WERE ERRORS"
  echo "!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
  echo
  printf "%b\n" "$errors\n"
else
  echo "Done."
fi