Scala 2/4: Loops, Decisions and Tests

This is the second article in a series, to read the whole series go here.

This week I’ve worked on learning Scala’s basic syntactic structures and how to write unit tests. To accomplish this I’ve implemented a solution to the fizzbuzz problem and written a couple of unit tests to demonstrate that it works.

For an overview of the fizzbuzz problem, go to wikipedia’s page.

Conditionals, Loops and Other Constructs

Fizzbuzz.scala

Defining Methods

object Fizzbuzz {
  // Returns true if the given number is divisible by 3
  def isfizz(num : Int) : Boolean = { num % 3 == 0 }

  // Returns true if the given number is divisible by 5
  def isbuzz(num : Int) : Boolean = { num % 5 == 0 }

The method definition syntax is in the format: def <method name>(<arg name> : <arg type>) : <result type> = <body>. You can see here that there is no ‘return’ statement in Scala. The value of any given block is the last statement executed in that block. This can lead to some confusion if you have very large methods with a lot of complex branching. The obvious solution is not to have very large methods. I generally start breaking up my methods if they start getting to be more than 12 lines long.

Scala is a statically typed language, which means that all of the variable/argument types are set at compile time. Fortunately it also uses Hindley-Milner type inference which will often guess the intended type of an argument/variable. Because of this the above code can be simplified as:

object Fizzbuzz {
  // Returns true if the given number is divisible by 3
  def isfizz(num : Int) = num % 3 == 0

  // Returns true if the given number is divisible by 5
  def isbuzz(num : Int) = num % 5 == 0

As far as I’m able to tell, unlike in some systems that use Hindley-Milner, types are required when defining the arguments for a method. I don’t really view this as a disadvantage, when I’m coding I get hopelessly confused if I don’t have the intended type of my variables noted somewhere, there’s no reason not to have it in a way that the compiler can catch my mistakes.

You can also see that when there’s only one statement in the method there’s no need for the curly braces around it. This can lead to significantly cleaner looking code when you just have a one liner.

If Statements and Exploring Your Options

   def fizzbuzzSingle(num : Int) : Option[String] = { 
    if(num < = 0) None
    else if(isfizz(num) && isbuzz(num)) Some("fizzbuzz")
    else if(isfizz(num)) Some("fizz")
    else if(isbuzz(num)) Some("buzz")
    else Some(num.toString())
  }

If Statements

One of the ways the Scala team tried to make their language more accessible is to keep the syntax similar to Java where they didn't either simplify it or add functionality which needed new syntax. So anyone who is familiar with any of the Algol type programming languages will recognize the format of the conditional statements above.

Algebraic Data Types

What is less common is Scala's use of an algebraic type system. In short this means that there are types that wrap other types. In the above case the Option type can wrap an arbitrary types. I've used it to wrap a string, bit it could just as easily wrap an Int or a Window object. There are a myriad of reasons for using an algebraic type system. Here it allows me to indicate that an input is invalid (values less than 1) without throwing an exception.

When you are using a package that communicates failure through exceptions, it's often hard to know what exceptions to expect. By using the type system, if you know the type signature of your function/method, you can be sure what to expect as output from your function. When an invalid value is handed to a function using the Option type, it returns None. When it receives a valid value then it returns Some(<response>).

Strictly speaking Scala doesn't have Algebraic types, here for example None and Some are subclasses of the abstract class Option. But for practical purposes it's hard to tell the difference.

Comprehensions and Match

   def fizzbuzzList(end : Int) : List[String] = {
    for (num < - List.range(1, end + 1)) yield fizzbuzzSingle(num) match {
      case Some(n) => n
      case None => ""
    }
  }

For Comprehensions

For comprehensions are similar to list comprehensions in other languages with the syntax: for <value> <- <list> if <boolean statement> yield <result>. This will go through each of the values in <list> and if the <boolean statement> returns true then yields the given <result>. The if <boolean statement> part is optional as in the above code.

Match/Case Statements

Scala's match/case statements are a generalization of the more common switch statement. The difference between them are that checking isn't just at a value level, but at a type level also. Here I'm using it to distinguish between the Some and None type, then I can extract the variable value in Some(n) and make use of it.

main

   def fizzbuzzString(end : Int) : String = fizzbuzzList(end).mkString(" ")

  def main(args : Array[String]) = println(fizzbuzzString(20))
} 

When a file contains an object with a main method, it will execute that method when it's associated class is called from Java.

Testing in Scala

TestFizzbuzz.scala

There are a number of unit test frameworks available for Scala. The one that comes with the system, SUnit, has been deprecated. While there isn't a replacement in the standard libraries, the most common recommendations are:

  • ScalaTest: A standard unit testing framework.
  • scalacheck: A test generation framework in the same style as Haskell's quickcheck.
  • specs: A Behavior-Driven-Design style of test framework.

For my purposes I chose scalacheck because it is well suited to testing the fizzbuzz problem, and I'm rather fond of quickcheck. In scalacheck you write assertions about the function that is tested instead of writing explicit test cases. Scalacheck then generates random data to verify those assertions.

Installing the Library

To install scalacheck, download the jar file (scalacheck-..-.*.jar) from scalacheck's download page and save it to Scala's lib directory, c:\scala\lib if you're continuing from last week. 

Importing Libraries

import org.scalacheck._
import Fizzbuzz._

Import in Scala looks very similar to Import in Java except instead of using *, you use _ to indicate a wildcard.

Subclassing and Local Imports

object TestFizzbuzz extends Properties("fizzbuzzSingle") {
  import Prop._

Scala's subclassing and trait system is extraordinarily complex, but just subclassing from one other class is simple: object <object name> extends <super class>

If you only need to use an import in context to a single class or object, then you can import it inside that class or object.

A scalacheck Properties

property("fizzbuzzSingle: < 1 is None") = forAll 
  { (num: Int) => (num < 1) ==> (Fizzbuzz.fizzbuzzSingle(num) == None) }

  property("fizzbuzzSingle: >= 1 is not None") = forAll 
  { (num: Int) => (num >= 1) ==> (Fizzbuzz.fizzbuzzSingle(num) != None) }

  property("fizzbuzzSingle: (mod 3) and not (mod 5) is Some(\"fizz\")") = 
    forAll (Gen.posNum[Int]) 
      { (num: Int) => ((num % 3 == 0) && (num % 5 != 0)) ==> 
        (Fizzbuzz.fizzbuzzSingle(num) == Some("fizz")) }
}

Scalacheck has a lot of depth to it, but getting started is simple. A simple template is:

property("<property description>") = forAll { (<argument> : <type>) => (<limiting condition>) ==> (<assertion>) }

Working backwards, the assertion is a boolean statement that should be true so long as the limiting condition is true. The argument/type combination are the values that you want the system to generate randomly. As in a normal function definition, you can have as many argument/type pairs as you like, separated by commas.

My Experience

Overall I've found working out the fizzbuzz problem in Scala to be interesting and, the language didn't spend much time getting in the way. The two major caveates to that are:

  1. The state of flux of the unit test system, it'd be nice if they'd pick one to include in the package.
  2. When the code is compiled, there are an unusually large number of class files because of the anonymous functions that tend to crop up in functional code (14 if you include the test code).

Resources

Scala 1/4: Getting Started

This is the first article in a series, to read the whole series go here.

Scala

“The work on Scala stems from a research effort to develop better language support for component software.” -Scala Overview

Scala is:

  • Object Oriented
  • Functional
  • Strongly Typed
  • A Research Language
  • JVM Based

Why Scala is Interesting

The overall theme of Scala seems to be the same as the theme with any building toy: make it possible to create interesting things from relatively simple things. While studying it I’ve been repeatedly surprised by finding language constructs that are not from the core language, but can be created from it. For example, Erlang is well known for its actor model for concurrency. With Scala it’s possible to create such a system in libraries.

Purely Object Oriented… and Functional?

Scala is both object oriented and functional. My first reaction was that this is a contradiction in terms. The minimum requirement for a language to be functional is that all functions are first class objects. While there are a lot of ideas attached to object oriented languages, at the core an object oriented language has two basic attributes:

  1. Every element in the language is an object.
  2. Every action is performed by a method call from one of those objects.

Scala resolves this apparent contradiction by making functions objects in the object oriented sense. Each function is a singleton object with the single method apply which is executed when the function is called.

Powerful Pattern Matching

Most programming languages have some variant on:

(pseudo code)
switch val:
  "first" -> res1
  "second" -> res2

Which is syntactic sugar for:

(pseudo code)
if val == "first":
  res1
elif val == "second":
  res2

Scala’s match functionality is a generalization of this pattern that applies to classes as well as values, for example if you first create a couple of case classes:

abstract class Tree
case class Node(left: Tree, right: Tree) extends Tree
case class Leaf(num: Int) extends Tree

You can then create a function that will sum the tree:

def sumTree(tree: Tree): Int = tree match {
  case Node(l, r) => sumTree(l) + sumTree(r)
  case Leaf(n)    => n
}

This will look familiar to those who’ve coded in languages with Algebraic Type systems like Haskell. This is a case of Scala taking two relatively common elements, classes and switch statements, and combining them into something that is more powerful.

Access to Java’s Libraries

One of the major problems with any programming language is having sufficient libraries to be useful for real tasks. For a language to have these on its own it must either have a large community to produce the libraries over time or an exceptionally dedicated small community. Scala solves this problem by having transparent access to Java’s libraries, and thus the work of Java’s very large community. Any Java library can be imported into a Scala program as though it were one of Scala’s native libraries. Strictly speaking since Scala compiles to Java bytecode, it is.

Scala Installation

Before you install Scala, you’ll most likely want to make sure you have Java’s JDK installed. You can pick it up here.

Scala doesn’t have a true installer, so you have to do a bit of manual work to get started:

  1. Download the zip file from Scala Downloads.
  2. Unzip the file into your C:\ directory.
  3. Renaming the resulting directory to something like “scala” may make your life easier if you have to upgrade later.
  4. Add “C:\scala\bin” to your path in environment variables.
  5. If you have Java installed and you want to be able to run Scala programs using Java, add “C:\scala\lib*” to your CLASSPATH environment variable.

Hello World

Scala as Script Interpreter

Scala has a double life, it’s both a compiler for Java bytecode and a scripting language. When it’s acting as a scripting language the rules are a little more flexible. Compiled, Scala’s syntax requires that everything be contained in an object or class. When run through its interpreter you can just have sequences of commands. For example:

HelloWorld1.scala:

println("Hello World!")

You can run this program from the command line with: scala HelloWorld1.scala

Scala as Compiler

HelloWorld2.scala:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello World!")
  }
}

This version of hello world can be run by first compiling it with scalac HelloWorld2.scala then running it in Java’s VM with java HelloWorld or in Scala with scala HelloWorld. As with Java, by default the name of the produced class comes from the main class, or here singleton object.

Scala’s Interactive Shell

Interactive shell has to be one of my favorite language feature and Scala’s is rather nice. To start it up type scala at your command line. From there, you can get hello world by typing println("Hello World").

A Couple of Nice Features

When you type something into Scala’s interactive shell you’ll notice that the interaction looks something like this:

scala> 3 + 4
res0: Int = 7

You can then use res0 as a variable. Additionally if you type :power you’ll get access to several nice exploratory features of the language, for example, type in 7. then tab to get all the methods that apply to an integer.

Trivia

  • Scala stands for “scalable language”.
  • Scala was originally designed to be interoperable with both Java and .Net.
  • The leader of the Scala team, Martin Odersky, also helped develop Java Generics.

References:

Markdown

What’s it for?

Markdown is a lightweight markup language which allows its user to create documents in a variety of formats. Unlike many other markup languages like HTML and LaTeX, readability of the sourcecode was one of the top priorities in Markdown’s design. Its creator, John Gruber, based its syntax on how we format plain text emails to make it as readable as possible. For example: to emphasize text, you use “*important*” to get “important“.

For people who are accustomed to working in plain text, this is an excellent tool to create documents others will want to read while still using text editors and other tools that they are accustomed to.

Where to get it

Markdown is only a syntax specification for which there are several implementations. For most purposes I’ve found Pandoc to be excellent. Pandoc reads a superset of Markdown and writes out in a multitude of formats including HTML, PDF(with pdflatex) and Docbook.

How to Write with Markdown

Here I’ll give 4 commonly used formatting as an example. These should be be sufficient for many simple documents:

  • Section Headers
  • Links
  • Lists
  • Code

Section Headers

There are two ways to denote Section Headers which can be mixed freely. The two highest level headers of can be marked by underlining as in:

Markdown:

First-level heading
===================

Second-level heading
--------------------

Renders as:

First-level heading

Second-level heading

All levels can be marked using # marks preceding the text:

Markdown:

# First-level heading

## Second-level heading

### Third-level heading

Renders as:

First-level heading

Second-level heading

Third-level heading

The maximum number of levels depends on the format that it’s converted to. As an example, HTML allows for 6 levels.

Links

Links are written in the format [<link name>](<url> "<title>") where the title is optional. For example, [Python](http://python.org "Python's web site") renders as: Python

Lists

Lists may either be bulleted or numbered:

Bulleted List Markdown:

   * First
  * Second
  * Third

Renders as:

  • First
  • Second
  • Third

Numbered List Markdown:

   1. First
  2. Second
  3. Third

Renders as:

  1. First
  2. Second
  3. Third

Code

Code can be entered inline by surrounding it with back quotes, `like this` or by indenting by 4 or more spaces:

Markdown:

    if inCodeBlock:
      return True

Renders as:

if inCodeBlock:
  return True

Generating Documents from Markdown Using Pandoc

Once you’ve downloaded Pandoc from its site and installed it, you can generate a variety of documents from your Markdown text. To see the complete list type “pandoc --help” at the command line.

To generate HTML documents at the command line use “pandoc <markdown document>.mkd -o <html document>.html“. Pandoc determines what the format of the input and output types are from their extensions.

Final Words

Markdown is a readable markup language that covers a significant proportion what you need to generate formatted documents. Unfortunately it does lack some features. There is no notation for tables, footnotes and several other fairly common features in documents available in the original Markdown standard. To compensate for this, a variety of supersets have been create. Pandoc’s extensions, Markdown Extra and MultiMarkdown are all commonly used.

If you want to play with Markdown a bit to get your feet wet, this blog is also written using Markdown and the comments section supports it. Feel free to experiment.

Resources

Python 4/4: GUI Libraries

This is the last article in a series, to read the entire series go here.

Writing a GUI for our RSS Aggregator

The Plan

I’ll be finishing up the Python set with an overview of the Tkinter GUI library. There are numerous other GUI libraries available for Python, the best known of which is wxPython. I chose Tkinter because it comes built into Python and the primary goal here is to, as much as possible, look at Python as it comes out of the box.

What to Expect

When this article is complete:

  • You will have:
    • A basic GUI for last week’s RSS reader.
  • You will know:
    • How to write a GUI in Python’s Tkinter.
    • How to use lambda expressions to create functions on the fly.

The final GUI will look like this:

RSS Gui

Files Used in this Project

  • RssReader.txt: A library for reading RSS information, that I wrote in the previous project. Rename it to “RssReader.py” after downloading.
  • RssReaderGui.txt: The sourcecode for this project. Rename it to “RssReaderGui.py” after downloading.
  • feeds.txt: A sample input file.

The Code

Libraries

from RssReader import generateRsses
from Tkinter import *
from tkFileDialog import askopenfilename
  • Tkinter: Python’s built in GUI library.
  • tkFileDialog: A standard dialog for selecting files.

Global Variables

currentFeeds = {}
rssDisplay = None
chooseChannel = None
currentChannel = None
currentFeeds = None

I’ve seen two ways to organizing the code for a simple one window GUI in Python. One is to wrap it all in a single large class. The other is to write a group of functions that share a set of global variables. In this case a single class would have the same problems as globals along with the overhead of adding self. to every method call and class variable.

The variables that are set to None above are place holders for global variables that will be assigned later. They don’t have to be assigned here, but it’s useful to know what your globals are at the front end.

Toplevel Window Creation

def rssWindow():
  root = Tk()
  root.title("Rss Reader")
  root.geometry("750x500")
  root = channelFrame(root)
  root = buttonFrame(root)
  return root

Although you can define your functions in arbitrary order, I’ve written these in a, more or less, heirarchical order. I’ll walk through what I did here line by line:

  1. At the top level we create a Tk object.
  2. Give the window a title
  3. and an initial size.
  4. The frame that contains the drop down menu and list of articles.
  5. The frame that contains the select and load buttons.

The Upper Window with the Dropdown and Listbox Layout

RSS Gui

def channelFrame(parent):
  channelFrame = Frame(parent)
  channelSelectFrame = Frame(channelFrame)
  channelSelectFrame = channelLabel(channelSelectFrame)
  channelSelectFrame = channelSelect(channelSelectFrame)
  channelSelectFrame.pack(side=LEFT)
  channelFrame = rssDisplay(channelFrame)
  channelFrame.pack(side=TOP, expand=YES, fill=BOTH)
  return parent 

Here we create the window that contains textual information from the RSS feeds. To accomplish this, I create frames which contain either other frames or GUI elements. Once the elements are created and organized by frame they are in, then the window behavior is set with .pack

def rssDisplay(parent):
  global rssDisplay
  rssDisplay = Listbox(parent)
  rssDisplay.pack(side=RIGHT, expand=YES, fill=BOTH)
  return parent

def channelLabel(parent):
  label = Label(parent, text="Select a channel:")
  label.pack(side=TOP)
  return parent

def channelSelect(parent):
  global chooseChannel, currentChannel
  currentChannel = StringVar(parent)
  channelList = ["None"]
  currentChannel.set(channelList[0])
  chooseChannel = OptionMenu(parent, currentChannel, *channelList)
  chooseChannel.pack(side=BOTTOM)
  return parent

Here the individual components are generated, the behaviors of each are described later. On line 2 I used the global keyword. You only need to use this keyword if you plan on changing the global variable, not if you’re going to access it.

The Lower Window with the ‘Set Config File’ and ‘Load Feeds’ Buttons

RSS Gui

def buttonFrame(parent):
  buttonFrame = Frame(parent)
  buttonFrame = setConfigFileButton(buttonFrame)
  buttonFrame = loadRssButton(buttonFrame)
  buttonFrame.pack(side=RIGHT, expand=YES, fill=X)
  return parent

def setConfigFileButton(parent):
  button = Button(parent, command=setConfigFile)
  button["text"] = "Set Config File"
  button.pack(side=LEFT)
  return parent

def loadRssButton(parent):
  button = Button(parent, command=loadRss)
  button["text"] = "Load Feeds"
  button.pack(side=RIGHT)
  return parent

Here I create a top level layout for each of the buttons then describe the specific behavior of each one. We describe the buttons behavior by passing the function to be called via the command argument.

Commands called by various elements

Lambda, functions are first class objects

def loadRss():
  global chooseChannel, currentChannel
  chooseChannel["menu"].delete(0, END)
  channelList = currentFeeds.keys()
  for channelName in channelList:
    chooseChannel["menu"].add_command(label=channelName, 
        command=lambda (temp = channelName): selectChannel(temp))
  selectChannel(channelList[0])

The loadRss function clears the values from the drop down menu then adds each of the current channels to it. It also uses a Lambda function to set the behavior of the drop down box when a given channel is selected.

Lambda functions, as used in line 7 of the above code are extraordinarily useful. It comes in handy, as in the above case, when you need to create a function for which some of the internal values are not known until runtime. Here the Lambda function’s purpose is to assign the default value, channelName to the selectChannel function and assigns it as chooseChannel‘s command.

In general Lambda functions come in the form lambda arg, arg: arg + arg. For people who aren’t used to functional it’s important to remember that there’s only ever one line in a lambda function and the value of that line is always returned without any need to use the return keyword.

def selectChannel(channelName):
  global chooseChannel, rssDisplay
  chooseChannel.setvar(chooseChannel.cget("textvariable"), value = channelName)
  rssDisplay.delete(0, END)
  for feed in currentFeeds[channelName]:
    rssDisplay.insert(END, feed)

When one of the channels is selected from the drop down, this function fires and populates the display window with the names of the current articles.

def setConfigFile():
  global currentFeeds
  currentFeedFile = askopenfilename(filetypes=[("allfiles", "*"), 
                                                ("textfiles","*.txt")])
  currentFeeds = combineFeeds(currentFeedFile)
  loadRss()

Here the program opens a file dialog with askopenfilename which will return the flie selected. It then combines all of the feeds with the same channel name and loads them into the select box to display them with loadRss.

def combineFeeds(fileName):
  feeds = {}
  for feed in generateRsses(fileName):
    for channelName in feed.keys():
      if feeds.has_key(channelName):
        feeds[channelName] += feed[channelName]
      else:
        feeds[channelName] = feed[channelName]
  return feeds

combineFeeds reads the RSS sources from the given filename, downloads the feeds using the library from the last article then combines any channels that happen to have the same name.

if __name__ == "__main__":
  rssWindow().mainloop()

To actually create the window and make it useful, the program calls rssWindow(), described at top of the file, then runs the mainloop() method on it.

Final Summary

I’ve thoroughly enjoyed touring through Python, my observations are that Python has: * A gentle learning curve. * Many libraries with good documentation. * An easy to read syntax. * Flexible semantics.

This has been an interesting run, next week I’ll be taking a look at Markdown, a lightweight markup language which allows you to create nicely typset documents from the comfort of your text editor. Starting in December I’ll be taking a look through Scala a functional/object oriented language that runs in the Java VM.

Resources

Python 3/4: Context Managers, XML and Networking

This is the third article in a series, to read the entire series go here.

Writing an RSS Aggregator with Python

For this article I’ll be putting together a simple RSS aggregator. This will shed a some light on Python’s XML and web libraries and introduce a way to minimize boilerplate code with context managers.

The final product will read through a file containing a list of RSS feeds and print the names of the channels and the articles they contain to screen.

To use it, type: python RssReader.py feeds.txt

Files Used in this Project

  • RssReader.txt: The sourcecode for this project, rename it to “RssReader.py” after downloading.
  • feeds.txt: A sample input file.

Libraries

import sys
import urllib2
import xml.sax
import xml.sax.handler
  • sys: A library for accessing information passed into or maintained by Python’s interpreter.
  • urllib2: A library for manipulating URL’s and opening HTTP streams.
  • xml.sax: Read from and write to XML files using SAX.
    • Tutorial: A good overview of parsing XML using the SAX library.

Parsing XML

The RSS feed parser extracts the names of the channels and the titles of each of the articles.

The handler below gets most of its functionality from its parent class. There are four methods added: startDocument, startElement, endElement and characters. These each fire as an XML document is being read. Because the methods themselves are generic enough to apply to any XML document, you need a lot of if/else statements inside to deal with specific kinds of XML document.

class RssHandler(xml.sax.handler.ContentHandler):
  def startDocument(self):
    self.inItem = False
    self.inTitle = False
    self.channels = {}
    self.currentChannel = None
    self.str = ""

def startElement(self, name, attrs):
    lname = name.lower()

    if lname == "item":
      self.inItem = True
    elif lname == "title":
      self.inTitle = True

def endElement(self, name):
    lname = name.lower()

    if lname == "item":
      self.inItem = False
    elif lname == "title":
      self.inTitle = False
      if self.inItem:
        self.channels[self.currentChannel] += [self.str]
      else:
        self.currentChannel = self.str
        if self.currentChannel not in self.channels.keys():
          self.channels[self.currentChannel] = []
      self.str = ""

  def characters(self, content):
    if self.inTitle:
      self.str += content

This class reads through an XML document and accumulates a relational list of the form: {"Channel 1":["Article 1", "Article 2"], "Channel 2":["Article 3"]}

Context Managers and Downloading Webpages

Context Managers

Files, network connections and other information that require streams of data generally require some amount of boilerplate to open and close those streams. This both adds uninteresting noise to your code and can cause unexpected bugs when you open a stream but fail to close it. Python deals with this through context managers.

Without context managers printing the lines of a file to screen looks like this:

file = open("infile.txt", "r")
print file.read()
file.close()

Though this is clear and obvious in a small example, in real code there can be problems between opening and closing the file, which cause it not to be closed properly. The standard fix is to wrap this with a try/finally to deal with any exceptions that are thrown in the process:

file = open("infile.txt", "r")
try:
  print file.read()
finally:
  file.close()

Where finally makes certain that, regardless of what happens in the middle, the file gets closed. This is effective, but contains a fair amount of boilerplate for a simple action. Context managers allow us to do this:

with open("infile.txt", "r") as file:
  print file.read()

The open function returns a file object which contains the context manager. Any object can become a context manager object by adding an __enter__ and __exit__ method for creation and cleanup. Many of Python’s standard stream objects have these methods added already.

Downloading Webpages

Unfortunately the urlopen function from urllib2 does not come with a context manager, so I had to write one myself. Fortunately, it’s easy.

class Url():
  def __init__(self, url):
    self.url = url

  def __enter__(self):
    self.stream = urllib2.urlopen(self.url)
    return self.stream

  def __exit__(self, type, value, traceback):
    self.stream.close()

Tying it All Together

Using the XML Parser

def generateRsses(feedFile):
  with open(feedFile, "r") as file:
    urls = [url.strip() for url in file.readlines()]

  for url in urls:
    with Url(url) as rss:
      handler = RssHandler()
      parser = xml.sax.make_parser()
      parser.setContentHandler(handler)
      parser.parse(rss)
      yield handler.channels

def printFeed(rss):
  for channelName in rss.keys():
    print "*** " + channelName + " ***"
    for title in rss[channelName]:
      print "\t" + title

The most complex part of this is the actual parsing of the xml, I’ll walk you through it line by line.

  1. Create the RssHandler (the class created above). This defines how the parser will work.
  2. make_parser creates an object that does the XML heavy lifting.
  3. Attach the handler to the parser.
  4. Do the actual parsing.
  5. Extract the resulting data.

Commandline Arguments

if __name__ == "__main__":
  [scriptName,feedFileName] = sys.argv
  for rss in generateRsses(feedFileName):
      printFeed(rss)

The only new element here is sys.argv. sys.argv contains all of the arguments handed into the python interpreter. The first is the name of the script itself, so the program ignores it, the second is the name of the file that contains a list of RSS feeds.

Resources:

Python 2/4: Loops, Decisions, and Tests

This is the second article in a series, to read the entire series go here.

Conditionals and Loops

Now that Python’s installed and we know how to run a program, the next step is to look more closely at Python’s conditional structures. The lens we’ll use for this is the fizzbuzz problem.

fizzbuzz

fizzbuzz.py

The fizzbuzz problem is derived from a game in which the players sit in a circle and count, one after the other. If the number is divisible by 3 or 5, they say either ‘fizz’ or ‘buzz’ insteady of the number. If the number is divisible by both 3 and 5 they say ‘fizzbuzz.’ A fairly common test of basic programming skill is to write a program that will iterate through a sequence of numbers assigning them the appropriate value from the fizzbuzz game.

I’ll be using two different solutions to the fizzbuzz problem to take a look at how Python handles conditionals and loops. One solution will use a for loop, the other will use list comprehension.

Function Formatting

The first point to examine is function definition. In the previous post I skimmed over how blocks are delimited in Python. Here you can see that python uses the off-side rule to indicate the start and end of a block and new lines to indicate the end of a statement. This is wonderful because it makes for a very clean looking syntax, and it eliminates two entire classes of bugs: the ‘you forgot a semicolon here’ bug and the ‘remember to close your braces’ bug. It also creates its own problems.

Python interprets both a tab and a space as a single unit of whitespace. It reads 1 tab’s worth of spaces and an actual tab as being two different things. This can lead to a lot of frustration for new programmers because these look exactly the same to someone reading the code. To help avoid this kind of problem it’s considered standard in the Python community to set your editor to use only spaces and never insert a tab character. How to go about this varies from editor to editor. In IDLE, Python’s IDE, tabs are treated as spaces by default.

Most frequently in the line preceding a new block there is a line such as a function declaration, an if statement, a looping statement, those lines will have colon’s at their ends indicating the end of the statement/start of the block.

def isfizz(num):
  """Returns true if the given number is divisble by 3"""
  return num % 3 == 0

def isbuzz(num):
  """Returns true if the given number is divisble by 5"""
  return num % 5 == 0

The other point of interest in the above code is the first line of each of the functions after the declaration. When you write a python function if you start it with a triple quoted string documention generated by pydoc will automatically be incorporated in the function.

If I Were More Assertive

def fizzbuzzSingle(num):
  """This follows the fairly simple pattern of generating
     a single instance of fizz buzz depending on the number
     value given."""
  assert num > 0, "The given number " + str(num) + " is not a natural number."

  if isfizz(num) and isbuzz(num):
    return "fizzbuzz"
  elif isfizz(num):
    return "fizz"
  elif isbuzz(num):
    return "buzz"
  else:
    return str(num)

A few interesting pythonisms show up here. On line 5 there’s an assertion statement with the format “assert <a boolean statement>, <The string to throw if it fails>“. This is just syntactic sugar for:

    if not num > 0:
      raise AssertionException("The given number " + str(num) + "is not a natural number."

Lines 7-13 contain the standard setup for Python if/elif/else statements. Although Python usually uses parentheses to delimit arguments in a function, they are unnecessary in python conditionals/loops where the colon indicates the end of the statement. Of course that leads to the question, why they are necessary in function definitions. Anyone who can answer this feel free to let me know.

On line 14, there’s an explicit type conversion from an integer to a string. This is necessary in Python because implicit type casting isn’t allowed. The basic types all have an equivalent to this converstion, this means the programmer only has to learn one function per type, instead of “intToStr”, “floatToStr” etc.

Ranges and Method Calls

def fizzbuzzComprehension(topNum):
  """In this example I've used list comprehension to generate the list of
     fizzbuzz strings, then combined them afterwards."""
  fizzbuzzList = [fizzbuzzSingle(num) for num in range(1, topNum + 1)]
  return " ".join(fizzbuzzList)

I’ve used the range function several times, but haven’t gotten around to explaining how it works. It shows a few of the nice features in python. At its base, range generates an ordered list of numbers. What that list is depends on the number of arguments.

range(5) # 1 Argument Generates [1, 2, 3, 4]
range(3, 7) # 2 Arguments Generates [3, 4, 5, 6]
range(0, 30, 5) # 3 Arguments Generates [0, 5, 10, 15, 20, 25]

The important idea here is that Python allows variable numbers of arguments in its functions, we’ll get to more detail on how this works in a later article.

The last line in fizzbuzzComprehension introduces Python’s method call notation. The syntax is fairly common in object oriented languages <object>.<method>(<arguments>). Here the object is a string with one space, the method is “join.” Join takes a list of strings and intersperses its object between them.

For Loops, Iterables and Slices

def fizzbuzzFor(topNum):
  """Here I've used the more common for loop to generate the fizzbuzz string,
     It's less concise than the list comprehension version, but is also less
     suprising to those who aren't as familiar with the syntax."""
  fizzbuzzString = ""
  for num in range(1, topNum + 1):
    fizzbuzzString += fizzbuzzSingle(num) + " "

  # Return the total string, dropping the last space.
  return fizzbuzzString[:-1]

On line 6 we introduce Python’s version of the for loop. Unlike most languages I’ve encountered where the format is “for(<start value>, <end condition>, <value change>)“, Python’s format is “for <new variable> in <iterable>“. An iterable, broadly described, is any value that you could perceive as a list of something. Lists, lines in a file, the output of a generator and a sets are a few examples of iterables.

The last line will look especially unusual to those who are used to conventional array index notation. Python uses a rather power syntax called slice notation to refer to subsets of an iterable value where 0 is the first value. You can refer to sub-lists with the syntax “<a list>[<first value>:<value after the last value>]“. If you leave out either the start or end value then Python will take from the beginning or the end respectively. Finally, as can be seen in fizzbuzzFor, you can use negative numbers to refer to start from the end of the iterable rather than the beginning. To make the functionality more clear, read a few examples:

"spam"[0:1] == "s"
"spam"[:3] == "spa"
"spam"[-2:] == "am"
"spam"[:-1] == "spa"

Making it Run

if __name__ == "__main__":
  print fizzbuzzComprehension(20)

As I mentioned in the last article, Python doesn’t have a ‘main’ function, but there are times when you want to load a module without having it execute all of its code. If you want code that is only executed when the file is executed on its own you check if the the __name__ variable’s value is "__main__".

Testing…. Testing

test_fizzbuzz.py

Python comes with its unit test framework built in, among many other libraries. There are a number of others available, but pyunit (called unittest in the python 2.7 release) is more than adequate.

Importing Modules/Libraries

import unittest
from fizzbuzz import *

Both of the above commands import the named libraries. The two differ in how much information you need to reference the contents of their modules. The functions on fizzbuzz can be referenced as though they were declared in the same file. To refer to an element in the unittest module you must indicate that you are referring to a member of that module with dot notation like “<module>.<element>“. Unfortunately this notation is confusingly similar to the notation used to reference the methods in objects.

The notation “from <module> import <something>” can either be general and import all of the elements in the module as above, or import specific elements as in:

from fizzbuzz import fizzbuzzSingle, fizzbuzzComprehension

Classes

class FizzBuzz(unittest.TestCase):
  def setUp(self):
    """Setup is run every time a given test is executed."""
    self.justBy3 = [num for num in range(1, 101) if isfizz(num) and not isbuzz(num)]
    self.justBy5 = [num for num in range(1, 101) if isbuzz(num) and not isfizz(num)]
    self.by3and5 = [num for num in range(1, 101) if isfizz(num) and isbuzz(num)]
    self.notBy3and5 = [num for num in range(1, 101) if not (isfizz(num) or isbuzz(num))]

Class syntax in Python isn’t especially suprising, “class <name of class> (<superclasses>)“. It’s worth noting that Python allows for multiple inheritance and that methods are declared in a similar fashion to functions.

All methods have ‘self’ as their first argument, this allows you to refer to the object that contains the method without any extra keywords. This can cause some confusion because while all methods have ‘self’ as their first argument when declared, when the method is called ‘self’ is left off.

  def testFizzBy3(self):
    """Verifies that if a number is divisble by 3, it returns fizz"""
    for num in self.justBy3:
      result = fizzbuzzSingle(num)
      self.assertEqual(result, "fizz")

Pyunit makes use of the exception handling system for its tests. In this case, should the assertion in line 5 fail then the test fails. Also note in line 3, the use of the list generated in the setUp method.

  def testMustBePositive(self):
    """fizzbuzzSingle should throw an error when given a non-positive value"""
    try:
      fizzbuzzSingle(0)
    except AssertionError:
      pass
    else:
      fail("Expected an assertion error, non-positive should fail.")

In cases where an error is the expected behavior, using the try/except/else syntax demonstrated that the error was thrown. Here Python is actually using a pun on pass/fail. Fail is a legitimate part of the unittest library. Pass is a python keyword indicating that it should do nothing. It’s similar to NOP in assembler code.

if __name__ == "__main__":
  unittest.main()

The main() method in unittest uses reflection to extract and run all of the methods in the object that start with “test” and executes them as the unit test suite.

Resources

Python.org
PyUnit / Python’s Unittest Docs
Python Cheat Sheet
The first article in this sequence

Python 1/4: Getting Started

This is the first article in a series, to read the entire series go here.

Why Python is Interesting

Programs must be written for people to read,
and only incidentally for machines to execute.
-Abelson and Sussman

Readable Syntax

When Guido van Rostrum designed Python it’s clear that he had this principle, if not this exact quote in mind. The result is a language which often reads like pseudo-code. For example:

def buyFruit(fruit):
  if fruit.color == 'red' and fruit in [Apple, Orange, Tomato]:
    return fruit.price

Flexible Semantics

Although Python primarily an object oriented/procedural language, it has functional elements including functions as first class objects, lambda expressions and map/filter/reduce functions. This gives considerable flexibility in how you go about solving problems.

An excellent summary of the design philosophy that guides both Python’s code design and the style of Python programs in general see: The Zen of Python

List Comprehension

Python’s list comprehension syntax is a compact and clear way to create new lists from existing lists. The syntax itself looks a lot like set comprehension notation as it’s used in discrete mathematics. Here are a few examples:

Given the base list of integers:

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can create a list of every integer doubled with:

[x * 2 for x in xs]

Or you can create a list of only odd integers with:

[x for x in xs if x % 2 == 1]

Or a combination of both, doubling all of the odd integers from 1 to 10:

[x * 2 for x in xs if x % 2 == 1]

Generators

Generators are a powerful addition to Python’s syntax. They permit a form of lazy evaluation where they create values as they are needed rather than all at once. For example, a common way to write a function for a Sieve of Eratosthenes to generate prime numbers is:

def generatePrimes(topValue):
  primes = [2]
  for testValue in range(3, topValue + 1):
    divisors = [prime for prime in primes if testValue % prime == 0]
    if divisors == []:
      primes += [testValue]
  return primes

The above function will, when called, generate all of the prime numbers from 2 up to the value given in the argument. The problem is that it will generate them all immediately, whether you need them or not. This is potentially a serious waste of resources if it comes out that you only needed a handful of them. Python solves this problem by using generators:

def generatePrimes(topValue):
  primes = [2]
  for testValue in range(3, topValue + 1):
    divisors = [prime for prime in primes if testValue % prime == 0]
    if divisors == []:
      primes += [testValue]
      yield testValue

The only difference between our new function and the original is the last line. Note that in the new function there is no return statement, instead there is a ‘yield’ statement, and it is inside the for loop. The new function returns a generator which can be passed to any function that operates on lists and it will treat it as a linked list. Should the function that uses the generator return before all of the values have been extracted then the remaining values will never be generated. This will allow you to avoid doing work that isn’t needed. Even if you find that you need all of those values, rather than having a potentially long wait at the start of the process, the work will be spread throughout the process, allowing for just in time delivery of your data.

Python’s Lineage

ABC

Python was originally developed to be a replacement for ABC, and on the surface the resemblance is striking:

ABC:

PUT {} IN telephone
PUT 5551212 IN telephone["Spam Foo"]
PUT 8674309 IN telephone["Eggs Bar"]
FOR name IN keys telephone:
  WRITE "Name:", name, " Phone:", telephone[name] /

Python:

telephone = {}
telephone["Spam Foo"] = 5551212
telephone["Eggs Bar"] = 8674309
for name in telephone.keys():
  print "Name:", name, " Phone:", telephone[name]

Both of the above map telephone numbers to names in an associative array, then print them in the form: Name: Spam Foo Phone: 5551212 Name: Eggs Bar Phone: 8674309

In addition to the similarities in syntax, they both use an interactive environment which allows you to type your code directly into the command line and get a response from the interpreter.

Haskell

From the appearance of the syntax, it’s clear that Python acquired its list comprehension functionality from Haskell.

Haskell:

[oddNum * 2 | oddNum <- [1 .. 10], oddNum `mod` 2 == 1]

Python:

[oddNum * 2 for oddNum in range(1, 11) if oddNum % 2 == 1]

Both of the above produce a list of integers, [2, 6, 10, 14, 18] which are the odd numbers from 1 to 10, doubled.

Python Installation

Complications to Look Out For

Currently there are two active branches in Python, 2.7 and 3.1. Python 3.1 has a number of improvements to the language which make the syntax regular and prevent some difficult to track bugs that are a serious issue in 2.7. Unfortunately it is not backwards compatible. I’ll be using 2.7 for this article because it continues to have the most libraries available for it.

If you’re interested in the changes made in Python 3.1, you can find them at What’s New In Python 3.0.

Downloading And installing Python 2.7

You can find the latest release of Python 2.7 from Python download. From that point installation is easy, run the installer and accept the default values. The full install will take 52MB of space on your drive.

First Program

For a first program, we’ll get started with “Hello World”. In Python this one is so trivial that it takes more effort to run it than to write it. Copy and paste the code below into your preferred text editor and save it as ‘helloworld.py':

helloworld.py:

print "Hello World"

Executing the Program

The primary purpose of the hello world program is to create a simple program that demonstrates how to compile/execute a program in the given language.

To execute this one type “c:\python27\python helloworld.py” at the command line.

There is no direct use of a ‘main’ function in Python. Code execution starts at the beginning of the file and progresses through the end.

There are a number of ways to impose more structure on a Python program, but the purpose of this first project is to make sure our systems are configured correctly. I’ll get to the more interesting stuff in next week’s post.

Trivia

Python was named for the British comedy group Monty Python. Python’s creator creator, Guido van Rostrum has the final say on all of Python’s design decisions and is called the Benevolent Dictator for Life.

Resources

Python Wikipedia Article

Python.org

Python Download

The Zen of Python

What’s New In Python 3.0

Python Cheat Sheet