Reading, Parsing and Displaying RSS Data in Prolog
This is the third article in a series, to read it from the beginning go here.
The Plan
This week is when I really put the assertion that Prolog is a general purpose language to the test. Most decently rational Prolog programmers will tell you to do things like XML parsing and display through another language, then use Prolog dll’s to do the interesting logical stuff. But, it does have the libraries, so let’s see how they work.
What to Expect
When this article is complete:
- You will have:
- A command-line RSS aggregator.
- You will know:
- How to read a text file.
- How to read from an HTTP stream.
- How to parse an XML file.
- How to pretty print text to the screen.
Files Used in this Project
Compiling
To compile “xml.pl”, navigate to the directory that contains it on the commandline, then execute:
swipl -g main -o xml.exe -c xml.pl
The Code
Libraries Used
:- consult(library(sgml)).
:- consult(library(http/http_open)).
- sgml: The SGML library, which also handles XML.
- http/http_open: The HTTP library handles server and client code, I just used the small bit that let’s me treat an HTTP stream like a file.
Reading a File
readConfig(Lines) :-
open('feeds.txt', read, ConfigStream),
read_stream(ConfigStream, Lines).
read_stream(ReadStream, Lines) :-
read_line_to_codes(ReadStream, Line),
read_stream(ReadStream, Line, Lines).
read_stream(ReadStream, end_of_file, []) :- close(ReadStream), !.
read_stream(ReadStream, OldLine, [Atom | Lines]) :-
read_line_to_codes(ReadStream, NewLine),
string_to_atom(OldLine, Atom),
read_stream(ReadStream, NewLine, Lines).
read_stream
is a helper function to read multiple lines from a stream. It reads a stream in one line a time and returns/unifies its contents to Lines
. Most of it is similar to what you saw in the previous article, recursively pulling off one line at a time, testing if it’s the last line and adding it to the list. The two key elements that are worth noting are: the use of the atom end_of_file
, which is what the file stream returns when the stream is empty and the use of !
to indicate that if execution makes it to closing the stream, then there’s no need to backtrack.
readConfig
‘s functionality is pretty obvious. Open a file using an atom with the file’s name. Remember, single-quotes are for atoms, double quotes are for strings. This opens a read only stream and unifies it with ConfigStream
, then read_stream
pulls the text in the file out into Lines
.
Reading From HTTP
rssFetch(URL, XML) :-
http_open(URL, XmlStream, []),
load_xml_file(XmlStream, XML),
close(XmlStream).
The pattern here is similar to the one for readConfig, except that it extracts the XmlStream
straight to a Prolog XML structure. There’s quite a bit to it, but the short version is, it’s a nested linked list. There are a variety of functors in that list, but the one that interests us is element
which has the form element(<element name>, <attributes>, <sub elements>)
. For example when given the xml:
in text
load_xml_file
returns:
[element(testOuter, [],
['\n ', element(inside, [inAtt=hello], ['in text']), '\n'])].
Parsing XML
rssRead([], []).
rssRead([element(channel, _, Elements) | _], [Rss | Rsses]) :-
channelRead(Elements, Rss), rssRead(Elements, Rsses).
rssRead([element(rss, _, Elements) | _], Rss) :-
rssRead(Elements, Rss).
rssRead([_ | Elements], Rss) :-
rssRead(Elements, Rss).
channelRead(Elements, channel(Name, Titles)) :-
titleRead(Elements, Name), itemsRead(Elements, Titles).
itemsRead([], []).
itemsRead([element(item, _, Elements) | Items], [ Title | Titles ]) :-
titleRead(Elements, Title), itemsRead(Items, Titles).
itemsRead([_ | Items], Titles) :-
itemsRead(Items, Titles).
titleRead([element(title, _, [Text | _]) | _], Text) :- !.
titleRead([_ | Elements], Text) :-
titleRead(Elements, Text).
Strictly speaking Prolog is a weakly typed language, you have only a couple of basic types which all reduce to atoms and functors. But with those you can do the same tricks that are generally associated to algebraic type systems. Here is one example where I use the element
functor to recurse through the XML structure and extract the titles.
Once you know how Prolog’s XML structures work, it’s just a matter of figuring out what information you need and recursing until you get it. For this purpose I’ve created my own functor channel(<channel name>, <title list>)
.
In Prolog functors and atoms are created simply by using them. This makes writing the code feel a lot like freeform sketching a picture. I like it. But I’m pretty sure that unless I made an effort to document my code well, it would make large projects unwieldy.
Pretty Printing Text
displayRss(Channels) :-
foreach(member(channel(Name, Titles), Channels),
(writef("*** %w ***\n", [Name]), displayTitles(Titles), nl)).
displayTitles(Titles) :-
foreach(member(Title, Titles), writef("\t%w\n", [Title])).
writef
works similarly to printf in nearly every language I’ve worked in. It has a few quirks that are detailed in its docs.
Main And Helper Text
readAndDisplayRss(URL) :-
catch((rssFetch(URL, XML),
rssRead(XML, RSS),
displayRss(RSS)), _, readAndDisplayRss(URL)).
main :-
readConfig(FeedURLs),
foreach(member(URL, FeedURLs), readAndDisplayRss(URL)),
halt.
The above is glue code. It’s been written to continue retrying until it succeeds or runs out of stack, so make sure your URL’s are valid.
Final Summary
When I was reading up on Prolog, I heard predicates described as a generalization on functions. This week has driven that one home. Working in Prolog has felt a great deal like when I was first learning Haskell. I’ll spend hours fighting with something that I could do in my sleep in another language, then when I figure it out it is extraordinarily clear, obvious and feels cleaner than anything else I’ve worked with.
Hard Lessons Learned
foreach
is not the same as a for comprehension/list comprehension, no matter how much it looks like one. The key difference is that the second term in the statement does not contain its own namespace. What happens in there happens throughout the same namespace that contains the foreach
. For example:
main :-
foreach(member(Number, [1, 2, 3, 4]),
(Square is Number * Number,
write(Square), nl)).
Does not print the square of the numbers 1 through 4. It first asserts that Square is equal to 1 * 1, then that it is also equal to 2 * 2, 3 * 3, and 4 * 4, which is clearly false and ends execution of the foreach
. It is possible to do this, the safe and idiomatic way is to break the second term out into its own predicate:
showSquare(Number) :-
Square is Number * Number,
write(Square), nl.
main :-
foreach(member(Number, [1, 2, 3, 4]), showSquare(Number)).
That lesson took me half a day for me to learn. You’re welcome.
Coming Up
Next week I’ll be working out how to use XPCE, Prolog’s native GUI library.