Bembel-B Blog


Downloading Artist Images from Discogs with Python

In my previous post I told about foobar2000 and album art, but there are also artist images to be shown with the FofR Theme. For reasons explained later and just for fun and learning Python, I started to write a Python script for downloading artist images from Discogs through their Web API.

Discogs Logo
There’s already some application for downloading artist images, but the server it depends on is currently offline. Then there’s the Discogs component for foobar2000, which can be used for tagging and downloading album art and artist images. But at least for the artist images these will be named with the Discogs artist id, and not the artist names. That’s not useful for me, except I’d retag all my audiofiles to contain this ID. A thing I don’t dare to do.

So this is my first approach on writing a Python application. I’m very impressed how easy and fast it went, combining some tutorials’ code (sorry, can’t remember them all for credit and copyright :/) and some peeks at the code reference.
The only complications were charset issues: Firstly because I didn’t know about the inner workings of Python with special chars (it’s Unicode :). Secondly because I was using Cygwin’s Python which don’t seem to handle (output/input) any special chars at all, as its native charset is set to us-ascii (only 7 Bit chars).
Well, and one other confusion came from using tabs to indent the sourcecode, resulting in “weird” interpreter errors.
So now I switched to Windows Python (charset cp850) and all is fine. This script should also run nicely under Linux and alike.

This script is very, very ugly and totally not failsafe. It’s just a starting point and as I told above, I’m a total beginner in Python. Just for your amusement, here’s the code so far. Expect updates sometime. :)

# -*- coding: iso-8859-15 -*-

import urllib2, gzip, cStringIO
import urllib
import re
import xml.sax.handler
import getopt, sys

apikey = "111"

#artistname = u"DJ Ötzi"
stdout_encoding = sys.stdout.encoding or sys.getfilesystemencoding()
fs_encoding = sys.getfilesystemencoding()
print stdout_encoding

def xml2obj(src):
    A simple function to converts XML data into native Python object.

    non_id_char = re.compile('[^_0-9a-zA-Z]')
    def _name_mangle(name):
        return non_id_char.sub('_', name)

    class DataNode(object):
        def __init__(self):
            self._attrs = {}    # XML attributes and child elements
   = None    # child text data
        def __len__(self):
            # treat single element as a list of 1
            return 1
        def __getitem__(self, key):
            if isinstance(key, basestring):
                return self._attrs.get(key,None)
                return [self][key]
        def __contains__(self, name):
            return self._attrs.has_key(name)
        def __nonzero__(self):
            return bool(self._attrs or
        def __getattr__(self, name):
            if name.startswith('__'):
                # need to do this for Python special methods???
                raise AttributeError(name)
            return self._attrs.get(name,None)
        def _add_xml_attr(self, name, value):
            if name in self._attrs:
                # multiple attribute of the same name are represented by a list
                children = self._attrs[name]
                if not isinstance(children, list):
                    children = [children]
                    self._attrs[name] = children
                self._attrs[name] = value
        def __str__(self):
            return or ''
        def __repr__(self):
            items = sorted(self._attrs.items())
            return u'{%s}' % ', '.join([u'%s:%s' % (k,repr(v)) for k,v in items])

    class TreeBuilder(xml.sax.handler.ContentHandler):
        def __init__(self):
            self.stack = []
            self.root = DataNode()
            self.current = self.root
            self.text_parts = []
        def startElement(self, name, attrs):
            self.stack.append((self.current, self.text_parts))
            self.current = DataNode()
            self.text_parts = []
            # xml attributes --> python attributes
            for k, v in attrs.items():
                self.current._add_xml_attr(_name_mangle(k), v)
        def endElement(self, name):
            text = ''.join(self.text_parts).strip()
            if text:
       = text
            if self.current._attrs:
                obj = self.current
                # a text only node is simply represented by the string
                obj = text or ''
            self.current, self.text_parts = self.stack.pop()
            self.current._add_xml_attr(_name_mangle(name), obj)
        def characters(self, content):

    builder = TreeBuilder()
    if isinstance(src,basestring):
        xml.sax.parseString(src, builder)
        xml.sax.parse(src, builder)
    return builder.root._attrs.values()[0]

def downloadartistimage(uri, filename):
    fp = urllib2.urlopen(uri)
    op = open(filename, "wb")
    n = 0
    while 1:
        s =
        if not s:
        n = n + len(s)
    for k, v in fp.headers.items():
        print k, "=", v
    print "copied", n, "bytes from", fp.url
    return 0

    opts, args = getopt.getopt(sys.argv[1:], "ha:v", ["help", "artist="])
except getopt.GetoptError:
    # print help information and exit:
    print("no argument given")
verbose = False
for o, a in opts:
    if o == "-v":
        verbose = True
    if o in ("-h", "--help"):
        print("no argument given")
    if o in ("-a", "--artist"):
        artistname = a.decode(fs_encoding)

requesturi = "" % (urllib.quote_plus(artistname.encode('utf-8')), apikey)
print "Requesting: %s" % requesturi
request = urllib2.Request(requesturi)
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
data =
unzipped_data = gzip.GzipFile(fileobj = cStringIO.StringIO(data)).read()
# print(unzipped_data)

data_obj = xml2obj(unzipped_data)
images = data_obj.artist.images

primaryfound = False
bigsecondarysize = 0
for image in images.image:
    print "Type: %s URL: %s" % (image.type, image.uri)
    if image.type == "primary":
        primaryfound = True
        fn = u"%s.%s" % (artistname, image.uri.rpartition('.')[2])
        print u"Downloading primary image as %s from %s".encode(stdout_encoding) % (fn, image.uri)
        downloadartistimage(image.uri, fn)
    if image.type == "secondary":
        if (image.width + image.height) > bigsecondarysize:
            bigsecondarysize = image.width + image.height
            bigsecondary = image

if not primaryfound:
    fn = u"%s.%s" % (artistname, bigsecondary.uri.rpartition('.')[2])
    print u"Falling back to secondary as %s sized %sx%s at %s".encode(stdout_encoding) % (fn, bigsecondary.width, bigsecondary.height, bigsecondary.uri)
    downloadartistimage(bigsecondary.uri, fn)

print "All done! :)"

And now two usage examples:

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs> -a "Aphex Twin"
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: secondary URL:
Type: primary URL:
Downloading primary image as Aphex Twin.jpeg from
content-length = 141117
set-cookie = sid=5c3847142265e10e296934b877585749; path=/; expires=Sun, 03-Dec-2017 00:07:28 GMT;
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:28 GMT
content-type = image/jpeg
copied 141117 bytes from
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs> -a "Black Sabbath"
Type: secondary URL:
Type: secondary URL:
Falling back to secondary as Black Sabbath.jpeg sized 528x531 at
content-length = 45353
set-cookie = sid=e47b9acbe8257ca4ad7fe6944a36fef1; path=/; expires=Sun, 03-Dec-2017 00:07:08 GMT;
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:08 GMT
content-type = image/jpeg
copied 45353 bytes from
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>

1 Comment »

  1. You would probably spare some lines using SimpleXML. h**p:// On the other hand I would do the same using SAX as you ;)

    Comment by starenka — 2007/12/24 @ 10:12 | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at

%d bloggers like this: