Bembel-B Blog

2007/12/06

Downloading Artist Images from Discogs with Python

In my previous post I wrote about foobar2000 and album art, but the FofR Theme can also show artist images. For reasons explained below, and just for fun and to learn Python, I started writing a Python script that downloads artist images from Discogs through their Web API.

There is already an application for downloading artist images, but the server it depends on is currently offline. Then there is the Discogs component for foobar2000, which can be used for tagging and for downloading album art and artist images. But it names the artist images after the Discogs artist ID, not the artist name. That is of no use to me unless I retag all my audio files to contain this ID, which is something I don’t dare to do.

So this is my first attempt at writing a Python application. I’m very impressed by how easily and quickly it went, combining code from some tutorials (sorry, I can’t remember them all for credit and copyright :/) with a few peeks at the reference documentation.
The only complications were charset issues. Firstly, I didn’t know how Python handles special characters internally (it’s Unicode :). Secondly, I was using Cygwin’s Python, which doesn’t seem to handle special characters in input or output at all, as its native charset is set to us-ascii (7-bit characters only).
Well, and one other source of confusion was using tabs to indent the source code, which resulted in “weird” interpreter errors.
So I have now switched to Windows Python (charset cp850) and all is fine. The script should also run nicely under Linux and the like.
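
To illustrate the charset problem, here is a minimal sketch in Python 3 syntax (an assumption on my side, the script in this post is Python 2). It checks whether a name like "DJ Ötzi" can be encoded in a 7-bit charset such as us-ascii, in cp850, and in utf-8. The helper name can_encode is made up for this example.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "DJ Ötzi", written with an escape so this file itself stays pure ASCII
name = "DJ \u00d6tzi"

for charset in ("us-ascii", "cp850", "utf-8"):
    print(charset, can_encode(name, charset))

# A lossy fallback for consoles that only speak 7-bit ASCII:
# characters that cannot be encoded are replaced instead of raising an error.
print(name.encode("us-ascii", "replace").decode("us-ascii"))
```

The "replace" error handler is one way to keep a script printing on a restricted console, at the price of losing the special characters in the output.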

This script is very, very ugly and not at all fail-safe. It’s just a starting point and, as I said above, I’m a total beginner in Python. Just for your amusement, here’s the code so far. Expect updates sometime. :)

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

import urllib2, gzip, cStringIO
import urllib
import re
import xml.sax.handler
import getopt, sys

apikey = "111"

#artistname = u"DJ Ötzi"
           
stdout_encoding = sys.stdout.encoding or sys.getfilesystemencoding()
fs_encoding = sys.getfilesystemencoding()
print stdout_encoding

def xml2obj(src):
    """
    A simple function that converts XML data into a native Python object.
    """

    non_id_char = re.compile('[^_0-9a-zA-Z]')
    def _name_mangle(name):
        return non_id_char.sub('_', name)

    class DataNode(object):
        def __init__(self):
            self._attrs = {}    # XML attributes and child elements
            self.data = None    # child text data
        def __len__(self):
            # treat single element as a list of 1
            return 1
        def __getitem__(self, key):
            if isinstance(key, basestring):
                return self._attrs.get(key,None)
            else:
                return [self][key]
        def __contains__(self, name):
            return name in self._attrs
        def __nonzero__(self):
            return bool(self._attrs or self.data)
        def __getattr__(self, name):
            if name.startswith('__'):
                # need to do this for Python special methods???
                raise AttributeError(name)
            return self._attrs.get(name,None)
        def _add_xml_attr(self, name, value):
            if name in self._attrs:
                # multiple attributes of the same name are represented by a list
                children = self._attrs[name]
                if not isinstance(children, list):
                    children = [children]
                    self._attrs[name] = children
                children.append(value)
            else:
                self._attrs[name] = value
        def __str__(self):
            return self.data or ''
        def __repr__(self):
            items = sorted(self._attrs.items())
            if self.data:
                items.append(('data', self.data))
            return u'{%s}' % ', '.join([u'%s:%s' % (k,repr(v)) for k,v in items])

    class TreeBuilder(xml.sax.handler.ContentHandler):
        def __init__(self):
            self.stack = []
            self.root = DataNode()
            self.current = self.root
            self.text_parts = []
        def startElement(self, name, attrs):
            self.stack.append((self.current, self.text_parts))
            self.current = DataNode()
            self.text_parts = []
            # xml attributes --> python attributes
            for k, v in attrs.items():
                self.current._add_xml_attr(_name_mangle(k), v)
        def endElement(self, name):
            text = ''.join(self.text_parts).strip()
            if text:
                self.current.data = text
            if self.current._attrs:
                obj = self.current
            else:
                # a text only node is simply represented by the string
                obj = text or ''
            self.current, self.text_parts = self.stack.pop()
            self.current._add_xml_attr(_name_mangle(name), obj)
        def characters(self, content):
            self.text_parts.append(content)

    builder = TreeBuilder()
    if isinstance(src,basestring):
        xml.sax.parseString(src, builder)
    else:
        xml.sax.parse(src, builder)
    return builder.root._attrs.values()[0]

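def downloadartistimage(uri, filename):
    """Download uri in 8 KiB chunks and save it as filename."""
    fp = urllib2.urlopen(uri)
    op = open(filename, "wb")
    n = 0
    try:
        while True:
            s = fp.read(8192)
            if not s:
                break
            op.write(s)
            n += len(s)
    finally:
        # close both handles even if the transfer fails midway
        fp.close()
        op.close()
    # the response headers stay available after the connection is closed
    for k, v in fp.headers.items():
        print k, "=", v
    print "copied", n, "bytes from", fp.url
    return 0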

usage = "usage: %s [-v] -a ARTIST" % sys.argv[0]
try:
    opts, args = getopt.getopt(sys.argv[1:], "ha:v", ["help", "artist="])
except getopt.GetoptError, err:
    # unknown option or missing option argument: explain and exit
    print err
    print usage
    sys.exit(2)
verbose = False
artistname = None
for o, a in opts:
    if o == "-v":
        verbose = True
    if o in ("-h", "--help"):
        print usage
        sys.exit()
    if o in ("-a", "--artist"):
        artistname = a.decode(fs_encoding)
if artistname is None:
    # without an artist name there is nothing to look up
    print usage
    sys.exit(2)

requesturi = "http://www.discogs.com/artist/%s?f=xml&api_key=%s" % (urllib.quote_plus(artistname.encode('utf-8')), apikey)
print "Requesting: %s" % requesturi
request = urllib2.Request(requesturi)
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
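data = response.read()
# the server may ignore Accept-Encoding, so only gunzip when it says it compressed
if response.headers.get('Content-Encoding') == 'gzip':
    unzipped_data = gzip.GzipFile(fileobj = cStringIO.StringIO(data)).read()
else:
    unzipped_data = data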
# print(unzipped_data)

data_obj = xml2obj(unzipped_data)
images = data_obj.artist.images

primaryfound = False
bigsecondary = None
bigsecondarysize = 0
for image in images.image:
    print "Type: %s URL: %s" % (image.type, image.uri)
    if image.type == "primary":
        primaryfound = True
        fn = u"%s.%s" % (artistname, image.uri.rpartition('.')[2])
        # format first, then encode the finished message for the console
        print (u"Downloading primary image as %s from %s" % (fn, image.uri)).encode(stdout_encoding, 'replace')
        downloadartistimage(image.uri, fn)
        continue
    if image.type == "secondary":
        # width and height arrive as strings from the XML, so convert before adding
        size = int(image.width) + int(image.height)
        if size > bigsecondarysize:
            bigsecondarysize = size
            bigsecondary = image
        continue

# fall back only if there is no primary image and at least one secondary was seen
if not primaryfound and bigsecondary is not None:
    fn = u"%s.%s" % (artistname, bigsecondary.uri.rpartition('.')[2])
    print (u"Falling back to secondary as %s sized %sx%s at %s" % (fn, bigsecondary.width, bigsecondary.height, bigsecondary.uri)).encode(stdout_encoding, 'replace')
    downloadartistimage(bigsecondary.uri, fn)

print "All done! :)"
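
The selection logic at the end of the script can also be written as a small pure function, which makes it easier to try out without touching the network. The following is only a sketch in Python 3 syntax, assuming each image is a plain dict with type, uri, width and height keys that hold strings, the way XML attributes do. pick_image and image_filename are hypothetical names, and the sample data is made up.

```python
def pick_image(images):
    """Return the primary image, else the biggest secondary one, else None."""
    best = None
    best_size = 0
    for image in images:
        if image["type"] == "primary":
            return image
        if image["type"] == "secondary":
            # XML attributes are strings, so convert before doing arithmetic
            size = int(image["width"]) + int(image["height"])
            if size > best_size:
                best, best_size = image, size
    return best


def image_filename(artistname, uri):
    """Name the file after the artist and keep the extension of the URI."""
    return "%s.%s" % (artistname, uri.rpartition(".")[2])


# made-up sample data for illustration only
sample = [
    {"type": "secondary", "uri": "http://example.invalid/a.jpg",
     "width": "90", "height": "100"},
    {"type": "secondary", "uri": "http://example.invalid/b.jpeg",
     "width": "528", "height": "531"},
]
chosen = pick_image(sample)
print(image_filename("Black Sabbath", chosen["uri"]))
```

The int() conversion matters: joined and compared as strings, "90100" sorts after "528531", so the smaller image would win.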

And now two usage examples:

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>example-04.py -a "Aphex Twin"
cp850
Requesting: http://www.discogs.com/artist/Aphex+Twin?f=xml&api_key=111
Type: secondary URL: http://www.discogs.com/image/A-45-005.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1094774583.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1097005597.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1098171105.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1107949060.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1122852930.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949071.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949078.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949085.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949091.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1129512422.jpeg
Type: primary URL: http://www.discogs.com/image/A-45-1176664580.jpeg
Downloading primary image as Aphex Twin.jpeg from http://www.discogs.com/image/A-45-1176664580.jpeg
content-length = 141117
set-cookie = sid=5c3847142265e10e296934b877585749; path=/; expires=Sun, 03-Dec-2017 00:07:28 GMT; domain=.discogs.com
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:28 GMT
content-type = image/jpeg
copied 141117 bytes from http://www.discogs.com/image/A-45-1176664580.jpeg
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>example-04.py -a "Black Sabbath"
cp850
Requesting: http://www.discogs.com/artist/Black+Sabbath?f=xml&api_key=111
Type: secondary URL: http://www.discogs.com/image/A-144998-1098725461.jpg
Type: secondary URL: http://www.discogs.com/image/A-144998-1147641856.jpeg
Falling back to secondary as Black Sabbath.jpeg sized 528x531 at http://www.discogs.com/image/A-144998-1147641856.jpeg
content-length = 45353
set-cookie = sid=e47b9acbe8257ca4ad7fe6944a36fef1; path=/; expires=Sun, 03-Dec-2017 00:07:08 GMT; domain=.discogs.com
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:08 GMT
content-type = image/jpeg
copied 45353 bytes from http://www.discogs.com/image/A-144998-1147641856.jpeg
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>
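
The request URLs in the two runs above show how the artist name ends up in the URL: spaces become plus signs. As a rough sketch of that step in Python 3 syntax (the script itself uses Python 2's urllib.quote_plus), with build_request_uri as a hypothetical helper name and the URL pattern copied from the script:

```python
from urllib.parse import quote_plus


def build_request_uri(artistname, apikey):
    """Build the artist lookup URL in the pattern used by the script."""
    # quote_plus encodes the name as UTF-8 and then percent-escapes it;
    # spaces are turned into "+"
    return "http://www.discogs.com/artist/%s?f=xml&api_key=%s" % (
        quote_plus(artistname), apikey)


print(build_request_uri("Aphex Twin", "111"))
# a name with a special character ("DJ Ötzi") is percent-escaped byte by byte
print(build_request_uri("DJ \u00d6tzi", "111"))
```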
