Bembel-B Blog

2008/03/11

Scrobbling Everywhere All the Time

I must confess I’ve become quite a Last.fm fanboy. :) So what could be more important than keeping track of as much of my music playback as possible? Scrobbling the plays of my PC audio players Amarok and Foobar2000 isn’t that spectacular, but feeding in the statistics of my mobile MP3 player, a SanDisk Sansa e200, and of my stand-alone player, a Pinnacle SoundBridge HomeMusic (licensed by Roku), is more interesting, I think.

Scrobbling Sansa e200 with Rockbox

A precondition is using the great alternative firmware Rockbox, which already has Audioscrobbler logging built in. To submit the logs I use the PC application QTScrobbler under Linux (and occasionally Windows). That’s very easy and convenient. Just be sure your Sansa’s clock is set reasonably accurately, as the play times in the log are taken from it.
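
To give an idea of what a tool like QTScrobbler has to read, here is a minimal sketch of parsing such a log. It is written in current Python 3 and is not taken from Rockbox or QTScrobbler. The field order and the "L"/"S" rating letters are assumptions based on the Audioscrobbler portable player log format as I understand it, and the sample entries are made up.

```python
from datetime import datetime, timezone

# Field order is an assumption (Audioscrobbler portable player log, version 1.1).
FIELDS = ["artist", "album", "title", "track", "length", "rating", "timestamp", "mbid"]

def parse_scrobbler_log(text):
    """Return a list of dicts, one per track that was listened to completely."""
    plays = []
    for line in text.splitlines():
        # header lines such as "#AUDIOSCROBBLER/1.1" start with '#'
        if not line or line.startswith("#"):
            continue
        entry = dict(zip(FIELDS, line.split("\t")))
        # assumption: rating "L" means listened, "S" means skipped
        if entry.get("rating") != "L":
            continue
        entry["length"] = int(entry["length"])
        entry["timestamp"] = int(entry["timestamp"])
        plays.append(entry)
    return plays

# made-up sample log, not real player output
SAMPLE = (
    "#AUDIOSCROBBLER/1.1\n"
    "#TZ/UNKNOWN\n"
    "#CLIENT/Rockbox sample\n"
    "Example Artist\tExample Album\tFirst Title\t1\t125\tL\t1205236800\t\n"
    "Example Artist\tExample Album\tSecond Title\t2\t355\tS\t1205237000\t\n"
)

plays = parse_scrobbler_log(SAMPLE)
for p in plays:
    # the device clock is treated as UTC here purely for display
    when = datetime.fromtimestamp(p["timestamp"], tz=timezone.utc)
    print(p["artist"], "-", p["title"], "at", when.strftime("%Y-%m-%d %H:%M:%S"))
```

The timestamps come straight from the player’s clock, which is why a wrong clock setting ends up as wrong play times on Last.fm.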

Scrobbling SoundBridge with Firefly Mediaserver

To gain access to my whole music collection without having a PC running, I’m using the fine Linksys NSLU2 NAS running the Firefly Mediaserver (aka mt-daapd), with a cheapo 160 GB USB HDD (storage) and a 2 GB USB flash drive (OS) attached. I had a working setup using the alternative NSLU2 firmware Unslung, but soon switched to Debian ARM for its greater versatility and more straightforward configuration.

I’ll write more detailed posts on the NSLU2 soon, especially regarding Firefly, fixed-point transcoding and Last.fm Radio. But for now, here’s a quick overview of the setup, which should be possible on other platforms and with any streaming client too.

I obtained the Firefly Mediaserver prebuilt from the Firefly website. Installation is quite easy and well documented.

Submission to Last.fm is done by the Python application Lastfmsubmitd. As the name suggests, it’s a daemon that permanently waits for data to submit. It picks that data up from text files placed e.g. in /var/spool/lastfm. Under Unslung I had to install it manually from source (python setup.py install); for Debian it’s in the apt repository (but I built a deb package of the more recent version found in Debian unstable).

Creating the data files is done periodically by a shell script based on what I found in the Firefly Forum. It’s run every 5 minutes by cron, queries the “last played” field of Firefly’s collection database, and writes the results to Lastfmsubmitd’s spool directory.
That’s my current shell script (converting GMT+1 timestamps to UTC by subtracting 3600 seconds):

#!/bin/bash

# fetch newly played songs from fireflydb and write
# into lastfmsubmitd readable format

# config
SQLITE=sqlite3
DATABASE=/var/cache/mt-daapd/songs3.db
LASTFILE=/var/cache/mt-daapd/lastfmsubmit.date
DBLSFILE=/var/cache/mt-daapd/lastfmsubmit.ls
TMPDIR=/tmp
SPOOLDIR=/var/spool/lastfm


# get last run time
if [ -e "$LASTFILE" ]
then
  . "$LASTFILE"
else
  LASTRUN=0
fi

# get last database file date
if [ -e "$DBLSFILE" ]
then       
  . "$DBLSFILE"
else     
  DBLSRUN=
fi

# exit when database file unchanged
DBLSNOW=`ls -l "$DATABASE"`
if [ "$DBLSRUN" == "$DBLSNOW" ]
then
  exit
fi

# log file date
echo "DBLSRUN=\"$DBLSNOW\"" > "$DBLSFILE"

# query database
OUTFILE=$(mktemp "$TMPDIR"/mt-daapd-XXXXXXXX)
"$SQLITE" "$DATABASE" 'SELECT artist,album,title,track,song_length,time_played FROM songs where time_played > '"$LASTRUN"' ORDER BY time_played ASC;' | gawk -F '|' '{ printf "---\nartist: \"%s\"\nalbum: \"%s\"\ntitle: \"%s\"\ntrack: %s\nlength: %d\ntime: !timestamp %s\n",$1,$2,$3,$4,$5/1000,strftime("%Y-%m-%d %T",$6-3600) }' > "$OUTFILE"

# place non-zero result into spool, else drop file
if [ -s "$OUTFILE" ]
then
  chmod 664 "$OUTFILE"
  mv "$OUTFILE" "$SPOOLDIR"
else
  rm "$OUTFILE"
fi

# log query date
echo "LASTRUN="`date +%s` > "$LASTFILE"
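
The fixed 3600 second offset in the gawk call only holds as long as the box really runs at GMT+1. As a minimal sketch of an alternative (current Python 3, not part of my setup), the formatting step could let the library do the UTC conversion instead. The function name format_record and the sample values are made up for illustration; the field layout simply mirrors the printf format above.

```python
from datetime import datetime, timezone

def format_record(artist, album, title, track, length_ms, time_played):
    """Build one spool entry in the same layout the shell script's printf writes."""
    # time_played is seconds since the epoch; convert it to UTC explicitly
    played_utc = datetime.fromtimestamp(time_played, tz=timezone.utc)
    return (
        "---\n"
        'artist: "%s"\n'
        'album: "%s"\n'
        'title: "%s"\n'
        "track: %s\n"
        "length: %d\n"
        "time: !timestamp %s\n"
    ) % (artist, album, title, track, length_ms // 1000,
         played_utc.strftime("%Y-%m-%d %H:%M:%S"))

# hypothetical row as it would come out of the songs table
record = format_record("Example Artist", "Example Album", "Example Title",
                       3, 245000, 1205236800)
print(record)
```

Like the shell version, this does not escape double quotes inside artist or title names, so such entries would still need extra care.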

The downside of this solution is that Firefly only considers a track as played if it has been played completely and without interruption. So skipping or pausing a track will cause it not to be submitted.
Also, there’s no separation between podcasts and the rest of my music collection. What I haven’t tried yet is the behaviour when playing web radio via Firefly playlists, as I do all radio streaming directly through the SoundBridge user interface.
To iron out these downsides while keeping the same approach, the first one would need changes to the Firefly code, I guess; the others could probably be fixed by modifying the shell script.
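
One way the podcast separation might look is a genre filter in the query. The following is only a sketch in current Python 3 against an in-memory stand-in table. It assumes that podcasts carry a genre tag of “Podcast” and that the songs table has a genre column; I haven’t verified either against my own database, and the function name fetch_new_plays is made up.

```python
import sqlite3

def fetch_new_plays(conn, last_run, skip_genres=("Podcast",)):
    """Return rows played after last_run, leaving out the given genres."""
    placeholders = ",".join("?" for _ in skip_genres)
    sql = (
        "SELECT artist, album, title, track, song_length, time_played "
        "FROM songs WHERE time_played > ? "
        "AND (genre IS NULL OR genre NOT IN (%s)) "
        "ORDER BY time_played ASC" % placeholders
    )
    return conn.execute(sql, (last_run,) + tuple(skip_genres)).fetchall()

# in-memory stand-in for songs3.db with only the columns used here
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs "
             "(artist, album, title, track, song_length, time_played, genre)")
conn.executemany("INSERT INTO songs VALUES (?,?,?,?,?,?,?)", [
    ("Example Artist", "Example Album", "Old Track", 1, 200000, 100, "Rock"),
    ("Example Artist", "Example Album", "New Track", 2, 210000, 300, "Rock"),
    ("Example Show", "Example Feed", "Episode 12", 0, 1800000, 400, "Podcast"),
])

rows = fetch_new_plays(conn, last_run=200)
print(rows)
```

The same WHERE clause could go into the sqlite3 call of the shell script; filtering by file path instead of genre would be another option.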

2007/12/06

Downloading Artist Images from Discogs with Python

In my previous post I wrote about foobar2000 and album art, but there are also artist images to be shown with the FofR theme. For reasons explained below, and just for fun and to learn Python, I started writing a Python script for downloading artist images from Discogs through their web API.

There’s already an application for downloading artist images, but the server it depends on is currently offline. Then there’s the Discogs component for foobar2000, which can be used for tagging and for downloading album art and artist images. But the artist images at least will be named after the Discogs artist ID, not the artist name. That’s not useful for me unless I retag all my audio files to contain this ID, a thing I don’t dare to do.

So this is my first attempt at writing a Python application. I’m very impressed by how easily and quickly it went, combining code from some tutorials (sorry, can’t remember them all for credit and copyright :/) with some peeks at the library reference.
The only complications were charset issues: firstly because I didn’t know how Python works with special characters internally (it’s Unicode :), and secondly because I was using Cygwin’s Python, which doesn’t seem to handle special characters in input or output at all, as its native charset is set to us-ascii (7-bit characters only).
Well, and one other confusion came from using tabs to indent the source code, resulting in “weird” interpreter errors.
So now I’ve switched to Windows Python (charset cp850) and all is fine. This script should also run nicely under Linux and the like.
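
The charset problem can be reproduced in a few lines. This is a small illustration in current Python 3 rather than the Python 2 of the script below, so the details of my Cygwin setup are not modelled; it only shows why a 7-bit charset cannot represent an umlaut while cp850 can.

```python
name = "DJ Ötzi"

# cp850, the Windows console charset mentioned above, has a byte for "Ö"
as_cp850 = name.encode("cp850")
print(as_cp850)

# us-ascii only covers 7-bit characters, so encoding the umlaut fails
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as err:
    ascii_ok = False
    print("ascii cannot encode:", err.object[err.start:err.end])

# decoding with the matching charset gives the original text back
roundtrip = as_cp850.decode("cp850")
print(roundtrip == name)
```

Internally the text is Unicode either way; the trouble only starts at the border to the console or the file system, where a concrete charset has to be chosen.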

This script is very, very ugly and not at all failsafe. It’s just a starting point and, as I said above, I’m a total beginner in Python. Just for your amusement, here’s the code so far. Expect updates sometime. :)

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

import urllib2, gzip, cStringIO
import urllib
import re
import xml.sax.handler
import getopt, sys

apikey = "111"

#artistname = u"DJ Ötzi"
           
stdout_encoding = sys.stdout.encoding or sys.getfilesystemencoding()
fs_encoding = sys.getfilesystemencoding()
print stdout_encoding

def xml2obj(src):
    """
    A simple function to convert XML data into a native Python object.
    """

    non_id_char = re.compile('[^_0-9a-zA-Z]')
    def _name_mangle(name):
        return non_id_char.sub('_', name)

    class DataNode(object):
        def __init__(self):
            self._attrs = {}    # XML attributes and child elements
            self.data = None    # child text data
        def __len__(self):
            # treat single element as a list of 1
            return 1
        def __getitem__(self, key):
            if isinstance(key, basestring):
                return self._attrs.get(key,None)
            else:
                return [self][key]
        def __contains__(self, name):
            return name in self._attrs
        def __nonzero__(self):
            return bool(self._attrs or self.data)
        def __getattr__(self, name):
            if name.startswith('__'):
                # need to do this for Python special methods???
                raise AttributeError(name)
            return self._attrs.get(name,None)
        def _add_xml_attr(self, name, value):
            if name in self._attrs:
                # multiple attribute of the same name are represented by a list
                children = self._attrs[name]
                if not isinstance(children, list):
                    children = [children]
                    self._attrs[name] = children
                children.append(value)
            else:
                self._attrs[name] = value
        def __str__(self):
            return self.data or ''
        def __repr__(self):
            items = sorted(self._attrs.items())
            if self.data:
                items.append(('data', self.data))
            return u'{%s}' % ', '.join([u'%s:%s' % (k,repr(v)) for k,v in items])

    class TreeBuilder(xml.sax.handler.ContentHandler):
        def __init__(self):
            self.stack = []
            self.root = DataNode()
            self.current = self.root
            self.text_parts = []
        def startElement(self, name, attrs):
            self.stack.append((self.current, self.text_parts))
            self.current = DataNode()
            self.text_parts = []
            # xml attributes --> python attributes
            for k, v in attrs.items():
                self.current._add_xml_attr(_name_mangle(k), v)
        def endElement(self, name):
            text = ''.join(self.text_parts).strip()
            if text:
                self.current.data = text
            if self.current._attrs:
                obj = self.current
            else:
                # a text only node is simply represented by the string
                obj = text or ''
            self.current, self.text_parts = self.stack.pop()
            self.current._add_xml_attr(_name_mangle(name), obj)
        def characters(self, content):
            self.text_parts.append(content)

    builder = TreeBuilder()
    if isinstance(src,basestring):
        xml.sax.parseString(src, builder)
    else:
        xml.sax.parse(src, builder)
    return builder.root._attrs.values()[0]

def downloadartistimage(uri, filename):
    fp = urllib2.urlopen(uri)
    op = open(filename, "wb")
    n = 0
    while 1:
        s = fp.read(8192)
        if not s:
            break
        op.write(s)
        n = n + len(s)
    fp.close()
    op.close()
    for k, v in fp.headers.items():
        print k, "=", v
    print "copied", n, "bytes from", fp.url
    return 0

def usage():
    print "usage: %s [-v] -a ARTIST" % sys.argv[0]

try:
    opts, args = getopt.getopt(sys.argv[1:], "ha:v", ["help", "artist="])
except getopt.GetoptError, err:
    # unknown option or missing option argument: print usage and exit
    print err
    usage()
    sys.exit(2)
verbose = False
artistname = None
for o, a in opts:
    if o == "-v":
        verbose = True
    if o in ("-h", "--help"):
        usage()
        sys.exit()
    if o in ("-a", "--artist"):
        artistname = a.decode(fs_encoding)

# without an artist name there is nothing to look up
if artistname is None:
    usage()
    sys.exit(2)

requesturi = "http://www.discogs.com/artist/%s?f=xml&api_key=%s" % (urllib.quote_plus(artistname.encode('utf-8')), apikey)
print "Requesting: %s" % requesturi
request = urllib2.Request(requesturi)
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
data = response.read()
# only gunzip when the server really answered with gzip-compressed data
if response.info().get('Content-Encoding') == 'gzip':
    data = gzip.GzipFile(fileobj = cStringIO.StringIO(data)).read()
# print(data)

data_obj = xml2obj(data)
images = data_obj.artist.images

primaryfound = False
bigsecondary = None
bigsecondarysize = 0
for image in images.image:
    print "Type: %s URL: %s" % (image.type, image.uri)
    if image.type == "primary":
        primaryfound = True
        fn = u"%s.%s" % (artistname, image.uri.rpartition('.')[2])
        # format first, then encode the finished message for the console
        print (u"Downloading primary image as %s from %s" % (fn, image.uri)).encode(stdout_encoding, 'replace')
        downloadartistimage(image.uri, fn)
        continue
    if image.type == "secondary":
        # width and height are XML attribute strings, so convert before adding
        size = int(image.width) + int(image.height)
        if size > bigsecondarysize:
            bigsecondarysize = size
            bigsecondary = image
        continue

# fall back to the biggest secondary image, if there was one at all
if not primaryfound and bigsecondary is not None:
    fn = u"%s.%s" % (artistname, bigsecondary.uri.rpartition('.')[2])
    print (u"Falling back to secondary as %s sized %sx%s at %s" % (fn, bigsecondary.width, bigsecondary.height, bigsecondary.uri)).encode(stdout_encoding, 'replace')
    downloadartistimage(bigsecondary.uri, fn)

print "All done! :)"
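
The selection logic (take the primary image, else the biggest secondary one) can also be tried out without any network access. Here is a minimal sketch in current Python 3 using the standard library’s ElementTree instead of the xml2obj helper. The sample XML is made up and only assumes the element and attribute names the script above accesses (resp, artist, images, image with type, uri, width, height); pick_image is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# made-up response, shaped after the attributes the script reads
SAMPLE_XML = """<resp>
  <artist>
    <images>
      <image type="secondary" uri="http://example.invalid/a-1.jpg" width="300" height="200"/>
      <image type="secondary" uri="http://example.invalid/a-2.jpeg" width="528" height="531"/>
    </images>
  </artist>
</resp>"""

def pick_image(xml_text):
    """Return (uri, kind) of the primary image, else of the largest secondary one."""
    root = ET.fromstring(xml_text)
    best, best_size = None, 0
    for image in root.findall("./artist/images/image"):
        if image.get("type") == "primary":
            return image.get("uri"), "primary"
        # attributes arrive as strings, so convert before doing arithmetic
        size = int(image.get("width", 0)) + int(image.get("height", 0))
        if image.get("type") == "secondary" and size > best_size:
            best, best_size = image.get("uri"), size
    if best is None:
        return None, None
    return best, "secondary"

uri, kind = pick_image(SAMPLE_XML)
# same file naming as in the script: artist name plus the URL's extension
filename = "%s.%s" % ("Example Artist", uri.rpartition(".")[2])
print(kind, uri, "->", filename)
```

Unlike the script, this returns as soon as it meets a primary image instead of looping over the remaining entries.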

And now two usage examples:

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>example-04.py -a "Aphex Twin"
cp850
Requesting: http://www.discogs.com/artist/Aphex+Twin?f=xml&api_key=111
Type: secondary URL: http://www.discogs.com/image/A-45-005.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1094774583.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1097005597.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1098171105.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1107949060.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1122852930.jpg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949071.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949078.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949085.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1126949091.jpeg
Type: secondary URL: http://www.discogs.com/image/A-45-1129512422.jpeg
Type: primary URL: http://www.discogs.com/image/A-45-1176664580.jpeg
Downloading primary image as Aphex Twin.jpeg from http://www.discogs.com/image/A-45-1176664580.jpeg
content-length = 141117
set-cookie = sid=5c3847142265e10e296934b877585749; path=/; expires=Sun, 03-Dec-2017 00:07:28 GMT; domain=.discogs.com
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:28 GMT
content-type = image/jpeg
copied 141117 bytes from http://www.discogs.com/image/A-45-1176664580.jpeg
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>example-04.py -a "Black Sabbath"
cp850
Requesting: http://www.discogs.com/artist/Black+Sabbath?f=xml&api_key=111
Type: secondary URL: http://www.discogs.com/image/A-144998-1098725461.jpg
Type: secondary URL: http://www.discogs.com/image/A-144998-1147641856.jpeg
Falling back to secondary as Black Sabbath.jpeg sized 528x531 at http://www.discogs.com/image/A-144998-1147641856.jpeg
content-length = 45353
set-cookie = sid=e47b9acbe8257ca4ad7fe6944a36fef1; path=/; expires=Sun, 03-Dec-2017 00:07:08 GMT; domain=.discogs.com
server = Apache
connection = close
reproxy-status = yes
date = Thu, 06 Dec 2007 00:07:08 GMT
content-type = image/jpeg
copied 45353 bytes from http://www.discogs.com/image/A-144998-1147641856.jpeg
All done! :)

C:\Dokumente und Einstellungen\scheff\Eigene Dateien\python\pydiscogs>
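
The “Requesting:” lines in the two runs show how the artist name ends up in the URL. As a small sketch in current Python 3 (the script itself uses Python 2’s urllib.quote_plus), the same encoding step looks like this. The URL pattern is copied from the script as-is, and build_request_uri is just a name I made up for the example.

```python
from urllib.parse import quote_plus

def build_request_uri(artistname, apikey):
    """Encode the artist name as UTF-8 and quote it for use in the URL path."""
    return "http://www.discogs.com/artist/%s?f=xml&api_key=%s" % (
        quote_plus(artistname.encode("utf-8")), apikey)

print(build_request_uri("Aphex Twin", "111"))
print(build_request_uri("DJ Ötzi", "111"))
```

Spaces become plus signs, and the umlaut is sent as its two percent-encoded UTF-8 bytes.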
