We had good success bringing our Wunderlist skill live (see our blog post here) and want to
explore the possibilities of Alexa further.

Alexa can stream music from sources like Spotify or Amazon Music, but a skill to stream from SoundCloud is missing. So I started to implement one!

Here is a demo

Playing your first track with Alexa

Playing a track in response to a user’s intent simply requires adding a PlayDirective to the response (code written in Kotlin).
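A minimal sketch of what such a directive contains. The data classes below are hypothetical stand-ins that mirror the JSON shape of an AudioPlayer.Play directive; they are not the SDK's own classes, and the helper name playNow is made up for illustration.

```kotlin
// Hypothetical stand-ins that mirror the JSON of an AudioPlayer.Play directive.
// They are not the SDK classes; they only show which fields matter.
data class Stream(
    val url: String,                           // HTTPS URL of the audio file
    val token: String,                         // opaque identifier chosen by the skill
    val expectedPreviousToken: String? = null, // only needed when enqueueing
    val offsetInMilliseconds: Long = 0         // where in the file playback starts
)

data class AudioItem(val stream: Stream)

data class PlayDirective(
    val playBehavior: String,
    val audioItem: AudioItem
) {
    val type: String = "AudioPlayer.Play"
}

// Builds a directive that starts playback immediately and replaces any queue.
fun playNow(url: String, token: String): PlayDirective =
    PlayDirective(
        playBehavior = "REPLACE_ALL",
        audioItem = AudioItem(Stream(url = url, token = token))
    )
```

Calling playNow("https://example.com/track-1.mp3", "track-1") yields a directive for a single track; the URL and token values are placeholders.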

The most important properties of the directive are url and token. The token serves as an identifier that is available in all further requests while the stream is playing.

And that’s it! Alexa will now start to play the mp3 file.

Continuous playback

Of course, you usually want to listen to more than one track. So let’s say the initial intent of the user was:

Alexa, open SoundCloud and play my favorites

So how and when do we tell Alexa to play the next track? For a skill that has the audio player enabled, Alexa automatically sends some special requests to the skill implementation. On the JVM we can react to these by implementing the AudioPlayer interface in our Speechlet. It provides a hook that is called shortly before a track reaches the end of its playback.
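The logic inside that hook could look roughly like the following sketch. It does not use the SDK types: EnqueueDirective and Track are hypothetical models, and the signature of loadNextTrack is an assumption made for illustration.

```kotlin
// Hypothetical model of the fields that matter in an enqueue directive.
data class EnqueueDirective(
    val url: String,
    val token: String,
    val expectedPreviousToken: String
) {
    val type: String = "AudioPlayer.Play"
    val playBehavior: String = "ENQUEUE"
}

// A track as the skill might store it; the token doubles as the track id here.
data class Track(val token: String, val url: String)

// Finds the track that follows the one identified by currentToken.
// Returns null if the current track is unknown or is the last one.
fun loadNextTrack(tracks: List<Track>, currentToken: String): Track? {
    val index = tracks.indexOfFirst { it.token == currentToken }
    return if (index >= 0) tracks.getOrNull(index + 1) else null
}

// Called when Alexa reports that the current stream is nearly finished.
// Returns null when there is nothing left to enqueue.
fun onPlaybackNearlyFinished(tracks: List<Track>, currentToken: String): EnqueueDirective? {
    val next = loadNextTrack(tracks, currentToken) ?: return null
    return EnqueueDirective(
        url = next.url,
        token = next.token,
        // Must match the token of the stream that is playing right now.
        expectedPreviousToken = currentToken
    )
}
```

In this sketch the list of tracks is passed in explicitly, which keeps the handler independent of where the list is actually stored.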

The important parts are:

  • Load the next track. Alexa developers will know the concept of a session. They might be tempted to put the list of tracks into a session attribute and make the loadNextTrack method access the session. Sadly, this is not how sessions work – they are only intended for a “conversation” between a user and Alexa. Starting to play music always ends the session.
    Instead, the persistence of the list has to be implemented manually. See further down for how I did it.
  • Use the ENQUEUE play behavior. This way Alexa will play the next track seamlessly after the first one has finished.
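  • For ENQUEUE to work correctly it is important to use the correct values for token and expectedPreviousToken: the expectedPreviousToken of the new stream has to match the token of the stream that is currently playing. For a discussion see the Amazon audio player reference.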

So how do we store the list of tracks to play? Luckily we get a unique identifier with each request, even the ones without a session: the user id. In a request that has a session it is available via envelope.session.user.userId. In requests without a session (i.e. everything from the AudioPlayer interface) it is a bit harder to access, at least in the Java SDK. It’s hidden away in the context, in the System part of the request (context.System.user.userId in the request JSON).
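To make the two lookup paths concrete, here is a small sketch. The classes are a hypothetical, stripped-down model of a request envelope and not the SDK types; userIdOf is an illustrative helper name.

```kotlin
// Hypothetical, stripped-down model of a request envelope. Only the fields
// needed to find the user id are included.
data class User(val userId: String)
data class Session(val user: User)
data class SystemState(val user: User)
data class Context(val system: SystemState)
data class RequestEnvelope(val session: Session?, val context: Context)

// Prefers the session when there is one and falls back to the context,
// which is the only option for requests that arrive without a session.
fun userIdOf(envelope: RequestEnvelope): String =
    envelope.session?.user?.userId ?: envelope.context.system.user.userId
```

A single helper like this lets intent handlers and audio player handlers share the same persistence code.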

Since my skill implementation is hosted on AWS Lambda I decided to use DynamoDB for the persistence. I used the userId as the primary key and stored the list of tracks to play, the current position and some other metadata under it.
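The stored item could be modelled as in the sketch below. The field names, the PlayerStateStore interface and the in-memory implementation are all assumptions for illustration; the in-memory map merely stands in for a DynamoDB table keyed by the user id.

```kotlin
// One entry in the list of tracks to play.
data class Track(val token: String, val url: String)

// What gets stored per user. Field names are illustrative.
data class PlayerState(
    val userId: String,       // primary key
    val tracks: List<Track>,  // the list of tracks to play
    val position: Int = 0     // index of the current track in the list
) {
    // The track at the current position, or null if the position is out of range.
    val currentTrack: Track? get() = tracks.getOrNull(position)
}

// Small storage abstraction so the handlers do not depend on the database directly.
interface PlayerStateStore {
    fun load(userId: String): PlayerState?
    fun save(state: PlayerState)
}

// In-memory stand-in, useful for tests. A database-backed implementation
// would read and write one item under the same key.
class InMemoryPlayerStateStore : PlayerStateStore {
    private val items = mutableMapOf<String, PlayerState>()

    override fun load(userId: String): PlayerState? = items[userId]

    override fun save(state: PlayerState) {
        items[state.userId] = state
    }
}
```

Because PlayerState is immutable, advancing to the next track is a matter of saving state.copy(position = state.position + 1).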

Pause, Next, Previous…

In addition to the special requests from the AudioPlayer interface, there are built-in intents that the user can trigger without saying the name of the skill that plays audio. For example, AMAZON.PauseIntent and AMAZON.NextIntent can be triggered simply with

Alexa, pause

and

Alexa, next

Since these are handled as normal intent requests, the current token and play offset have to be accessed differently. They are stored in the audio player state, which is part of the request context.
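As a rough sketch, reading that state for a pause intent could look like this. AudioPlayerState is a hypothetical model of the AudioPlayer part of the request context (token, offsetInMilliseconds, playerActivity); ResumePoint and resumePointFor are illustrative names.

```kotlin
// Hypothetical model of the AudioPlayer part of the request context.
data class AudioPlayerState(
    val token: String?,              // token of the current stream, if any
    val offsetInMilliseconds: Long,  // playback position within that stream
    val playerActivity: String       // e.g. "PLAYING", "PAUSED", "STOPPED"
)

// What the skill might remember so that playback can resume later.
data class ResumePoint(val token: String, val offsetInMilliseconds: Long)

// For a pause intent: capture where playback currently is.
// Returns null when no stream is playing.
fun resumePointFor(state: AudioPlayerState): ResumePoint? {
    val token = state.token ?: return null
    if (state.playerActivity != "PLAYING") return null
    return ResumePoint(token, state.offsetInMilliseconds)
}
```

The stored offset can later be passed as the offsetInMilliseconds of a new play directive to continue where the user left off.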

With this information it is possible to determine the next track in an intent request. Of course the user id is also accessible via the envelope.session.user property.

The update of the player state for a user is best done in the AudioPlayer request handlers, though. Only then can you be sure Alexa has reacted correctly.
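One way to do that is to move the stored position only when a playback-started request arrives, since that request carries the token of the stream that actually began playing. The sketch below assumes a hypothetical PlayerState shape and handler name.

```kotlin
// One entry in the list of tracks to play.
data class Track(val token: String, val url: String)

// Hypothetical per-user state: the track list and the index of the current track.
data class PlayerState(
    val userId: String,
    val tracks: List<Track>,
    val position: Int = 0
)

// Called for a playback-started request. Returns the state with the position
// moved to the track that Alexa confirmed as playing.
fun onPlaybackStarted(state: PlayerState, startedToken: String): PlayerState {
    val index = state.tracks.indexOfFirst { it.token == startedToken }
    // Unknown token: leave the state untouched rather than guess.
    return if (index >= 0) state.copy(position = index) else state
}
```

The intent handler for "next" then only has to return the play directive; the persisted position follows once Alexa confirms playback.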

Next steps

I now have a good grasp of how to handle audio-playing Alexa skills. They are definitely harder than “normal” skills, since tracking the current state of the audio player and deciding what to persist, and when, requires a lot of attention.

The next steps are more traditional: I have to find a way to make music discoverable and provide more functionality, like

  • Play any playlist (not only favorites)
  • Play another user’s stream

If you have further suggestions or questions about Alexa skills please ask away in the comments!

If you want to learn more about Alexa Skill Development, go here. If you would like to get notified when the SoundCloud skill is available, sign up for our newsletter below.


  • Phrig

    This is amazing! When will You release this skill?

    • Moritz Schulze

      We are not sure yet. Apart from the technical implementation there is some organizational work to be done. For example we are not allowed to actually use the name SoundCloud when we release since we don’t own that trademark.

      • Phrig

        *sigh* So much red tape! I wish you guys luck… this skill will be well received by the SC community.

        • Moritz Schulze

          We have now submitted it under the name “Cloud music”; there will be an update when it gets certified by Amazon!

          • Phrig

            Hey! Any luck getting the certification through with Amazon?

          • Moritz Schulze

            Hi!
            Sorry, I forgot to update here. Sadly no – due to licensing reasons we cannot publish the skill.

            You can read more about it here: https://techdev.de/the-complete-guide-to-running-a-soundcloud-alexa-skill/

            We also open sourced our code and the above blog post contains instructions on how to use it. Albeit, it is not completely trivial.

            Cheers,
            Moritz

  • coops

    great work! can this skill be applied to mixcloud as well?

    • Moritz Schulze

      I just had a look at it, and sadly it seems no. They don’t provide the stream URLs to the audio to developers.