
© 2019 by Rex McKay

BACKGROUND

Euterpe

Putting the "muse" in music with accessible VUI

 

Music apps like Shazam and SoundHound can identify songs playing on the radio or in a store, which helps users discover new music and favorite bands. But what if the music is not playing and someone is trying to describe a song to a friend? Shazam requires the song to actually be playing so it can check its library and provide an answer. SoundHound can accept a user singing, but the user must remember every word and hit every note to get a response.

 

A user needs a song identifier precisely because they cannot accurately recreate or search for a song. This is where our quest to design Euterpe began. We want to ask people about songs we love, but we could not sing them if our lives depended on it.

In this design sprint, I served as a UI and UX designer.

 

Research

 

Current Solutions

Currently available apps do not work well with humming and rely on lyrics for accurate matches. Our design therefore aims to process different types of musical input (e.g. singing, humming, beatboxing, scatting). This will enable it to identify a wider range of genres and serve users who can only remember the tune rather than the lyrics.

Blind Searching

What it Offers

  • Gives a range of possible results

  • Easy access

What it Lacks

  • Requires some recall or lead to begin search

  • If the user has a possible song in mind, they have to scrub through it to find the part they recall and confirm the match

Shazam

What it Offers

  • Matches music playing nearby to a specific song

  • Shows lyrics of song

What it Lacks

  • Gives a single result, which is only useful when the match is accurate or the detected sound is high fidelity

  • Only works with pre-recorded music; does not identify humming or live singing

SoundHound

What it Offers

  • Can identify human singing or humming

  • Can play music as requested

  • Offers more than one song

What it Lacks

  • Requires higher fidelity reproduction to identify

  • Requires very accurate recollection of melody and fails to even guess if not accurate enough

Users

Based on our initial hypothesis and educated assumptions, our primary users are young adults in their 20s who are interested in discovering new music and discussing it with others. These individuals are often avid users of music streaming services such as Spotify and are more likely to attend live shows, so we assume they are more likely to encounter unfamiliar music. This group is also generally more receptive to voice interactions and to trying new methods.

 

Our system is not exclusive to this group, and there are likely additional users and use cases; but at the moment, this is our focal audience.


Persona

Jaime is an avid music listener with a keen ear for catchy tunes. She finds music to be an interesting topic of conversation and likes making song references and suggestions. However, when she listens to a new band or a friend suggests a new song, she often has trouble recalling the name. When she searches for it later, she is not sure what to enter into Google to even begin. Ultimately, she types vague descriptions and listens to several videos in hopes of rediscovering the song.

Goals

  • Share taste in music

  • Salve the itch caused by memory loss

  • Discover new bands and songs

Frustrations

  • Hears a lot of new songs, but cannot recall the names of them

  • Uses Spotify to discover music, but its shuffling method with Daily Mixes and Discover Weekly makes tracking down difficult. If she does not look up her history in time, there is a chance the songs she likes are cycled out of play.

  • Interested in genres that have limited vocals and use extensive sampling

  • Has trouble remembering names of songs

  • Wants to know about songs sampled in other songs

  • Searches pull up original songs rather than the alternative versions she intended

  • Songs are indie or new, so may be less recognizable or harder to find

Constraints and Considerations

For this problem, the user will not know when they will go blank. Therefore, the design should be ready at a moment's notice, seizing any thread of recollection the user has left. Because of this, the interaction and design will initially be mobile-focused, with smart home device integration under consideration for later.

 

The song will be identified via various voice inputs. After processing, the system will give multiple suggestions to allow for mistakes in the input. Consequently, the system will require a large database and a strong sound-analysis method. At this stage, we are focusing on the interaction and design aspects.
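To make the idea of returning several suggestions concrete, here is a minimal sketch of one way imprecise input could be ranked against a catalog. It is not Euterpe's implementation, which has not been designed yet. It assumes the hummed audio has already been converted to a list of MIDI note numbers, reduces each melody to an up/down/repeat contour so the key does not matter, and ranks songs by edit distance. The function names and the tiny catalog are hypothetical.

```python
from typing import Dict, List, Tuple


def contour(pitches: List[int]) -> str:
    """Reduce MIDI pitches to a string of U (up), D (down), R (repeat)."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # deletion
                row[j - 1] + 1,    # insertion
                prev_diag + cost,  # substitution or match
            )
    return row[len(b)]


def suggest(hummed: List[int], catalog: Dict[str, List[int]], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k closest songs as (title, distance), best match first."""
    query = contour(hummed)
    scored = sorted(
        (edit_distance(query, contour(notes)), title)
        for title, notes in catalog.items()
    )
    return [(title, dist) for dist, title in scored[:k]]


# Hypothetical three-song catalog of opening phrases (MIDI note numbers).
CATALOG = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Mary Had a Little Lamb": [64, 62, 60, 62, 64, 64, 64],
}

if __name__ == "__main__":
    # Hummed in a different key, with one wrong note.
    print(suggest([62, 62, 69, 69, 71, 70, 69], CATALOG, k=2))
```

Returning a short ranked list rather than a single answer is what lets the interface tolerate an off-key or partly wrong input: the intended song only has to land near the top, not in first place.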

 

To help users identify songs quickly, results are clipped so users can jump straight to the moments that sound most like their input. If a user expands a track, they can play the entire song as desired. The user can also sync the discovered songs to Spotify, as this feature is already provided by the competitor apps.
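As a rough illustration of how a clip's starting point might be chosen, the sketch below slides the hummed phrase along a song and picks the position where the note-to-note intervals differ least. This is an assumption for illustration only, not a decided approach. It presumes each song is available as a list of (onset in seconds, MIDI pitch) pairs, and `best_clip_start` is a hypothetical name.

```python
from typing import List, Tuple


def intervals(pitches: List[int]) -> List[int]:
    """Semitone steps between consecutive notes (independent of key)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def best_clip_start(hummed: List[int], song: List[Tuple[float, int]]) -> float:
    """Return the onset time where the song best matches the hummed phrase."""
    query = intervals(hummed)
    song_steps = intervals([pitch for _, pitch in song])
    if not song:
        return 0.0
    if not query or len(query) > len(song_steps):
        return song[0][0]  # nothing to align, so start from the beginning

    def cost(offset: int) -> int:
        window = song_steps[offset:offset + len(query)]
        return sum(abs(q - s) for q, s in zip(query, window))

    best_offset = min(range(len(song_steps) - len(query) + 1), key=cost)
    return song[best_offset][0]


if __name__ == "__main__":
    # Opening of "Ode to Joy", one note every half second.
    pitches = [64, 64, 65, 67, 67, 65, 64, 62, 62, 64]
    song = [(i * 0.5, p) for i, p in enumerate(pitches)]
    # The user hums the fourth to seventh notes, three semitones higher.
    print(best_clip_start([70, 70, 68, 67], song))
```

The returned time would be where playback of the result preview begins, so the user hears the part they remembered first.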

 

We gravitated towards “Euterpe,” the Greek muse of music and lyric poetry, for our brand and aesthetics. It is a unique name that can effectively act as a trigger for voice input (e.g. “Hey Euterpe, what’s the song that goes…?”). This uniqueness keeps users from triggering the receiver by accident. The name may be tricky to pronounce if a user has never heard it, so we have created an onboarding process to teach new users how to pronounce it properly. The Euterpe theme continues through the design of the app, nodding to the style seen on Greek amphoras.

 

Storyboard

  1. User is in a conversation. An occasion arises:

    1. Person 1: Have you heard of that song in X? It goes like “...”

    2. Person 2: Oh, I think I know what you’re talking about, but I don’t know what it’s called...

  2. User triggers the system with a voice input: “Hey Euterpe, what’s the song that goes *hums*”

  3. System analyzes the input.

  4. System suggests songs; one is the one the user is looking for

  5. User also listens to the other suggestions; they like them and add them straight to Spotify

  6. User is excited that they were able to recall the song and find a new one they like

Design

 

User Flow

First, a flowchart outlining the information architecture of the system was created to determine the general interaction schematics:

Sketches

Then, initial sets of sketches were drawn to determine the basic layout and placement of elements.

Wireframes

The layout and branches were further defined through wireframe iterations in Sketch. Visual identity was influenced by the artistic color and form of classical Greek amphoras, emphasizing the concept of the muse while also projecting a clean and high contrast design.

Interactive Prototype

The wireframes were exported into InVision Studio and connected to create an interactive prototype. Below is a demonstration of the interactive app flow:

Refined

As this was a design sprint, the screens were designed for a rapid interactive prototype. I have since revisited the project to see how the screens could be aesthetically improved and updated for the more recent iPhone X models.

 

Challenges

The primary feature of Euterpe is that it would accept lower-fidelity input to help find music. Shazam and SoundHound have large databases that match sound input to specific pre-recorded songs. This is why Shazam does not return results when a user tries to provide their own instrumentation. SoundHound goes a step further and has integrated machine learning into its system to detect humming and singing. Our model will likely build on this concept, but we will need to research how much input variance we can tolerate while still returning both the expected results and an actionable number of them. The system will potentially be rolled out in stages in which it accepts singing, then humming, then other noises.

 

With a new voice system, onboarding is necessary to teach first-time users how to interact. We will want to create overlay screens that describe valid phrases and features. Although the name that triggers the voice interaction (Euterpe) is distinct, it can be tricky for unfamiliar users to remember how to pronounce. To address that, the splash screen shown when the app is first opened features a callout that serves both as branding and as a pronunciation prompt. Refer to the demo to see this in action.

 

Since we focused more on the design aspects of this assignment, the concept originated with the two designers rather than from true user-based research; therefore, the context of the system will need to be more formally validated.

 

Next Steps

We will want to consult a sound engineer to identify how we would make this system more technically feasible. This will help us understand the limitations of the system and allow us to establish an implementation plan.

 

We would like to add features that make the system more than just a song identifier, to justify it being an app. One idea we discussed was adding a pitch-guidance feature to help people become better at singing, so that they will be able to identify songs faster. We also want to look at how our system could be made social through sharing karaoke recordings. Before this, though, we will want to make more progress on our voice recognition system to ensure we have a base to build on.

 

We will need to conduct user interviews to validate the market for the product as well as to ensure we are addressing the right pain points in the right ways. Furthermore, it would behoove us to identify additional use cases that we may have overlooked that could be addressed in our design.

 

We would test the prototype to validate its effectiveness and identify any potential usability issues. This would involve moderated user tests, Chalkmark tests, cognitive walkthroughs, and heuristic evaluations. This process will help refine our design and alert us to any opportunities or points needing improvement. Additionally, voice interaction may provide an unintentional solution for visually impaired users. At the very least, we should evaluate our design for accessibility across a range of user abilities.