Smart speakers were one of the great technological developments of the previous decade. Amazon was a pioneering brand, launching Alexa in 2014 with one of the first smart speakers on the market, amid great enthusiasm for this technology. Inspired by the voice-operated computer of the starship Enterprise from the iconic Star Trek television series, Alexa wowed audiences immediately, to the point that in less than six years Amazon celebrated that more than 100 million units had been sold.
In this spirit of innovation and entrepreneurship, TagWizz, the number one Mexican video game services company, developed one of the most ambitious apps for Amazon’s smart speaker: Alexa Music Mash, a video game that challenges Alexa users to identify the correct version of a song or piece of music. In this postmortem we want to tell the story of how this game, very ambitious from a technical point of view, was developed by a Mexican team.
A video game for a Smart Speaker platform
The idea started from the traditional game Spot the Difference, where you compare two images and find the differences between them. It is a very popular game among children when they begin to develop their visual cognitive skills, and I wondered what this game could be like with music instead of pictures. That’s when I saw the potential it had within the context of smart speaker technology. For example, Alexa plays a song in its correct version, then plays it again but with the drums in a different rhythm, or the guitar playing a different chord. Could players tell the correct version from the modified one? The project was born from this idea.
I like to develop these types of innovative games based on my experience in the industry. For instance, when the iPod touch came out, few believed in Apple’s success. At the time, I was working at Gameloft, and we gave that device a chance, while many other companies showed no interest. Time showed that the companies that had not bet on that technology had a difficult time catching up with its pioneers. When smart speakers came out I thought similarly: I wanted to enter the market as a pioneer because I believed in the technology. The goal was not only to develop Alexa Music Mash, but also to be among that technology’s pioneers, and I am convinced that this will serve us well in the future.
What I liked the most about this project
This game is amazing because humans can intuitively detect when something sounds wrong, and differentiate it from how it should sound, even when they don’t know the song. It’s a natural ability of human beings, although perhaps it can be cultural. In any case, when it is very obvious that something sounds wrong, anyone can recognize it regardless of its context.
What Alexa does is play a song badly, and it does so to various degrees, so that sometimes the difference is very evident and other times it is minimal, which establishes the different levels of difficulty. In the many tests we did with songs with very small timing variations, we observed something that should not surprise us: career musicians were able to perceive errors in song construction with much greater subtlety. To give you an example from outside the game: the song “Despacito” has a change in timing that generated a lot of debate on the internet, but mostly only professional musicians were able to recognize it.
However, not everything is musical professionalism. Another thing we found was that people differ in their physical hearing capacity, just as they do with sight. There are people who hear well and people whose hearing is less acute. We found that the better your hearing, the more differences between versions you detect. Some users are more precise in the high tones and others in the bass. There is also the factor that hearing capacity deteriorates with age.
Another audience I found it interesting to try the game with was the visually impaired. In many cases, these users have more developed hearing. The first result was that they loved the game. And it should come as no surprise, because the industry vastly underestimates projects aimed at the visually impaired. These users saw in Music Mash a proposal that finally included them. The only element that made it difficult for them to play with full confidence was a visual cue that smart speakers use to organize turns: a light that changes color to indicate the player’s turn to speak. If blind players are accompanied by a sighted person, that person can indicate their turn, and the game flows much better. In any case, they loved Alexa Music Mash, an achievement that was very satisfying for me personally.
Finally, I liked the collaboration we had from many indie musicians, who lent their music for this game and used it as a way of promoting their art in a music industry that often does not open doors for them because it prefers much more commercial projects. Along these lines, we offered the game interactively at the Musikmesse in Germany in April 2018, an event for musicians and music enthusiasts and one of the world’s most important gatherings for the sale of musical equipment.
Music Mash development process
At first, I thought it would be easy to make a game like Music Mash. That was the reason I assigned a single programmer to the project. I estimated that we would finish it in about six months, because we already had all the musical content, and I thought that once we had the content, it would be easy to program it. However, as the months passed, I realized that this project was much more challenging than I had thought. Something that in theory seemed easy took us about a year, from 2016 to 2017. The first sign was the programmer who, failing to advance, became so frustrated that he unfortunately left the team. The second programmer I assigned was not making progress either. So the problem could not have been the programmer, but the very nature of the project.
To better understand what was happening, I actively engaged in the game’s programming. The first thing we discovered as a team is that the technology was not responding to our predictions. For example, every time design changes were made to the interface or user experience, the programming would break and stop working. This forced us to constantly correct and reprogram, many times from scratch, which represented a serious challenge for the continuity of the project.
We had to first recognize that programming a video game for a smart speaker was not easy at all. We were among the first to venture into a project like this, so there were no defined methodologies to resolve all these issues. I expanded the team of programmers to four in total, and one of them focused solely on touchscreen smart speakers.
There were several programming difficulties. In smart speakers, everything happens based on events; when the user says the word “two”, for example, Alexa sends the event that the user said “two”. Alexa has to understand what that “two” corresponds to, and if it is the answer to the song that the player is listening to. But if the player says the word “yes”, this can be the answer to various questions, and if the game is not well structured, the program gets confused; it can interpret it as quitting the game, or restarting the game, or other possibilities. The game has to be extremely well structured in terms of programming, or the application gets confused with users’ responses.
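The "yes" problem described above can be illustrated with a small state machine. This is only a minimal sketch, not the code used in Music Mash and not the Alexa Skills Kit API: the `State` and `Game` names, the spoken responses, and the idea of receiving the recognized word as a plain string are all assumptions made for illustration. The point it shows is that the same word is interpreted only in light of the question the game last asked.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical game states; each one defines what a word can mean."""
    AWAITING_ANSWER = auto()   # a round is playing; expecting "one" or "two"
    CONFIRM_QUIT = auto()      # the game just asked "Do you want to quit?"
    CONFIRM_RESTART = auto()   # the game just asked "Do you want to start over?"
    FINISHED = auto()


class Game:
    def __init__(self, correct_version: int):
        self.state = State.AWAITING_ANSWER
        self.correct_version = correct_version
        self.score = 0

    def handle(self, word: str) -> str:
        """Route one recognized word according to the current state."""
        word = word.strip().lower()
        if self.state is State.AWAITING_ANSWER:
            if word in ("one", "two"):
                guess = 1 if word == "one" else 2
                if guess == self.correct_version:
                    self.score += 1
                    return "Correct!"
                return "Not quite."
            if word == "quit":
                self.state = State.CONFIRM_QUIT
                return "Do you want to quit?"
            if word == "restart":
                self.state = State.CONFIRM_RESTART
                return "Do you want to start over?"
        elif self.state is State.CONFIRM_QUIT:
            if word == "yes":
                self.state = State.FINISHED
                return "Goodbye."
            if word == "no":
                self.state = State.AWAITING_ANSWER
                return "Okay, back to the song."
        elif self.state is State.CONFIRM_RESTART:
            if word == "yes":
                self.score = 0
                self.state = State.AWAITING_ANSWER
                return "Starting over."
            if word == "no":
                self.state = State.AWAITING_ANSWER
                return "Okay, back to the song."
        # Anything that makes no sense in the current state is rejected
        # instead of being guessed at.
        return "Sorry, I did not catch that."
```

In this sketch, a "yes" said while a song is playing is rejected, while a "yes" said after "Do you want to quit?" ends the game. Keeping the mapping from words to actions inside each state is one way to avoid the confusion between quitting, restarting, and answering.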
On the other hand, the user interface represented a very different challenge. During the development of any video game, it is normal for different user interfaces to be developed and put to the test until the most user-friendly one is reached. But in the case of Alexa (and probably because it was the first time we made a voice game), we did not have a game architecture with the flexibility to adapt to changes in the user interface.
Finally, smart speaker technology requires the user to use very precise and clear language when speaking to it. As spoken language is full of subtleties and innuendos, we ran into communication failures between the user and Alexa. For instance, smart speakers do not understand accents very well, and do not perfectly separate the user’s voice from background noise or from other users’ voices when they speak at the same time. Smart speakers have a hard time capturing voices that speak to them from far away, and do not register users’ input when they speak too early or too late in their turn.
Alexa is programmed to politely ask the user to repeat an answer that it did not understand, which can lead to the user becoming frustrated and responding emotionally, adding unnecessary words such as “I told you that…” or even “please”. Alexa does not understand the answer in that context, and asks the user to repeat it one more time. This forced us to build an interface that educates players to be rigorous, but players don’t like being rigorous; they like being spontaneous. If there are several players, they talk freely with each other and Alexa no longer understands. The game requires paying attention all the time, too seriously, as if the players were taking an exam, and that doesn’t make it much fun, contrary to our goal of entertaining intelligently and innovatively.
The challenges we faced were mainly of three types: financial, discovery (marketing), and user experience.
Although smart speaker technology was a hit in the last decade, and Alexa sold 100 million units in six years, Music Mash was not profitable for us, mainly because Amazon offers its Alexa apps for free and does not allow them to include advertising. Consequently, we had no way of earning money to pay for development. Our earnings came solely from the United States, thanks to in-app purchases in the form of subscriptions, and from the United Kingdom, where Music Mash was played by so many users, especially in 2020, that Amazon gave us financial compensation as a reward. The problem is that since much of the content on Alexa is free, there is no incentive to develop more or better content. This is the reason we do not continue making games of this type, despite the fact that we were very interested in getting into this technology.
In marketing terms, especially regarding how users discover Music Mash, Alexa does not show all the options a user can play. We are very used to displaying visual lists that users go through until they find what they are looking for, but because Alexa is an auditory medium, this would require it to recite a huge catalog. How do users find out about an app like Music Mash? Advertising campaigns are needed on billboards, on the radio, on users’ computers. But Alexa doesn’t advertise its apps. Amazon advertises its apps through email, but as we all know, not everyone checks all their emails, especially those that contain advertising content. In the UK something different happened: the game became very popular last year, but we are not sure why, nor what Amazon UK did in particular to publicize it.
Finally, when it comes to user experience, also known as UX, the market has accustomed users to interacting through sight, since we have created a lot of content to be displayed on screens. But on Alexa, as I already mentioned, only voice can be used, especially on smart speakers without touchscreens. This was quite a challenge because we couldn’t offer the user as many options as we can visually. That is why we had to greatly reduce the number of options for the player, and propose everything in a very contextual way, so that Alexa only speaks the options that fit what the player is doing, and this required a lot of trial and error.
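The idea of speaking only the options that fit the player’s current situation can be sketched as a lookup from context to a short list of phrases. This is a hypothetical illustration, not the actual Music Mash design: the context names, the phrases in `OPTIONS_BY_CONTEXT`, the `build_prompt` function, and the limit of three options are all assumptions.

```python
# Hypothetical contexts and the few options worth reading aloud in each one.
OPTIONS_BY_CONTEXT = {
    "main_menu": ["play", "how to play", "quit"],
    "listening": ["one", "two", "repeat"],
    "round_over": ["next song", "quit"],
}


def build_prompt(context: str, max_options: int = 3) -> str:
    """Build a short spoken prompt listing only the options for this context."""
    # Unknown contexts fall back to a single safe option.
    options = OPTIONS_BY_CONTEXT.get(context, ["help"])[:max_options]
    if len(options) == 1:
        return f"You can say {options[0]}."
    return "You can say " + ", ".join(options[:-1]) + f", or {options[-1]}."
```

For example, `build_prompt("listening")` produces "You can say one, two, or repeat." Capping the list keeps each spoken prompt short, which matters more in a voice interface than on a screen, where a long menu costs the user nothing to skim.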
What we did right
We managed to build something functional and very innovative at a time when the technology was just being born. We made it both a single-player and a multiplayer game on a platform as special as smart speakers. It allowed us to gain a lot of experience in smart speaker technology, which we will one day take advantage of. We managed to motivate many artists, to whom we are truly grateful for contributing content to this video game. Without them, we would not have been able to do it. And finally, we created a game that gave access to the visually impaired. Seeing their excitement, their laughter, and their joy as they played was really satisfying for us.
What we did wrong
Perhaps Music Mash did not fit the market as we expected. The market was not mature enough for this type of content, both in terms of “discovery” (marketing) and revenue. The market was large enough, because a lot of smart speakers were being sold. But it’s not clear how developers can build a profitable business model on it.
Looking back, I would have liked Music Mash to be a much shorter and more everyday experience, of two to three minutes maximum, instead of having to play for 10–15 minutes. It could have been a daily experience that generated more retention, with very specific daily challenges. Shorter interactions would have let the user play for a moment every morning, for instance.
Smart speaker technology was an undisputed success. This opens up a huge field for innovation in the video game industry, as long as developers want to think outside the box. That’s what we did at TagWizz with Alexa Music Mash. Nobody in Mexico has developed the experience that we have acquired thanks to the Music Mash project, and this puts us at the forefront of innovation in the video game industry in our country. I’m also convinced that there is an untapped market among visually impaired users, if developers are audacious enough to come up with radical new ideas for video games. There is an opportunity to create a totally new video game culture that gives access to the visually impaired, and to find a whole new market of consumers.
How gamers can find this game
The game is available in Mexico, the United States, Spain, the United Kingdom, and France. To start it, the user says one of the following commands to Alexa: “Alexa, open music mash” or “Alexa, play music mash”. We also have a mobile version of this game.