This year was my second time participating in the International Society for Music Information Retrieval (ISMIR) conference. It has been a year marked by the COVID-19 pandemic, the death of George Floyd and the consequent protests in the US, the Black Lives Matter movement, and the protests in Hong Kong. The list of events that have made (and will make) a substantial impact on our society goes on, events to which it is difficult to remain passive.
Participating in ISMIR as a “listener”, without any accepted paper, is without a doubt a stroke of luck, possible only because this year the conference was virtual. I always feel privileged to be part of a research group that guarantees me the opportunity to participate in conferences (when I have results to present), and this year more than ever I realized that the possibility of participating in conferences is, without doubt, a privilege denied to many researchers around the world. It is a logic that the shift to virtual conferences can break.
Participating as a listener is also a good thing for people like me who feel too much pressure when speaking on stage with a mic in front of hundreds of colleagues (to give you a hint, you can have a look at my presentation at ISMIR 2019 and see what it means to be anxious). To be honest, I was enthusiastic about presenting at ISMIR my first paper as the supervisor of a master's student (but it was rejected). In his thesis, Dougal explored how recommender systems can amplify gender bias, and under which conditions such amplification hurts non-male artists. If you want to know more about it, the paper was recently accepted at the 2nd Workshop on the Impact of Recommender Systems, co-located with ACM RecSys 2020, and it is available on arXiv.
It is motivating to see that the ISMIR community's interest in promoting diversity and inclusion, both in the conference and in the scientific topics addressed, remains strong, surely thanks also to the efforts of the WiMIR initiative (this year with a renewed program of virtual meetings during the two months before ISMIR).
Participating as a listener, and researching the listener side of recommender systems: that has been my leitmotif. According to the literature, it should be the user side, but still: “Drug dealers and IT are the only people who call their customers users.” A quote also used in the recent documentary “The Social Dilemma”, which, by the way, this year's ISMIR keynote speaker, Dr. Safiya Noble, indirectly criticized in her talk with really convincing arguments that I fully share.
Talking about listeners, three works caught my eye, all three backed by music streaming services (unsurprisingly? Let's note that academia often cannot do this kind of research, while industry can):
1. Mood classification using listening data (Pandora)
2. Should We Consider the Users in Contextual Music Auto-Tagging Models? (Deezer)
3. Do User Preference Data Benefit Music Genre Classification Tasks? (QQ Music — Tencent Music).
But what is the common thread between them? Let's look at excerpts from their abstracts:
[…] embeddings obtained through matrix factorization of listening data appear to be more informative of a track mood than embeddings based on its audio content.
[…] Our work shows that explicitly modeling the user listening history into the automatic tagging process could lead to more accurate estimation of contextual tags.
[…] Experimental results not only show that user preference data can benefit genre classification, but also affirm the universally applicable value of our music embeddings.
It seems that in three different (but similar) MIR tasks, Mood Classification, Auto-Tagging, and Genre Classification, what listeners do, what they listen to, and what they like is truly important to consider. Being on the listener side (and not too much into signal processing), I have always thought that what listeners do, in terms of interactions with musical objects, can say a lot, sometimes more than the most perfect analysis of the audio content. Why? Let's make a short non-technical digression.
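To make the idea behind these three papers a bit more concrete, here is a minimal sketch with toy data. I use a plain truncated SVD as a simple stand-in for the implicit-feedback matrix factorization the papers rely on (the listening matrix, listener names, and numbers are all invented for illustration): tracks that share an audience end up close in the embedding space, even though no audio has been analyzed at all.

```python
import numpy as np

# Toy listening matrix: rows = listeners, columns = tracks.
# Listeners 0-1 play only tracks 0-1; listeners 2-3 play only tracks 2-3.
plays = np.array([
    [5, 3, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 5, 4],
], dtype=float)

# Factorize the listening data with truncated SVD (a stand-in for the
# implicit-feedback matrix factorization used in the papers).
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
k = 2  # embedding dimensionality
track_embeddings = Vt[:k].T * s[:k]  # shape: (n_tracks, k)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tracks sharing an audience end up with very similar embeddings...
print(cosine(track_embeddings[0], track_embeddings[1]))  # close to 1
# ...while tracks with disjoint audiences end up nearly orthogonal.
print(cosine(track_embeddings[0], track_embeddings[2]))  # close to 0
```

Of course, with four tracks this is just geometry made visible; the point of the papers is that at the scale of a streaming service, such co-listening embeddings carry enough signal to compete with (or complement) audio-based features for mood, context, and genre.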
DiMaggio's work “Classification in Art” (1987) introduced the idea that, as DiMaggio later summarized in his own words:
[…] relations between persons and cultural symbols are characterized by duality, such that shared tastes or interests constitute social groups and shared publics constitute genres or subcultures.
Even though he was not referring specifically to music, this dual nature often makes me think. Turning to the music domain, let's also consider Molino's words in his seminal work “Musical Fact and the Semiology of Music” (1975):
What is called music is simultaneously the production of an acoustic ‘object’, that acoustic object itself, and finally the reception of the object.
What can we take from that? Music is not only passively enjoyed by listeners; listeners are actively part of the definition of what music is. Even if it can seem like a trivial observation, few MIR systems (until now!) have turned such knowledge into practice. And this is probably the common denominator I found in the three papers above: knowledge from listeners is complementary to what can be extracted from the audio signal, and therefore necessary for obtaining better MIR systems. Good to know for those who still have trouble installing Essentia! (By the way, its reputation for being hard to install is something of an urban legend: thanks to the work of the Essentia team, things are much easier now, and the fact that the latest advancements in Essentia's development were awarded this year's Best Reproducibility Paper is clear proof.)
On a different note, I found three other works which make use of valuable information from listeners, this time obtained by explicitly asking them:
It is always good to question Music Emotion Recognition (MER) models, especially given the potential impact they can have on society (never underestimate affective computing). While some may say there is a kind of universality in emotion, others may not (maybe the truth is somewhere in the middle?). The authors here give some insight into how language can influence the perception of emotion.
While part of the methodology presented could be questioned (why did the authors not use greedy diversification algorithms for creating the recommendation lists, given that this is a practice largely accepted in the recommender systems community?), reading the results of semi-structured interviews is always helpful, in this case for learning what listeners have to say about diversity.
Music is often listened to in social situations, but how do we behave in them? Clearly, we are now far from the times when, to prepare for a trip with friends, you had to choose which CDs to bring and which to leave behind. But now, are we OK with handing our smartphone to someone to select the music while we are driving? How do we react if, after two hours of listening to techno at high volume (me), our friends (my roommates) ask to change the music? It is somehow a work with terminology different from what is normally found at ISMIR.
I am sure there have been many more excellent works at ISMIR, and this list is far from exhaustive. Participating as a listener is fine, because listening to what others have to say is always good practice. Next year ISMIR 2021 will be virtual again, so it will surely be another opportunity to listen, and to be part of a great community (even without impressive results to share!).
P.S. Thanks to Emilia, Juan, and Helena for their valuable feedback!
(You can find all ISMIR 2020 papers cited in this post on the conference's official website.)
Chen, K., & Liang, B. (2020). Do User Preference Data Benefit Music Genre Classification Tasks? Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020
DiMaggio, P. (1987). Classification in Art. American Sociological Review, 52(4), 440–455.
DiMaggio, P. (2011). Chapter 20: Cultural Networks. In J. Scott & P. J. Carrington (Eds.), The Sage Handbook of Social Network Analysis (pp. 286–301). London: SAGE Publications Ltd.
Epps-Darling, A., Takeo Bouyer, R., & Cramer, H. (2020). Artist gender representation in music streaming. Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020.
Gómez-Cañón, J. S., Cano, E., & Ug, S. (2020). Joyful for You and Tender for Us: The Influence of Individual Characteristics and Language on Emotion Labeling and Classification. Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020.
Ibrahim, K. M., Epure, E. V., Peeters, G., & Richard, G. (2020). Should We Consider the Users in Contextual Music Auto-Tagging Models? Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020.
Korzeniowski, F., Nieto, O., McCallum, M. C., Won, M., Oramas, S., & Schmidt, E. M. (2020). Mood classification using listening data. Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020.
Molino, J., & Ayrey, C. (1990). Musical Fact and the Semiology of Music. Music Analysis, 9(2), 105–156.
Robinson, K., Brown, D., & Schedl, M. (2020). User insights on diversity in music recommendation lists. Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020.
Shakespeare, D., Porcaro, L., Gómez, E., & Castillo, C. (2020). Exploring Artist Gender Bias in Music Recommendation. 2nd Workshop on the Impact of Recommender Systems (ImpactRS20), Co-Located at RecSys2020. http://arxiv.org/abs/2009.01715
Spinelli, L., & Lau, J. (2020). Investigating User Perceptions Underlying Social Music. Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020, 489–496.
Watson, J. (2020). Programming Inequality: Gender Representation on Canadian Country Radio (2005–2019). Proceedings of the 21st International Symposium on Music Information Retrieval, ISMIR 2020, 392–399.