Creating an Entertaining and Informative Music Visualization
1. By: Michael Pouris, BSc, Deborah I. Fels, PhD, P.Eng
IMDC, Ryerson University, Toronto, Canada
2. 1. Background information
2. Research and Model
Purpose of the visualization
Past Industry attempts
The priorities
Psychological model
3. The implementation
Translating to a visual medium
4. Study results and discussion
3. Music is a major art form
Present in all cultures worldwide
Transcends cultural and language boundaries
Portal to a cultural shared experience
(McDermott, 2004)
Hard of hearing and deaf have limited access to
a shared hearing experience
4. Serve as a tool that depicts music visually for
hard-of-hearing or deaf audiences
Sensory substitution is possible (Nanayakkara,
Taylor, Wyse and Ong, 2009)
Research questions:
Can sensory substitution models be used to provide
access to music using the visual channel?
What are user reactions to MusicViz?
5. Industry attempts have failed
Do not use a justified psychological model
that describes how auditory cues map to the
visual system
6. Three primary goals are to develop a
visualization that is:
1. Aesthetically pleasing
2. Emotionally moving
3. Adaptable
7. Interpretation is to explain the meaning of
something with subjective bias
Interpreting feelings of the music is not
feasible
There is not one “feeling” for a musical piece
Everyone interprets music differently
Goal: Visualization is open to each individual's
interpretation
8. Translation is to change or convert to another
form
Translate auditory cues to visual medium
Give enough information
Goal: Individual can interpret information
that is translated
9. Auditory Visual
Auditory System → Visual System
Pitch height and pitch differences:
• Higher pitches = smaller objects
• Lower pitches = larger objects
• Pitches = altitude
Volume changes:
• Bigger objects = more prominent
Tempo and beats:
• Indicate BPM and rhythm (repeated pattern)
Based on Ilie & Thompson (2006)
10. Issue between pitch and volume in visual
system:
Higher pitch = smaller object
Higher amplitude (volume) = larger object
Problem: amplitude and frequency are
integrated within visual system
Solution:
Auditory System → Visual System
Pitch height and pitch differences:
• Associate with altitude
Volume changes:
• Associate with size
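As a rough illustration of the solution on this slide, the sketch below gives pitch and volume separate visual channels: pitch drives altitude only, and volume drives size only. It is not taken from the MusicViz source. The class name `PitchVolumeMapping`, the scene ranges, and the use of MIDI note number and velocity as stand-ins for pitch and volume are all assumptions made for the example.

```java
// Illustrative sketch only: names, ranges, and constants are assumptions, not MusicViz code.
final class PitchVolumeMapping {
    static final int MIDI_MAX = 127;          // MIDI note numbers and velocities span 0..127
    static final double MAX_ALTITUDE = 10.0;  // assumed scene height, in scene units
    static final double MIN_THICKNESS = 0.1;  // assumed thinnest visible pipe
    static final double MAX_THICKNESS = 1.0;  // assumed thickest pipe

    /** Pitch drives altitude only: higher notes sit higher on the Y-axis. */
    static double altitude(int midiNote) {
        return MAX_ALTITUDE * clamp(midiNote) / MIDI_MAX;
    }

    /** Volume drives size only: louder notes give thicker pipes. */
    static double thickness(int velocity) {
        double t = clamp(velocity) / (double) MIDI_MAX;
        return MIN_THICKNESS + t * (MAX_THICKNESS - MIN_THICKNESS);
    }

    /** Keeps out-of-range input inside the MIDI 0..127 range. */
    private static int clamp(int value) {
        return Math.max(0, Math.min(MIDI_MAX, value));
    }

    public static void main(String[] args) {
        // A note that is both high and loud ends up high on the Y-axis and thick,
        // so the two cues no longer compete for the same visual property.
        System.out.printf("altitude=%.2f thickness=%.2f%n", altitude(96), thickness(120));
    }
}
```

Because each auditory cue owns exactly one visual property, a high, loud note never has to be small and big at once.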
11. Implemented using JOGL 2.0 (Java OpenGL)
A Java binding that wraps the C OpenGL API
Uses GlueGen to bind Java to C
X-Axis: Instrument arrangement
Y-Axis: Pitch scale
Z-Axis: Time progression
12. Pitch: Pitch height determines height of the pipe
Timbre: Different “sound families” are defined through color
13. Volume: Depicted through using thickness of pipes
Tempo: Clusters of activity
14. Rhythm: Visual depiction of beats and interactivity
16. 12 participants
3 hearing, 3 hard of hearing and 6 deaf ASL users
Procedure:
6 one-minute songs, each from a different genre, in
random order
Genres: classical, country, jazz, pop, rap/hip-hop/R&B
(RHRB), rock
▪ Commonly used groupings in industry
Pre-study, post-song, post-study questionnaires
Eye tracking data recorded with FaceLab 5
Data analysed with repeated measures ANOVA
17. Feedback on enjoyment and emotional experience
Enjoyment:
5-point Likert scale (1-not enjoyable at all to 5-enjoyable)
Emotions:
1. Valence and arousal model
(Russell, 1972), overbearingness added
2. Discrete model (happy, sad, anger, fear) (Ekman, 1972)
Level of focus
18. No statistical difference between genres
Descriptive analysis:
All genres are rated between 4-somewhat enjoyable and
5-enjoyable
Rock is most enjoyable (M=4.5, SD=0.85)
RHRB (M=4.0, SD=1.054) and classical (M=4.0, SD=1.247) are
least enjoyable
▪ RHRB was initially hypothesized to be the most enjoyable due to bass
19. Overbearingness (1-subtle to 9-overbearing) differs:
Between pop (M=6.42, SD=1.084) and RHRB
(M=4.92, SD=1.564)
Between pop and country (M=4.25, SD=1.603)
Trend: an overbearing song can still be happy.
Pop is most overbearing (M=6.42, SD=1.084), highest
arousal (M=6.08, SD=2.021) and happiest
(M=6.75, SD=2.179)
Valence from 1-unhappy to 9-happy: not significant
Arousal: 1-calm to 9-excited: not significant
20. Four 7-point scales (1-weak to 7-strong) for:
1. Happiness 3. Anger
2. Sadness 4. Fear
No significant difference between genres
A song may have multiple emotions and people were
not able to choose one with limited set of words
21. 5-point Likert scale: “1-my mind wandered a lot” to
“5-I was always focused on the visualization”
No significant difference between genres
Trends:
Paid most attention to rock (M=4.91, SD=0.302), country
(M=4.73, SD=1.206) and jazz (M=4.64, SD=0.505)
Paid least attention to pop (M=4.09, SD=1.378), RHRB
(M=4.09, SD=1.578)
22. Major limitation is the lack of participants
Additional participants are needed to determine
whether trends observed are actual statistical
differences
Additional participants are needed to explore the
differences between the hearing statuses
Participants showed preference for rock, pop and
classical genres
MusicViz provided enjoyment and information
23. Funding provided by GRAND, CFI, NSERC
Co-workers at the IMDC
Study participants
Hello everyone, my name is Michael Pouris, and my co-author is Dr. Deborah Fels. We are from the IMDC at Ryerson University in Toronto, Canada. My presentation is about how to create an entertaining and informative music visualization; more specifically, how to create better access to music through visual aesthetics that are grounded in a psychological cognitive model for translating auditory constructs to visual constructs. I call my solution MusicViz.
First I will present some background information on the project. This includes the role of music in conveying modern culture and how people who are deaf, deafened or hard of hearing have little or no access to this shared knowledge space. I will then describe the purpose of music visualization, some past industry attempts and how they failed. I will describe the priorities that were extracted from these failures and the psychological model that I built to conform to these priorities, which is what my music visualization, called MusicViz, is built on.
In modern western culture, music permeates our lives. It is in restaurants, spas, gyms, sporting events, clubs and concerts. People listen to it while working, cooking, doing lawn work, writing, etc. It is a major art form that is present in all cultures worldwide. It is not only intrinsic to all modern cultures, it is also intrinsic to all ancient cultures throughout history. Essentially, music transcends cultural, linguistic and temporal boundaries. Individuals can understand the emotional aspects of, and enjoy, music from completely different cultures. This also holds true for modern reconstructions of music from ancient cultures. Music is a portal to a shared cultural experience, but people who are hard of hearing or deaf have little to no access to this culture. People who are deaf or hard of hearing experience sound as physical, tactile vibrations; they feel sound through their bodies. Because the lower frequencies carry more power, they are a stronger signal and mask the higher frequencies, so only low-frequency vibrations are available in the tactile domain. Another way to communicate musical entertainment to deaf or hard of hearing audiences is through our music visualization system, MusicViz.
MusicViz’s purpose is to serve as a tool that depicts music visually for hard of hearing and deaf audiences. The goal of the visualization is to transform music into the visual medium in a way that is entertaining. Nanayakkara et al. (2009) showed that deaf and hard of hearing audiences were able to experience emotion from visualization; however, it is important to note that they did not provide a sensory substitution model. The main research questions are:
Can sensory substitution models be used to provide access to music using the visual channel?
What are user reactions to MusicViz?
There have been previous industry attempts at music visualization, such as AniMusic, iTunes, Windows Media Player and the Music Animation Machine. These previous attempts have failed: they are entertaining, yet they do not give any informational or emotional cues about the music. There have been other attempts at music visualization; however, they have focused only on presenting music notation in different forms to make it easier to learn. Others have attempted to interpret the meanings of musical pieces by using colour to convey emotions, which is a large problem considering that the meanings of colours differ not only between cultures but even within a culture.
From analysing the deficiencies of the previous failures, I have extracted three main priorities. MusicViz should be:
Aesthetically pleasing
Emotionally moving
Adaptable: it must work with all types of music
Interpretation is where the meaning of something is explained by an individual from a subjective viewpoint, which adds bias and should be avoided. There are multiple feelings for a single piece of music, and they differ from individual to individual based on personal experience; hence everyone interprets music differently. The goal of the visualization is to present entertaining, yet objective, information to the individual. This is done through TRANSLATION…
The definition of translation is to change or convert to another form. Therefore a music visualization should simply translate auditory cues into visual cues using a proven psychological translation model. The goal is that the translated information can be interpreted by each individual based on personal experience. However, the visualization must not simply present information; it must also be entertaining.
Pitch height in the auditory system is associated with two different visual constructs, size and altitude:
Higher pitches = smaller objects (a mouse makes high-pitched sounds)
Lower pitches = bigger objects (an elephant)
Higher pitches = higher in the air (cartoons where people are kicked into the horizon and a high-pitched ding is used)
Volume/amplitude is associated with object size and looming:
A bigger object makes a louder sound
A smaller object makes a softer sound
Bigger objects are associated with looming (a car coming towards you or, in evolutionary terms, a sabre-toothed tiger attacking)
Finally, tempo and beats indicate a repeated pattern, measured in beats per minute.
However, there is a problem with the translation from the auditory to the visual medium: pitch height and volume conflict. When studied independently, higher pitch is associated with a smaller object and louder sounds are associated with a larger object, but the two variables conflict when they are combined. A note that is simultaneously high pitched and loud cannot be small AND big at the same time. Essentially, the problem is that amplitude and frequency are integrated in the visual system. The solution is to associate pitch height with altitude and volume with size.
The translation was implemented using Java 6 and JOGL 2.0, which is OpenGL for Java. The API maintains performance by making native C calls through GlueGen (glue code generation); hence it behaves like normal C OpenGL in terms of performance and function calls. Java and OpenGL were used because they are cross-platform. The musical input is MIDI: MusicViz receives input in real time from a MIDI sequencer and displays the translations.
The X-axis is used to arrange the instruments, which are displayed as pipes.
The Y-axis is used to represent pitch height.
The Z-axis represents time progression, hence tempo.
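To make the axis layout concrete, here is a minimal sketch of turning one MIDI note-on message into an (x, y, z) position. It uses the standard `javax.sound.midi.ShortMessage` class for the message itself; everything else is assumed for illustration and is not the MusicViz source: the class `MidiToAxes`, the spacing constants, and the choice of one pipe per MIDI channel. No JOGL rendering is shown.

```java
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.ShortMessage;

// Illustrative sketch only: layout constants and method names are assumptions, not MusicViz code.
final class MidiToAxes {
    static final double PIPE_SPACING = 2.0;      // assumed X distance between neighbouring pipes
    static final double UNITS_PER_NOTE = 0.1;    // assumed Y distance per MIDI note number
    static final double UNITS_PER_SECOND = 5.0;  // assumed Z scroll speed

    /** Returns {x, y, z} for a sounding note-on message, or null for any other message. */
    static double[] position(ShortMessage msg, double secondsSinceStart) {
        // A note-on with velocity 0 is conventionally treated as a note-off.
        boolean noteOn = msg.getCommand() == ShortMessage.NOTE_ON && msg.getData2() > 0;
        if (!noteOn) {
            return null;
        }
        double x = msg.getChannel() * PIPE_SPACING;       // X: instrument (one pipe per channel, assumed)
        double y = msg.getData1() * UNITS_PER_NOTE;       // Y: pitch height (data1 is the note number)
        double z = secondsSinceStart * UNITS_PER_SECOND;  // Z: time progression
        return new double[] {x, y, z};
    }

    /** Builds a note-on message, wrapping the checked exception for brevity. */
    static ShortMessage noteOn(int channel, int note, int velocity) {
        try {
            return new ShortMessage(ShortMessage.NOTE_ON, channel, note, velocity);
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        double[] p = position(noteOn(2, 60, 90), 1.5);
        System.out.printf("x=%.1f y=%.1f z=%.1f%n", p[0], p[1], p[2]);
    }
}
```

In a real-time setting the same decoding would sit inside a `javax.sound.midi.Receiver` attached to the sequencer's transmitter, with the resulting positions handed to the renderer.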
The pitch height determines the height of the pipe on the Y-axis. In order to differentiate the instruments, a unique colour is assigned to each pipe. The colours are in no way meant to represent emotions that the instruments convey, and are in no way used to identify a pipe as a specific instrument. Colours simply convey that pipes of different colours are different instruments.
Volume is depicted using the thickness of the pipes. As can be seen, the blue pipe is thicker than the green pipe; therefore we know that the blue pipe is louder than the green pipe. Tempo is depicted through clusters of activity: a faster tempo means there are more changes in a smaller space, while a slower tempo spreads the changes over more space.
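The link between tempo and clustering can be worked through with simple arithmetic. The sketch below assumes the Z-axis scrolls at a constant speed, which the presentation does not state; the class `TempoSpacing` and its constant are hypothetical. Under that assumption, the distance between consecutive beats is the scroll speed multiplied by the seconds per beat (60 / BPM).

```java
// Illustrative sketch only: a constant scroll speed is an assumption, not a documented MusicViz detail.
final class TempoSpacing {
    static final double UNITS_PER_SECOND = 5.0; // assumed constant Z scroll speed

    /** Z distance between consecutive beats at the given tempo. */
    static double beatSpacing(double bpm) {
        if (bpm <= 0) {
            throw new IllegalArgumentException("bpm must be positive");
        }
        double secondsPerBeat = 60.0 / bpm;
        return UNITS_PER_SECOND * secondsPerBeat;
    }

    public static void main(String[] args) {
        // Doubling the tempo halves the spacing, so events cluster more tightly.
        System.out.printf("60 BPM: %.2f units per beat%n", beatSpacing(60));
        System.out.printf("120 BPM: %.2f units per beat%n", beatSpacing(120));
    }
}
```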
An intrinsic element of deaf music is a deep bass drum. It is identifiable in all deaf music because it can be felt. Therefore the bass drum is essential in the visualization of music. However, to provide more information to the users, the other drums are displayed as well. This element of the visualization is inspired by deaf culture. Pitch contour is the movement of sound over time, and this is represented as a worming motion of the pipes.
To determine the enjoyability and effectiveness of MusicViz, a study was run with 12 participants: 3 hearing (H; one male, two female), 3 hard of hearing (HOH; two male, one female) and 6 deaf (D; three male, three female). The participants completed a pre-study questionnaire to gather demographic information, such as gender, hearing status, musical preferences, reasons for listening to music and education. A post-song questionnaire was filled out after each song, and a post-study questionnaire was completed after the participant had watched all six songs. The post-study questionnaire was used to give feedback on the meaning of individual constructs in the visualization, such as shapes, movement and brightness.
The post-song questionnaires were used to give feedback on participants’ enjoyment and emotional experience with the visualizations. Enjoyability was measured using a 5-point Likert scale ranging from not enjoyable to enjoyable. Emotions were measured in two ways. The first is the discrete model proposed by Ekman, which takes into account happiness, sadness, anger and fear. The second is the 2D model of valence (happiness) and arousal proposed by Russell. Lastly, the level of focus on the visualization and the enjoyability of the colours used were measured. This presentation focuses only on the findings from the post-song questionnaires.
The first analysis looked at the difference in enjoyability between genres, and no statistical differences were found. Even so, the descriptives show that all genres are rated between somewhat enjoyable and enjoyable, which means the visualization is overall pleasant to view. Rock was the most enjoyable genre, possibly because it is one of the simpler visualizations, with fewer instruments displayed but more movement from each pipe. However, it is surprising that rap/hip-hop/R&B (RHRB) is among the least enjoyable. Originally the hypothesis was that it would be the most enjoyable, because it is what participants said they liked and, in the tactile domain, it is how they experience music; however, this does not translate into the visual domain. Rap is possibly the least enjoyable because it is boring to see: there are two instruments that flash, and there is less movement. Classical is less enjoyable because it has too many instruments and too much movement.
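The descriptives quoted for each genre (M and SD) are ordinary summary statistics over the participants' ratings. As a small sketch of how such values are computed, the code below uses the sample standard deviation (dividing by n - 1); that convention is an assumption, since the presentation does not say which formula was used. The class `LikertSummary` is hypothetical and the ratings are made up, not study data.

```java
// Illustrative sketch only: the ratings below are made up, not study data.
final class LikertSummary {
    /** Arithmetic mean (M) of a set of ratings. */
    static double mean(int[] ratings) {
        if (ratings.length == 0) {
            throw new IllegalArgumentException("no ratings");
        }
        double sum = 0;
        for (int r : ratings) {
            sum += r;
        }
        return sum / ratings.length;
    }

    /** Sample standard deviation (SD), dividing by n - 1 (assumed convention). */
    static double sampleSd(int[] ratings) {
        if (ratings.length < 2) {
            throw new IllegalArgumentException("need at least two ratings");
        }
        double m = mean(ratings);
        double squares = 0;
        for (int r : ratings) {
            squares += (r - m) * (r - m);
        }
        return Math.sqrt(squares / (ratings.length - 1));
    }

    public static void main(String[] args) {
        int[] hypothetical = {5, 4, 5, 3, 4, 5}; // hypothetical 1..5 enjoyment ratings
        System.out.printf("M=%.2f, SD=%.3f%n", mean(hypothetical), sampleSd(hypothetical));
    }
}
```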
There was a difference in the levels of overbearingness between genres. More specifically:
Pop is more dominating than RHRB
Pop is more dominating than country
The reason pop is more overbearing than RHRB and country may be the same as previously mentioned, namely the number of instruments on the screen. Pop music contains more instruments, including drums and a prominent bass line that repeats a lot, while country music is often characterized by guitars with little or no drums, and RHRB by a lack of instruments. Therefore the more vigorous movement in pop is more overbearing. A trend noticed in the data is that even though a song is overbearing, it can still be happy. For example, the pop visualization is the most overbearing, with the highest arousal ratings, yet it has the highest happiness ratings. Valence and arousal show no statistical differences.
Valence is positive and negative feelings.
Arousal is energy level (sleepy, tired).
Sadness is low energy and negative feelings.
Fear is high energy and negative feelings.
Another measure is the difference in the levels of discrete emotions: happiness, sadness, anger and fear. Unfortunately, there is no significant difference in the ratings of discrete emotions between genres. An issue with using this scale to rate emotions is that, because music can convey multiple emotions, individuals had a difficult time choosing from a limited set of words.
There was no significant difference in the level of focus on the visualization between genres. Participants paid the most attention to rock, possibly because it has just enough change and movement to maintain focus and interest, whereas pop has too much and overtaxes the perceptual and cognitive systems, and rap is too boring because it is too repetitive. Use example about class.
In conclusion, a major limitation of the study was the lack of participants. Therefore, more participants are needed to determine whether the observed trends are statistically significant. In addition, more participants are needed to explore the differences among hearing statuses. However, in terms of descriptive analysis, the participants showed preferences for the rock, pop and classical genres. Overall they enjoyed the visualization and, based on their comments, many thought MusicViz helped them understand musical information.
Deaf, deaf, Deafened, and Hard of Hearing Consumers
The distinction between the terms Deaf, deaf, deafened, and hard of hearing is based principally on preferred modes of communication.
Deaf (upper case 'D') is a term that refers to members of a socio-linguistic and cultural group whose primary language is sign language. In English-speaking parts of Canada, the main sign language is American Sign Language (ASL).
Deafened and deaf (lower case 'd') are terms that refer to individuals who have lost all or most functional hearing at some point in their lives. These people use spoken language and rely on visual forms of communication such as speechreading, text, and, in some cases, sign language.
Hard of hearing is a term that refers to individuals who have a hearing loss ranging from mild to profound and who use their voice and residual hearing and, in some cases, sign language for communication.
(Just FYI: found in a Canadian captioning standards document.)