Content Analysis

Susanne Vosmer

We all know that communication is important, but how can you study it? Well, you could use content analysis. Widely applied across the social sciences and the humanities, this method is suitable for many research topics.

Berelson (1952) described it as a technique that enables researchers to obtain systematic, accurate and objective descriptions of the manifest content of communication. Initially, it was a purely quantitative method, popular in communication studies in the 1950s. However, the quantitative paradigm, including content analysis, was heavily criticised as the qualitative paradigm gained acceptance. Objectivity, the assumption of one single reality, and the idea that knowledge depends on experiments were all questioned. With the rise of qualitative methodologies in the 1970s, content analysis had to be adapted, and qualitative content analysis was developed.

Qualitative methodologies are based on an idealist, relativist ontology and an interpretivist and/or social constructionist epistemology. In contrast to quantitative content analysis, which focuses on the manifest content of written or recorded communication, qualitative content analysis concerns itself with latent content: the underlying meaning anchored in the deep structure of texts.

For Berelson, only analysis of the manifest, visible content was appropriate. This approach assumes a sender of messages, content contained in these messages, and a receiver who can accurately decode them. Unsurprisingly, these views no longer hold. Communication is not so passive, and its latent content is vitally important. Krippendorff (2013) was instrumental in developing content analysis further. Unlike Berelson, he did not treat content as a tangible entity contained in messages; for Krippendorff, meanings emerge when readers interpret texts relative to particular contexts.

But what exactly is content? Words, phrases, sentences, passages and symbols. Inductive and deductive analyses enable you to derive meaning from raw data. The former works in a bottom-up fashion. The latter focuses on hypothesis testing and relies on theory, which can also be tested; it's a top-down approach. Nowadays, content analyses are either quantitative or qualitative, or both. Moreover, quantitative content analysis can be combined with qualitative methodologies. What you have then is a mixed-methods approach. I will explain this in more detail, starting with quantitative content analysis.

Quantitative content analysis is the systematic and replicable examination of symbols of communication. These are assigned numeric values, which are statistically analysed. This enables you to describe the communication and draw inferences about its meaning and its communicational context. Traditionally, researchers coded and analysed text manually. Codes and coding form an integral part of the research process.

Codes refer to concepts and their identification by means of explicit criteria. In quantitative content analysis, manifest content is analysed using descriptive statistics. You calculate frequencies of words or phrases and report means (the average) and standard deviations (how much values typically deviate from the mean). More complex statistical analyses may also be carried out.
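
To make this concrete, here is a minimal sketch in Python of how such frequencies, a mean and a standard deviation might be computed. The transcripts and the coded words are hypothetical and serve only as an illustration:

    from collections import Counter
    import re
    import statistics

    # Hypothetical transcripts; in practice you would load your own files.
    transcripts = [
        "I am very worried about my future and whether I will survive.",
        "Will the treatment help? Can I ever work again? Finances worry me.",
        "I'm also afraid of side effects of the chemo.",
    ]

    def word_counts(text):
        # Lowercase the text and count every word (manifest content).
        return Counter(re.findall(r"[a-z']+", text.lower()))

    # Frequency of the coded words 'worried'/'worry' in each transcript.
    frequencies = [word_counts(t)["worried"] + word_counts(t)["worry"]
                   for t in transcripts]

    print("Frequencies per transcript:", frequencies)  # [1, 1, 0]
    print("Mean:", statistics.mean(frequencies))
    print("Standard deviation:", statistics.stdev(frequencies))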

Large amounts of data can be analysed using software programmes, even though their coding procedures vary. Some software packages count words, recognise the context of words and map entire linguistic networks. Complex and elaborate semantic and linguistic structures can be identified, and important textual features and their relations can be conceptually mapped. Software exists for both quantitative and qualitative data analyses. In NVivo, for example, you can create codebooks to code qualitative data. It allows you to 'norm' coding, operationalising it on the basis of theory, models or prior research. Computer coding saves time, reduces human error and can be more valid (accurate) than manual coding. However, you still need to analyse your data qualitatively; NVivo cannot do this for you.

Validity is crucial even in qualitative research, where it is referred to as trustworthiness and involves credibility and transferability. Obviously, in order to carry out a valid content analysis, the researcher must convey the true meaning of a communication. If your analysis is not in sync with the speaker's (author's) communication, there is little point in carrying it out. Your descriptions and procedures must be clear and transparent so that they can be assessed and possibly replicated. Another researcher should arrive at identical coding results; otherwise, your codes are not reliable. Hence, it's a good idea to ask someone else to check your analysis when manually coding text.

Interrater reliability can be tested statistically. Cohen's kappa is a suitable agreement coefficient, and a value of .80 or higher is usually considered acceptable. So content analysis is based on strict assumptions and measures. Nevertheless, it gives you some flexibility. You can conduct studies yourself and gather data directly, by using surveys or interviews and analysing transcripts (primary data). Or you can use already existing data: available discussions, narratives from films, articles, policies and speeches are examples of secondary data.
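
As a sketch of what such a check might look like in practice, the snippet below compares two coders' hypothetical category labels for the same ten meaning units, using scikit-learn's implementation of Cohen's kappa:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical labels assigned independently by two coders
    # to the same ten meaning units.
    coder_a = ["worry", "worry", "fear", "guilt", "worry",
               "fear", "worry", "guilt", "fear", "worry"]
    coder_b = ["worry", "worry", "fear", "guilt", "fear",
               "fear", "worry", "guilt", "fear", "worry"]

    # Kappa corrects raw agreement (here 9/10) for agreement expected by chance.
    kappa = cohen_kappa_score(coder_a, coder_b)
    print(f"Cohen's kappa: {kappa:.2f}")  # about .84, above the .80 threshold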

In qualitative content analysis, researchers aim to develop categories and themes. Let me clarify this process. First, you define and identify meaning units (a word, sentence or statement that represents an idea or concept). Then you label similar meaning units; in this way, codes are created. You group codes together into categories and organise these into themes. This is a bottom-up approach.

When you use a top-down approach, codes are assigned prior to your data collection (deductive), instead of being created during your analysis (inductive). So your coding frame (the schema that lays out key concepts, their definitions, and the criteria used to identify and sort them) has to be predetermined. The coding process is often time-consuming, because you want to identify potentially interesting events, features or behaviours and distinguish these by applying labels. This is referred to as open coding. Here is an example of how this might look when your meaning units are sentences extracted from the raw data (text), and you have generated codes and categories:

Category: Worries (code: "I'm worried")

Meaning units:
"I am very worried about my future and whether I will survive."
"Will the treatment help?"
"Can I ever work again? Finances worry me."

Category: Fear

Meaning units:
"I'm also afraid of side effects of the chemo."
"Radiation therapy scares me."

Category: Guilt

Meaning unit:
"I feel guilty for smoking."

Codes can be further differentiated or integrated so that fewer categories are created (axial coding). During this process, codes can be renamed, deleted, combined and/or broken into smaller subcategories (selective coding). In this example, fear is subsumed under the category 'worries'. I have created six subcategories and one theme:

Category: Worries
Theme: Worry

Subcategories

Worries about self

Worries about future

Worries about survival

Worries about the success of treatment

Worries about finances

Worries about side effects

The coding process is not linear. In the pursuit of a detailed analysis and more abstract themes, the relations within and between categories are continuously reviewed, until they solidify. Ultimately, you want to tell a coherent story through your data.
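
If you keep your coding frame in software, one way to picture this hierarchy (theme, categories, subcategories, meaning units) is as a nested structure that you revise as relations solidify. Here is a small Python sketch of the worry example above; the field names are my own and purely illustrative:

    # Nested dictionary mirroring the hierarchy: theme -> category ->
    # subcategories and meaning units. Field names are illustrative only.
    coding_frame = {
        "theme": "Worry",
        "categories": [
            {
                "name": "Worries",
                "subcategories": [
                    "Worries about self",
                    "Worries about future",
                    "Worries about survival",
                    "Worries about the success of treatment",
                    "Worries about finances",
                    "Worries about side effects",
                ],
                "meaning_units": [
                    "I am very worried about my future and whether I will survive.",
                    "I'm also afraid of side effects of the chemo.",
                ],
            },
        ],
    }

    # Reviewing relations between levels then amounts to traversing the tree.
    for category in coding_frame["categories"]:
        print(category["name"], "->", len(category["subcategories"]), "subcategories")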

Some qualitative researchers have criticised quantitative content analysis for being too simplistic, arguing that it neglects the context of the coded content. However, you can address this criticism by paying attention to the context of your study. Moreover, you can combine qualitative and quantitative content analysis. Let's look at current applications of content analysis, which use a conventional, directive or summative approach.

In conventional analysis, coding categories are derived from the text. Assume that you collect data by interviewing 20 cancer patients. You use open-ended questions, such as, “What has it been like to be in a group with other terminally ill people?”

Interviews are recorded and then transcribed verbatim. To do the analysis, you read each transcript from beginning to end. Then you carefully re-read each transcript. With a highlighter, you mark the text that describes the participant's emotions. In the margin of the page, you write a keyword or phrase that captures the emotion or emotional reaction, using the participant's own words (open coding). After having coded each transcript, you devise preliminary categories and themes. These may need to be revised when you compare all transcripts with each other. By the end, you will have a hierarchy (theme, category, subcategory, code, meaning units) that captures the latent meaning of your entire data set.

The directive approach is initially deductive and therefore more structured. Existing theory or prior research findings guide the development of your codes. You identify key concepts from theory or research findings and use these to develop categories and themes. Let’s take the 5-stage grief model (denial, anger, bargaining, depression, acceptance) as an initial framework to identify the emotions of terminally ill patients, whom you interview.

As before, you highlight in the transcript emotions or anything that indicates the participant's attitudes and mood, but you categorise these according to your predetermined themes (denial, anger, bargaining, depression, acceptance). You also decide whether subcategories are needed; anger could be separated into anger towards oneself, towards doctors or towards a spiritual being. Data that cannot be categorised into any of the themes must be re-examined. You may need to create a different emotional category and theme (e.g. happy). You can rank all your themes to find out how often they occur and compare your findings with the five stages of the grief model. Looking at the data, you could consider whether they support the grief model. If not completely, to what extent? What does the 'happy' theme tell you about the theory?
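
A deductive coding frame of this kind can also be sketched in code. The keyword lists below are hypothetical and far cruder than real coding, but they show the principle of assigning predetermined themes and flagging data that fit none of them:

    # Hypothetical keyword lists for the predetermined grief-model themes.
    coding_frame = {
        "denial": ["can't be true", "mistake", "not me"],
        "anger": ["angry", "furious", "unfair"],
        "bargaining": ["if only", "what if", "promise"],
        "depression": ["hopeless", "sad", "pointless"],
        "acceptance": ["at peace", "ready", "accept"],
    }

    def assign_themes(meaning_unit):
        # Return every predetermined theme whose keywords occur in the unit.
        unit = meaning_unit.lower()
        themes = [theme for theme, keywords in coding_frame.items()
                  if any(k in unit for k in keywords)]
        return themes or ["uncategorised"]  # uncategorised data must be re-examined

    print(assign_themes("It is so unfair, I am furious with the doctors."))  # ['anger']
    print(assign_themes("I feel happy today."))  # ['uncategorised']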

A summative content analysis starts with a quantitative analysis, followed by an analysis of the latent content. Let's assume that you want to find out how often and under what circumstances terminally ill patients use the explicit terms 'die', 'dying' or 'death', instead of euphemisms. Your research questions could be: "How are these explicit terms used in therapist-patient communication when discussing cancer?" "What other terms do people use?"

You recruit patients and therapists. You also sample different types of 'communication events', including therapy groups, therapists' discussions in supervision and patient-doctor conversations. You collect communication samples from these events and end up with a large amount of data.

For your data analysis you choose computer software that searches for occurrences of the words 'die', 'death' and 'dying' in the transcripts and recordings of the communication events. Frequency counts for each of the three words are calculated and compared. You could also analyse patients', therapists' and doctors' demographics and compare these with the explicit terms used. Furthermore, you search for euphemisms, alternative expressions for death, die and dying. You count their frequencies and compare them with the frequencies of the explicit terms. Frequencies of euphemisms versus direct terms can also be compared for each speaker (patient, therapist, doctor), and demographic characteristics of therapists, doctors and patients can be compared within and across communication events. Finally, you can carry out a qualitative analysis of the latent content, using procedures similar to those described above. This can uncover the deeper meaning of euphemisms and reveal the contexts in which they were used.
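
Here is a rough Python sketch of that quantitative step, with invented utterances and an illustrative euphemism list (a real study would derive both from the data):

    import re
    from collections import Counter

    # Hypothetical (speaker, utterance) pairs from the communication events.
    transcript = [
        ("patient", "I keep thinking about dying."),
        ("therapist", "What frightens you most about passing away?"),
        ("doctor", "We need to talk openly about death."),
        ("patient", "If I pass on, who looks after my children?"),
    ]

    explicit = re.compile(r"\b(die|dying|death)\b", re.IGNORECASE)
    # Illustrative euphemisms only; compile the real list from your data.
    euphemism = re.compile(r"\b(pass(?:ing)? (?:away|on)|gone|at rest)\b", re.IGNORECASE)

    explicit_counts, euphemism_counts = Counter(), Counter()
    for speaker, utterance in transcript:
        explicit_counts[speaker] += len(explicit.findall(utterance))
        euphemism_counts[speaker] += len(euphemism.findall(utterance))

    print("Explicit terms:", dict(explicit_counts))  # {'patient': 1, 'doctor': 1}
    print("Euphemisms:", dict(euphemism_counts))     # {'therapist': 1, 'patient': 1}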

Isn't content analysis a useful method for studying communication? Since it's consistent with the aims of several other research methods, it can be integrated into quantitative and qualitative approaches. You could research communication in your groups, analyse session notes, Foulkes' lectures, books, mental health policies or letters.

Before embarking on your research adventure, remember that you must decide what type of content you want to analyse. You must also decide whether to use probability or purposive sampling (in the former, every unit has a known chance of selection; in the latter, you deliberately choose participants from a particular pool), whether to analyse manifest and/or latent content, whether to use content analysis inductively or deductively, what coding controls you will put in place to ensure reliability and credibility, and whether to present your findings numerically, verbally or both.

If you would like to learn more about content analysis, entertain yourself with Krippendorff on the beach, in the mountains or wherever you'll spend your summer.

Cheerio and happy reading.

s.vosmer@gmail.com

Useful Books

Berelson, B. (1952). Content Analysis in Communication Research. Glencoe, IL: Free Press.

Krippendorff, K. (2013). Content Analysis: An Introduction to its Methodology (3rd ed.). Thousand Oaks, CA: Sage.

Kyngäs, H., Mikkonen, K. and Kääriäinen, M. (Eds.) (2020). The Application of Content Analysis in Nursing Science Research. Basel: Springer.