Predicting Immigration Attitudes in Britain: Data Analysis Tutorial for MSc/MPhil Students

Introduction: Why Immigration Attitudes Matter

In June 2026, debates about immigration remain at the forefront of British politics, especially with the upcoming general election. Understanding what drives public opinion is crucial for policymakers and campaigners. In this tutorial, we walk through the key steps of analyzing a social survey dataset on attitudes to immigration in contemporary Britain, using the structure of a typical MSc/MPhil data analysis assessment. We'll focus on descriptive statistics, inferential methods, and regression modeling, while emphasizing clear communication and avoiding raw R output.

Getting Started: Data Exploration and Cleaning

Your first task is to load the dataset with readRDS() (either dataset_1.RDS or dataset_2.RDS, depending on your birth month) and get a feel for the variables. Use str() to check variable types and coding, and summary() to check ranges, implausible values, and counts of missing values. For example, imm_att5 is your outcome variable, measuring attitudes on a 1–5 scale. Pay attention to the zodiac variable: it is included to test your judgment, and you might decide to exclude it from your models because it is unlikely to be a genuine predictor of immigration attitudes.
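
A minimal sketch of this first pass is shown here. Because the assessment data are not reproduced in this tutorial, the example writes a small made-up data frame to a temporary .RDS file and reads it back. The values, the subset of variables, and the zodiac labels are all invented for illustration.

```r
# Stand-in for the assessment file: invented values, written to a temp .RDS
set.seed(1)
toy <- data.frame(
  imm_att5 = sample(c(1:5, NA), 200, replace = TRUE),
  age      = sample(18:90, 200, replace = TRUE),
  graduate = rbinom(200, 1, 0.4),
  zodiac   = factor(sample(c("Aries", "Leo", "Virgo"), 200, replace = TRUE))
)
path <- tempfile(fileext = ".RDS")
saveRDS(toy, path)

# In the real assessment this would be: data <- readRDS("dataset_1.RDS")
data <- readRDS(path)

str(data)                               # variable types and first few values
summary(data)                           # ranges, quartiles, NA counts
colSums(is.na(data))                    # missing values per variable
table(data$imm_att5, useNA = "ifany")   # confirm the 1-5 coding
```

The table() call is a quick way to spot stray codes (for example, 9 or -1 used for "don't know") before they distort a mean.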

Descriptive Statistics

Produce a table of means and standard deviations for key variables like age, hh_inc, and imm_att5, broken down by groups such as graduate or urban. Use dplyr and kableExtra to create publication-ready tables. For instance, you might find that graduates have a higher mean imm_att5 than non-graduates, suggesting education is associated with more positive views.

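library(dplyr)       # group_by(), summarise(), %>%
library(kableExtra)  # table formatting

# Group size, mean and SD of the outcome by graduate status
data %>%
  group_by(graduate) %>%
  summarise(n = n(),
            mean_att = mean(imm_att5, na.rm = TRUE),
            sd_att = sd(imm_att5, na.rm = TRUE)) %>%
  kable(digits = 2,
        caption = "Mean immigration attitude by graduate status")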
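
To summarise several variables at once rather than one at a time, dplyr's across() can apply the same functions to each column. The sketch below uses simulated values in place of the assessment data, so the numbers it prints carry no substantive meaning.

```r
library(dplyr)

# Simulated stand-in data (invented values)
set.seed(2)
toy <- data.frame(
  graduate = rbinom(300, 1, 0.4),
  age      = sample(18:90, 300, replace = TRUE),
  hh_inc   = round(rlnorm(300, meanlog = 10, sdlog = 0.5)),
  imm_att5 = sample(1:5, 300, replace = TRUE)
)

# Mean and SD of each key variable within each graduate group
desc <- toy %>%
  group_by(graduate) %>%
  summarise(
    n = n(),
    across(c(age, hh_inc, imm_att5),
           list(mean = ~ mean(.x, na.rm = TRUE),
                sd   = ~ sd(.x, na.rm = TRUE)),
           .names = "{.col}_{.fn}")
  )
desc
```

The result has one row per group and columns such as age_mean and imm_att5_sd, which can be passed on to kable() in the same way as the single-variable table.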

Visualizing Relationships

Create a bar chart of mean imm_att5 by occ_class to see how attitudes vary across occupation groups. Use ggplot2 with clear axis labels and a plain theme. Avoid 3D effects or unnecessary colors. A well-designed figure can reveal patterns, such as managers/professionals holding more favorable views than working-class respondents.
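
One possible version of such a chart is sketched below. The data are simulated and the three occ_class labels are assumptions, since the actual category names in the dataset are not listed in this tutorial.

```r
library(ggplot2)

# Simulated stand-in data; the class labels are assumed, not from the dataset
set.seed(3)
classes <- c("Managerial/professional", "Intermediate", "Working class")
toy <- data.frame(
  occ_class = factor(sample(classes, 300, replace = TRUE), levels = classes),
  imm_att5  = sample(1:5, 300, replace = TRUE)
)

# Mean attitude score per occupational class
means <- aggregate(imm_att5 ~ occ_class, data = toy, FUN = mean)

p <- ggplot(means, aes(x = occ_class, y = imm_att5)) +
  geom_col(fill = "grey40") +               # one neutral colour
  labs(x = "Occupational class",
       y = "Mean attitude score (1-5)",
       title = "Immigration attitudes by occupational class") +
  theme_minimal()
p
```

Computing the group means first and plotting them with geom_col() keeps the chart explicit about what each bar represents.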

Inferential Analysis: Linear Regression

To quantify the strength and direction of relationships, fit a multiple linear regression model with imm_att5 as the outcome. Include predictors like age, female, urban, london, bornUK, graduate, renter, contact, occ_class, and hh_inc. Justify your choices: for example, you might include contact because intergroup contact theory suggests it reduces prejudice.

model <- lm(imm_att5 ~ age + female + urban + london + bornUK + 
            graduate + renter + contact + occ_class + hh_inc, 
            data = data)
summary(model)

Present the regression results in a formatted table (using stargazer or modelsummary) with coefficients, standard errors, and significance stars. Interpret each coefficient: e.g., a positive coefficient for graduate means that having a degree is associated with a higher attitude score, holding other factors constant.
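
Before handing a model to stargazer or modelsummary, it can help to see where the reported numbers come from. The base R sketch below collects estimates, standard errors, 95% confidence intervals, and p-values into one data frame. The data and the data-generating coefficients are invented, and only three predictors are used to keep the example short.

```r
# Simulated stand-in data with an invented data-generating process
set.seed(4)
n <- 400
toy <- data.frame(
  age      = sample(18:90, n, replace = TRUE),
  graduate = rbinom(n, 1, 0.4),
  contact  = rbinom(n, 1, 0.5)
)
# Continuous stand-in for the 1-5 attitude score
toy$imm_att5 <- 2.5 + 0.6 * toy$graduate + 0.4 * toy$contact -
  0.01 * toy$age + rnorm(n, sd = 0.8)

fit <- lm(imm_att5 ~ age + graduate + contact, data = toy)

# Pull the pieces a results table needs into one data frame
est <- coef(summary(fit))
ci  <- confint(fit)                 # 95% intervals by default
results <- data.frame(
  term      = rownames(est),
  estimate  = est[, "Estimate"],
  se        = est[, "Std. Error"],
  ci_low    = ci[, 1],
  ci_high   = ci[, 2],
  p_value   = est[, "Pr(>|t|)"],
  row.names = NULL
)
results[-1] <- round(results[-1], 3)   # round every numeric column
results
```

Each row reads as "a one-unit change in this predictor is associated with this change in the outcome, holding the others constant", and the interval shows how precisely that association is estimated.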

Checking Assumptions

Check for multicollinearity using variance inflation factors (VIF), and examine residual plots for homoscedasticity and approximate normality of the residuals. If assumptions are violated, consider heteroscedasticity-robust standard errors or transformations. For example, hh_inc might be log-transformed to reduce skew.
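
These checks can be sketched in base R as follows. car::vif() is the usual tool for VIFs. The manual calculation here is only meant to show what the statistic is, and it applies to numeric or dummy predictors (for multi-level factors, car reports a generalised VIF instead). The data are simulated, with an invented relationship between the variables.

```r
# Simulated stand-in data with a right-skewed income variable
set.seed(5)
n <- 400
toy <- data.frame(
  age      = sample(18:90, n, replace = TRUE),
  graduate = rbinom(n, 1, 0.4),
  hh_inc   = round(rlnorm(n, meanlog = 10, sdlog = 0.6))
)
toy$imm_att5 <- 2 + 0.5 * toy$graduate + 0.1 * log(toy$hh_inc) +
  rnorm(n, sd = 0.8)

# Income enters on the log scale to reduce skew
fit <- lm(imm_att5 ~ age + graduate + log(hh_inc), data = toy)

# VIF for predictor j is 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors
X <- model.matrix(fit)[, -1]            # drop the intercept column
vif <- sapply(colnames(X), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2)
})
round(vif, 2)

# Residual diagnostics
plot(fit, which = 1)   # residuals vs fitted: look for funnel shapes or curves
plot(fit, which = 2)   # normal Q-Q plot of standardised residuals
```

Values of VIF close to 1 indicate that a predictor shares little variance with the others; common rules of thumb treat values above 5 or 10 as worth investigating.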

Model Comparison and Final Model

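You might compare this model with a nested alternative (e.g., one that adds zodiac, which the model above leaves out) using an F-test of whether the extra terms improve fit. Both models must be fitted to the same observations for the comparison to be valid. AIC and BIC can also guide model selection. Ultimately, choose the most parsimonious model that fits the data well.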
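
A comparison of two nested models, one with zodiac and one without, might look like the sketch below. The data are simulated so that zodiac has no true effect, and only four star signs are used to keep the example short.

```r
# Simulated stand-in data; zodiac is pure noise by construction
set.seed(6)
n <- 400
signs <- c("Aries", "Taurus", "Gemini", "Cancer")   # subset, for brevity
toy <- data.frame(
  graduate = rbinom(n, 1, 0.4),
  contact  = rbinom(n, 1, 0.5),
  zodiac   = factor(sample(signs, n, replace = TRUE))
)
toy$imm_att5 <- 2.5 + 0.6 * toy$graduate + 0.4 * toy$contact +
  rnorm(n, sd = 0.8)

smaller <- lm(imm_att5 ~ graduate + contact, data = toy)
larger  <- update(smaller, . ~ . + zodiac)   # same model plus zodiac

cmp <- anova(smaller, larger)   # F-test for the added zodiac terms
cmp
AIC(smaller, larger)            # lower values indicate a better trade-off
BIC(smaller, larger)            # BIC penalises extra terms more heavily
```

A non-significant F-test, together with lower AIC and BIC for the smaller model, would support leaving zodiac out.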

Reporting Your Findings

Write a coherent narrative summarizing which factors are most strongly associated with immigration attitudes. For instance, you might conclude that education and contact with immigrants are the strongest positive predictors, while being born in the UK is associated with less favorable views. Avoid causal language — these are associations, not causes.

Common Pitfalls and How to Avoid Them

Copy-pasting R output: Always format results in tables or inline text.
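Ignoring missing data: lm() silently drops incomplete cases by default. Use listwise deletion (na.omit()) or imputation deliberately, and report which you used and how many cases were lost.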
Overinterpreting p-values: Focus on effect sizes and confidence intervals.
Including irrelevant variables: Justify every predictor; drop zodiac unless you have a theory.
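
On the missing-data point, one way to find out how many respondents a model actually used is to compare nobs() with the number of rows in the data. The sketch below uses simulated values with 30 income responses deliberately set to missing.

```r
# Simulated stand-in data with some missing income values
set.seed(7)
n <- 300
toy <- data.frame(
  imm_att5 = sample(1:5, n, replace = TRUE),
  age      = sample(18:90, n, replace = TRUE),
  hh_inc   = round(rlnorm(n, meanlog = 10, sdlog = 0.5))
)
toy$hh_inc[sample(n, 30)] <- NA     # knock out 30 income values

# lm() drops rows with any missing model variable (default na.action)
fit <- lm(imm_att5 ~ age + hh_inc, data = toy)

n_total    <- nrow(toy)
n_analysed <- nobs(fit)             # rows lm() actually used
n_dropped  <- n_total - n_analysed
cat("Analysed", n_analysed, "of", n_total, "respondents;",
    n_dropped, "dropped for missing values\n")
```

Reporting these counts alongside the regression table makes the handling of missing data transparent to the reader.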

Conclusion

By following this structured approach, you can produce a rigorous and transparent analysis of immigration attitudes in Britain. Remember, the key is to demonstrate your analytical thinking and ability to communicate results clearly — skills that are highly valued in data science and social research.