Research Methods in Linguistics Project

Anglicism Usage Survey Analysis

Survey-based analysis of how university students use English loanwords in everyday conversation, academic settings, and social communication. Methods: descriptive statistics, t-tests, logistic regression, decision trees, and confusion-matrix evaluation.

Course: Research Methods
Data Type: Survey Data
Sample Size: 30 Responses
Tools: Python / Jupyter

Project Overview

This project investigates the frequency and usage of Anglicisms among university students. Anglicisms are English loanwords or English-derived expressions that are incorporated into another language, such as German. The project focuses on how often students use these words, in which contexts they use them, and what motivates their usage.

The analysis was conducted for the course Research Methods in Linguistics at TU Dortmund. It combines survey design, descriptive statistics, visualization, statistical testing, logistic regression, and interpretable classification models.

Research Motivation

English has a strong influence on modern communication, especially in academic, professional, and digital contexts. Students often encounter English words through social media, internet culture, university courses, software tools, and international communication. This makes Anglicisms an interesting topic for studying language contact and linguistic change.

Main research question: How frequently do university students use Anglicisms, and do usage patterns differ by study background, language background, and communication setting?

Survey Design

The data were collected using a structured survey. The questionnaire included consent, demographic information, language background, study program, study-program language, general Anglicism usage, usage settings, motivations, and frequency ratings for selected Anglicisms.

  • Consent: all 30 respondents agreed to participate voluntarily.
  • Participants: university students from different study programs.
  • Core topic: frequency and context of Anglicism usage.
  • Word examples: cool, sorry, computer, handy, weekend, fastfood, shopping, party, internet, okay, email, live.
  • Rating scale: from “Never” to “Frequently”.

Dataset

The final dataset contains 30 survey responses. Respondents reported demographic information such as gender, age, native language, study program, and study-program language. They also answered questions about whether, how often, where, and why they use Anglicisms.

Variable Group | Examples | Purpose
Demographics | gender, age, native language | Describe participant background
Academic Background | study program, program language | Compare English/German/bilingual contexts
Usage Behavior | daily frequency, settings, situations | Measure how often and where Anglicisms are used
Motivations | easy to use, social media, friends, language contact | Explain why respondents use Anglicisms
Word-Level Ratings | cool, sorry, computer, weekend, internet, email | Analyze which Anglicisms are most integrated

Descriptive Results

The respondent group was roughly balanced by gender and relatively young, with an average age of 26.47 years. Native languages were diverse and included German, English, Turkish, and other languages.

  • Total survey responses analyzed: 30.
  • Average respondent age: 26.47.
  • Respondents who reported using Anglicisms: 96.67%.

Participant Background

  • Gender: 14 female respondents and 16 male respondents.
  • Age range: 19 to 35 years.
  • Native language: German 26.67%, English 16.67%, Turkish 23.33%, Other 33.33%.
  • Study-program language: English 40%, German 26.67%, bilingual 33.33%.
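
Because the sample has only 30 respondents, each reported percentage maps back to a whole number of people. The short sketch below does that conversion for the two language breakdowns above; the function and variable names are illustrative and not taken from the project notebook.

```python
# Convert the reported percentage shares back to respondent counts (n = 30).
N = 30

native_language_pct = {"German": 26.67, "English": 16.67, "Turkish": 23.33, "Other": 33.33}
program_language_pct = {"English": 40.0, "German": 26.67, "Bilingual": 33.33}


def to_counts(percentages, n=N):
    """Round each percentage share of n to the nearest whole respondent."""
    return {label: round(pct / 100 * n) for label, pct in percentages.items()}


native_language_counts = to_counts(native_language_pct)
program_language_counts = to_counts(program_language_pct)

print(native_language_counts)
print(program_language_counts)
```

Both breakdowns sum to 30, so each question appears to have been answered by every respondent. The program-language split (12 English, 8 German, 10 bilingual) also shows that a comparison restricted to English and German programs rests on 20 responses.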

Anglicism Usage Patterns

The survey results show that Anglicisms are widely used among the respondents. Most participants reported using Anglicisms in everyday conversation, and many also reported usage in academic and professional settings.

Usage Context | Percentage | Interpretation
Everyday Conversation | 86.67% | Most common context, tied with Friends.
Academic Settings | 53.33% | Anglicisms are common in university-related communication.
Professional Communication | 30.00% | Lower but still relevant workplace-related usage.
Friends | 86.67% | Social peer communication strongly supports Anglicism usage.
Home | 66.67% | Anglicisms are not limited to institutional contexts.

Motivations for Usage

The most common motivations were ease of use, peer influence, social media, and contact between German and English. This suggests that Anglicism usage is shaped by both practical communication needs and social/digital environments.

  • Easy to use: 63.33%.
  • My friends are using it: 63.33%.
  • Social media: 56.67%.
  • Language contact of German and English: 50.00%.
  • Trending and stylish: 26.67%.
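
The motivation percentages add up to more than 100%, which suggests that respondents could select several options. A minimal sketch of how such multi-select shares can be computed is shown below. The `responses` list is hypothetical example data, not the survey data, and `selection_shares` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical multi-select answers: one set of chosen motivations per respondent.
# These four rows are made up for illustration and are NOT the survey responses.
responses = [
    {"Easy to use", "Social media"},
    {"My friends are using it"},
    {"Easy to use", "My friends are using it", "Social media"},
    {"Trending and stylish"},
]


def selection_shares(answers):
    """Return the percentage of respondents who selected each option."""
    counts = Counter(option for chosen in answers for option in chosen)
    n = len(answers)
    return {option: round(100 * count / n, 2) for option, count in counts.items()}


shares = selection_shares(responses)
for option, share in sorted(shares.items()):
    print(f"{option}: {share}%")
```

Each share is computed against the number of respondents, not the number of selections, which is why the shares do not need to sum to 100%. In the survey itself, 63.33% of 30 corresponds to 19 respondents.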

Frequently Used Anglicisms

Several Anglicisms were reported as highly integrated into respondents’ daily language. Digital and communication-related words were especially common, which reflects the role of technology and internet culture in language contact.

Anglicism | Frequently Used | Interpretation
Internet | 80.00% | Most strongly integrated digital term.
Computer | 70.00% | Highly common technical Anglicism.
Okay | 56.67% | Common everyday communication marker.
Email | 46.67% | Frequently used in digital and academic communication.
Party | 33.33% | Common in social contexts.

Statistical Methods

The project used a combination of descriptive and inferential methods. Descriptive statistics summarized the survey responses, while a t-test and two classification models (logistic regression and a decision tree) were used to compare groups and to evaluate whether study background helps explain differences in Anglicism usage frequency.

Method | Purpose | Interpretation
Descriptive Statistics | Summarize age, gender, language background, and usage frequencies | Gives the basic structure of the sample
Density Plot | Compare age distributions by gender | Visualizes overlap and distributional differences
t-Test | Compare Anglicism usage between English and German majors | Tests whether group differences are statistically significant
Logistic Regression | Model higher vs. lower Anglicism usage | Predicts binary usage category from academic background
Decision Tree | Classify higher vs. lower usage patterns | Interpretable model for usage differences
Confusion Matrix | Evaluate classification performance | Shows correct and incorrect predictions
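
To make the t-test row concrete, the sketch below computes a two-sample t statistic using only the Python standard library. The write-up does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the two samples are hypothetical values rather than the survey data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical daily Anglicism counts, NOT the survey data.
english_majors = [9, 12, 6, 10, 8]
german_majors = [7, 5, 9, 6, 8]

t_stat, df = welch_t(english_majors, german_majors)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The p-value is then read from the t distribution with `df` degrees of freedom. In a notebook this would more typically be done in one call, for example with SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.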

Modeling Approach

The modeling task was to distinguish two classes of Anglicism usage: lower-frequency and higher-frequency. Students in bilingual programs were excluded from one comparison to focus more clearly on the difference between English-major and German-major students.

Class Definition

  • Lower usage: 1 to 7 Anglicism uses per day.
  • Higher usage: 8 or more Anglicism uses per day.
  • Main comparison: English majors versus German majors.
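
The class definition above can be sketched as a small mapping function. This assumes the daily usage is available as a whole number, which may not match how the survey recorded it. The function name is illustrative, and the handling of zero uses is an assumption because the definition does not cover that case.

```python
def usage_class(uses_per_day):
    """Map a daily Anglicism count to the binary class used for modeling."""
    if uses_per_day >= 8:
        return "higher"
    if uses_per_day >= 1:
        return "lower"
    # The class definition does not cover zero uses per day;
    # returning None here is an assumption, not part of the source.
    return None


for count in (0, 1, 7, 8, 15):
    print(count, usage_class(count))
```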

Logistic regression was used to model the probability of higher Anglicism usage, while a decision tree was used for interpretable classification and comparison of usage groups.
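
With a single binary predictor (English major versus German major), a logistic regression reduces to arithmetic on group counts: the intercept is the log-odds of higher usage in the reference group, and the coefficient is the log odds ratio. The sketch below illustrates this with the counts listed under Confusion Matrix Summary, read as lower/higher counts per group. That reading is an assumption, the variable names are hypothetical, and the result is not the project's fitted model.

```python
import math

# Counts taken from the Confusion Matrix Summary and treated as lower/higher
# usage counts per group. This interpretation is an assumption.
german = {"lower": 4, "higher": 4}
english = {"lower": 3, "higher": 9}


def log_odds(group):
    """Log-odds of higher usage within one group."""
    return math.log(group["higher"] / group["lower"])


intercept = log_odds(german)                  # German majors as reference group
coefficient = log_odds(english) - intercept   # log odds ratio, English vs. German
odds_ratio = math.exp(coefficient)


def predicted_probability(is_english_major):
    """Probability of higher usage for is_english_major in {0, 1}."""
    z = intercept + coefficient * is_english_major
    return 1 / (1 + math.exp(-z))


print(f"odds ratio = {odds_ratio:.2f}")
print(f"P(higher | German major)  = {predicted_probability(0):.2f}")
print(f"P(higher | English major) = {predicted_probability(1):.2f}")
```

Under that reading, the odds of higher usage are about three times as large for English majors, with predicted probabilities of 0.50 and 0.75. With groups of 8 and 12 respondents, such an estimate is highly uncertain.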

Model Evaluation

The decision tree model was evaluated with a confusion matrix, precision, and recall. The model detected some structure in the usage patterns, but performance was moderate, which is expected given the small sample size.

  • Precision = 58.3%: share of predicted higher-usage cases that were actually higher usage.
  • Recall = 63.6%: share of actual higher-usage cases that the model identified.
  • t-test p = 0.4475: no statistically significant group difference.
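
Precision and recall both follow from three cells of a confusion matrix: true positives, false positives, and false negatives, with higher usage as the positive class. The sketch below shows the calculation. The counts are illustrative, chosen only because they reproduce the reported percentages, and are not taken from the project's actual matrix.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall for the positive (higher-usage) class."""
    precision = tp / (tp + fp)  # correct positives among all predicted positives
    recall = tp / (tp + fn)     # correct positives among all actual positives
    return precision, recall


# Illustrative counts (an assumption): 7 true positives, 5 false positives,
# 4 false negatives. They are NOT confirmed by the source.
precision, recall = precision_recall(tp=7, fp=5, fn=4)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```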

Confusion Matrix Summary

  • German majors: 4 lower-frequency and 4 higher-frequency predictions.
  • English majors: 3 lower-frequency and 9 higher-frequency predictions.
  • Interpretation: English majors showed more higher-frequency usage in the model output, but statistical testing did not confirm a significant mean difference.
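
A decision tree chooses splits that reduce class impurity. As a rough sketch of how informative a split on major would be, the example below computes the Gini impurity before and after such a split. It treats the counts above as lower/higher counts per group, which is an assumption about how to read them. Gini is the default criterion in common libraries such as scikit-learn, but the criterion used in the project is not stated.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


# (lower, higher) counts per group, read from the summary above (an assumption).
german = (4, 4)
english = (3, 9)
parent = (german[0] + english[0], german[1] + english[1])

n = sum(parent)
# Impurity after the split is the size-weighted average of the child nodes.
weighted_child = (sum(german) / n) * gini(german) + (sum(english) / n) * gini(english)
impurity_decrease = gini(parent) - weighted_child

print(f"parent = {gini(parent):.3f}")
print(f"after split = {weighted_child:.3f}")
print(f"decrease = {impurity_decrease:.3f}")
```

Under that reading, the split lowers impurity only from 0.455 to 0.425, which fits the moderate classification performance and the non-significant t-test.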

Key Findings

The survey indicates that Anglicisms are widely used among university students. The strongest usage contexts are everyday conversation and communication with friends. Ease of use, peer usage, and social media are the main motivations. Digital vocabulary such as internet, computer, and email, along with the everyday marker okay, appears highly integrated.

  • Almost all respondents reported using Anglicisms.
  • Anglicisms were most common in everyday and social communication.
  • Social media and peer usage were major motivations.
  • English majors appeared to use Anglicisms more frequently in descriptive/model-based analysis.
  • The t-test did not find statistically significant evidence of a difference between English and German majors.

Limitations

The project is useful as an exploratory survey analysis, but the results should be interpreted carefully. The sample size is small, and the respondents are from a limited university context. The classification model therefore provides exploratory evidence rather than generalizable conclusions.

  • Only 30 survey responses were available.
  • The sample may not represent all university students.
  • Self-reported language usage may contain recall bias.
  • Grouping usage into lower and higher frequency simplifies the original response scale.
  • The t-test and decision tree results were not fully aligned, suggesting caution in interpretation.

Future Work

A stronger follow-up study could use a larger sample, more balanced major groups, and additional variables such as language proficiency, social media usage intensity, international background, and exposure to English-language courses.

  • Collect a larger and more representative survey sample.
  • Separate English majors, German majors, bilingual programs, and non-linguistic departments more clearly.
  • Use ordinal regression instead of collapsing usage into binary categories.
  • Include qualitative open-text responses for deeper linguistic interpretation.
  • Compare self-reported usage with real communication data where ethically possible.

Outcome

This project strengthened my ability to design and analyze survey data, apply statistical methods to linguistic questions, build interpretable models, and evaluate classification results using confusion matrices, precision, and recall.

It is a useful portfolio project because it combines research design, human-language data, survey analytics, statistical testing, and interpretable machine learning in one workflow.

Keywords: Survey Analysis, Research Methods, Linguistics, Anglicisms, t-Test, Logistic Regression, Decision Tree, Confusion Matrix, Python, Jupyter Notebook