Accurately Predicting User Registration in Highly Unbalanced Real-World Datasets from Online News Portals

Eva-Maria Spitzer, Oliver Krauss, Andreas Stöckl

Research output: Chapter in Book/Report/Conference proceedings › Conference contribution › peer-review

1 Citation (Scopus)


Getting visitors to register is a crucial factor in marketing for online news portals. Current approaches are rule-based, awarding points for specific actions [3]. Finding efficient rules can be challenging and depends on the specific task. Registrations are generally rare compared to regular visits, leading to highly imbalanced data. We analyze different supervised classification algorithms with the data imbalance taken into account. As a case study, we use anonymized real-world data from an Austrian newspaper outlet containing visitors' session behavior, with registrations in around 0.1% of all visits. We identify an ensemble approach combining the Balanced Random Forest Classifier and the RUSBoost Classifier that correctly identifies 76% of registrations across five independent data sets.
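To make the imbalance concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It uses synthetic labels at the 0.1% registration rate quoted in the abstract to show why accuracy alone is misleading and why the share of registrations correctly identified (recall) is the more informative measure. It also illustrates random undersampling, the rebalancing idea that both Balanced Random Forest and RUSBoost build on. The function names and the data are assumptions made purely for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of sessions whose label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual registrations (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def random_undersample(labels, rng):
    """Keep every positive and an equally sized random draw of negatives.

    Returns the sorted indices of the retained sessions.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


# Synthetic labels (assumption): 100 registrations among 100,000 sessions (0.1%).
labels = [1] * 100 + [0] * 99_900

# A trivial model that never predicts a registration.
always_negative = [0] * len(labels)
baseline_accuracy = accuracy(labels, always_negative)  # very high
baseline_recall = recall(labels, always_negative)      # no registration found

# Rebalance the training indices by discarding most negatives.
rng = random.Random(0)
balanced_idx = random_undersample(labels, rng)
balanced_labels = [labels[i] for i in balanced_idx]

print(f"accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
print(f"balanced sample: {len(balanced_labels)} sessions, "
      f"{sum(balanced_labels)} registrations")
```

In this sketch the trivial model scores near-perfect accuracy while finding no registrations at all, which is why a figure such as the reported 76% is meaningful only as a share of actual registrations.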

Original language: English
Title of host publication: Database and Expert Systems Applications - 33rd International Conference, DEXA 2022, Proceedings
Editors: Christine Strauss, Alfredo Cuzzocrea, Gabriele Kotsis, Ismail Khalil, A Min Tjoa
Place of Publication: Cham
Number of pages: 14
ISBN (Print): 9783031124228
Publication status: Published - 29 Jul 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13426 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


  • Imbalanced data
  • Label prediction
  • Lead scoring

