Classification, the task of assigning objects to a given set of categories, is used in almost every field. One important sub-branch of classification consists of methods that learn classification functions from example data. This chapter provides an overview of the most basic concepts and methods of this type of data-driven classification. We first highlight the basic ideas behind classification, along with some examples related to tourism. We then introduce measures of classification performance, which are needed to direct the data-driven training of classification functions and to evaluate classification results. As an essential part of this chapter, we provide self-contained, yet stripped-down, descriptions of the most important data-driven classification methods: nearest neighbor classifiers, logistic regression, Naïve Bayes, decision trees and ensemble variants thereof, support vector machines, and artificial neural networks. All of these concepts and methods are then applied to a specific use case in an accompanying Jupyter notebook, which demonstrates their practical implementation using Python and the machine learning framework scikit-learn.
Original language: English
Title of host publication: Applied Data Science in Tourism: Interdisciplinary Approaches, Methodologies, and Applications
Editors: Roman Egger
Place of Publication: Cham
Number of pages: 40
ISBN (Print): 978-3-030-88389-8
Publication status: Published - Jan 2022

Publication series

Name: Tourism on the Verge
Volume: Part F1051
ISSN (Print): 2366-2611
ISSN (Electronic): 2366-262X


Keywords

  • Classification
  • Decision tree
  • Gradient tree boosting
  • Logistic regression
  • Machine learning
  • Naïve Bayes
  • Neural network
  • Random forest
  • Support vector machine
