Identification of similarities and clusters of bread baking recipes based on data of ingredients

  • Stefan Anlauf*
  • Sebastian Dorl*
  • Theresa Hirz
  • Melanie Lasslberger
  • Rudolf Grassmann
  • Johannes Himmelbauer
  • Stephan Winkler*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We define the similarity of bakery recipes using several distance measures and identify groups of similar recipes using several clustering algorithms. Our analyses are based on the relative amounts of the ingredients in each recipe. We compare three clustering algorithms (k-means, k-medoids, and hierarchical clustering) to find the optimal number of clusters. Besides the standard Euclidean distance, we test three other measures (Hamming distance, Manhattan distance, and cosine similarity). Additionally, we reduce the impact of raw materials used in large quantities by applying two data transformations: the logarithm of the original data and the binarization of the original data. Clustering recipes by their ingredients can improve the search for similar recipes and thereby help with the time-consuming process of developing new recipes. Using hierarchical clustering on the log-transformed data, we separate 704 recipes into three clusters and achieve a Silhouette Score of 0.531. We visualize our results as dendrograms that show the hierarchical separation of the recipes into groups and sub-groups.
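The two transformations and four distance measures named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the article: the two recipes and their ingredient shares are hypothetical, `log1p` (that is, log(1 + x)) is one possible way to handle absent ingredients, Hamming distance is computed as a count of differing positions, and cosine similarity is turned into a distance as 1 minus the similarity.

```python
import math


def log_transform(recipe):
    # log(1 + x) maps an absent ingredient (0.0) to 0.0.
    # Assumption: the article does not state how zeros are handled.
    return [math.log1p(x) for x in recipe]


def binarize(recipe):
    # 1 if the ingredient is present in any amount, else 0.
    return [1 if x > 0 else 0 for x in recipe]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions at which the two vectors differ.
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical recipes: shares of [flour, water, salt, yeast, seeds].
wheat_bread = [0.60, 0.36, 0.02, 0.02, 0.00]
seeded_bread = [0.50, 0.33, 0.02, 0.02, 0.13]

if __name__ == "__main__":
    print("euclidean:", euclidean(wheat_bread, seeded_bread))
    print("manhattan:", manhattan(wheat_bread, seeded_bread))
    print("cosine distance:", cosine_distance(wheat_bread, seeded_bread))
    print("euclidean (log):", euclidean(log_transform(wheat_bread),
                                        log_transform(seeded_bread)))
    print("hamming (binary):", hamming(binarize(wheat_bread),
                                       binarize(seeded_bread)))
```

In this toy case the binarized recipes differ only in whether seeds are present, whereas the raw Euclidean distance also reflects the differences in flour and water, the ingredients used in the largest quantities.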
Original language: English
Pages (from-to): 753-762
Number of pages: 10
Journal: International Journal of Food Engineering
Volume: 21
Issue number: 11
Publication status: Published - 1 Nov 2025

Keywords

  • baking recipes
  • clustering
  • ingredient
  • machine learning
