TY - JOUR
T1 - Identification of similarities and clusters of bread baking recipes based on data of ingredients
AU - Anlauf, Stefan
AU - Dorl, Sebastian
AU - Hirz, Theresa
AU - Lasslberger, Melanie
AU - Grassmann, Rudolf
AU - Himmelbauer, Johannes
AU - Winkler, Stephan
N1 - Publisher Copyright: © 2023 Walter de Gruyter GmbH, Berlin/Boston.
PY - 2025/11/1
Y1 - 2025/11/1
N2 - We define the similarity of bakery recipes using different distance calculations and identify groups of similar recipes using different clustering algorithms. Our analyses are based on the relative amounts of ingredients included in the recipes. We compare different clustering algorithms (k-means, k-medoids, and hierarchical clustering) to find the optimal number of clusters. Besides the standard distance calculation (Euclidean distance), we test three other distance metrics (Hamming distance, Manhattan distance, and cosine similarity). Additionally, we reduce the impact of raw materials used in large quantities by applying two different data transformations, namely the logarithm of the original data and the binarization of the original data. Clustering recipes based on their ingredients can improve the search for similar recipes and therefore help with the time-consuming process of developing new recipes. Using hierarchical clustering on the logarithm of the original data, we can separate 704 recipes into three different clusters, achieving a Silhouette Score of 0.531. We visualize our results via dendrograms representing the recipes’ hierarchical separation into individual groups and sub-groups.
AB - We define the similarity of bakery recipes using different distance calculations and identify groups of similar recipes using different clustering algorithms. Our analyses are based on the relative amounts of ingredients included in the recipes. We compare different clustering algorithms (k-means, k-medoids, and hierarchical clustering) to find the optimal number of clusters. Besides the standard distance calculation (Euclidean distance), we test three other distance metrics (Hamming distance, Manhattan distance, and cosine similarity). Additionally, we reduce the impact of raw materials used in large quantities by applying two different data transformations, namely the logarithm of the original data and the binarization of the original data. Clustering recipes based on their ingredients can improve the search for similar recipes and therefore help with the time-consuming process of developing new recipes. Using hierarchical clustering on the logarithm of the original data, we can separate 704 recipes into three different clusters, achieving a Silhouette Score of 0.531. We visualize our results via dendrograms representing the recipes’ hierarchical separation into individual groups and sub-groups.
KW - baking recipes
KW - clustering
KW - ingredient
KW - machine learning
UR - https://www.scopus.com/pages/publications/85187332977
U2 - 10.1515/ijfe-2023-0032
DO - 10.1515/ijfe-2023-0032
M3 - Article
AN - SCOPUS:85187332977
SN - 1556-3758
VL - 21
SP - 753
EP - 762
JO - International Journal of Food Engineering
JF - International Journal of Food Engineering
IS - 11
ER -