Data analysis of spaghetti with tomato sauce recipes – Using food blogs as a media and market intelligence source

Are there unexplored data sources beyond surveys?

Yes. By using techniques such as web scraping to collect data, blogs become valuable sources of information about consumer tastes and trends. This analysis of a simple recipe like spaghetti with tomato sauce reveals the preferences of the users who follow different blogs and shows how the recipes vary over time.

Methodology

I used Python and R for web scraping, data engineering, data analysis, and visualization.

Most of the work was dedicated to data cleaning and standardization. For example: “pomodoro,” “pomodori,” “tomato,” and “tomato sauce” are all treated as the same ingredient.
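The standardization step can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the four tomato variants come from the example above, while the `SYNONYMS` table, the canonical label "tomato", and the function names are assumptions made for this sketch.

```python
# Hypothetical synonym table. The four variants are the ones mentioned in the
# text; the canonical label "tomato" is an assumption made for illustration.
SYNONYMS = {
    "pomodoro": "tomato",
    "pomodori": "tomato",
    "tomato": "tomato",
    "tomato sauce": "tomato",
}


def normalize(raw: str) -> str:
    """Lowercase, trim, collapse whitespace, then map known variants to one label."""
    key = " ".join(raw.lower().split())
    # Unknown ingredients are kept as their cleaned-up spelling.
    return SYNONYMS.get(key, key)


def normalize_recipe(ingredients):
    """Return the sorted list of distinct canonical ingredients for one recipe."""
    return sorted({normalize(item) for item in ingredients})


cleaned = normalize_recipe(["Pomodori", " tomato  sauce ", "Basil", "pomodoro"])
print(cleaned)  # ['basil', 'tomato']
```

In practice the synonym table grows as new spellings turn up, which is one reason the cleaning step takes most of the time.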

Visualization: Before/After data cleaning

From this: [image: data before cleaning]

To this: [image: data after cleaning]

Result: Network Chart of the Ingredients

What does this chart tell us?

The chart above shows pairs of ingredients that appear together at least 20 times.
The dataset includes 29 recipes: 10 from Allrecipes and 8 from Food.com, two well-known American cooking websites, and 11 from 11 popular Italian food blogs.
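The pair counting behind a network chart like this one can be sketched as below. This is an illustrative sketch under stated assumptions, not the original code: the function names and the `min_count` parameter are hypothetical, and the toy recipes are invented for the example rather than taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def pair_counts(recipes):
    """Count how many recipes contain each unordered pair of ingredients."""
    counts = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so ("basil", "garlic") and ("garlic", "basil")
        # are counted as the same pair.
        for pair in combinations(sorted(set(ingredients)), 2):
            counts[pair] += 1
    return counts


def edges(recipes, min_count):
    """Keep only the pairs seen in at least min_count recipes (the chart's edges)."""
    return {pair: n for pair, n in pair_counts(recipes).items() if n >= min_count}


# Toy data, invented for illustration only (not the recipes analyzed here).
toy_recipes = [
    ["spaghetti", "tomato", "garlic", "basil"],
    ["spaghetti", "tomato", "onion"],
    ["spaghetti", "tomato", "garlic", "sugar"],
]
print(edges(toy_recipes, min_count=2))
```

Each surviving pair becomes an edge in the network, and the count can be used as the edge weight.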

The bubbles located in the center indicate ingredients that are used in almost all recipes and are considered the base of the classic dish: spaghetti, onion or garlic, basil, peeled tomatoes and tomato paste, salt, and olive oil.

The bubbles located toward the top indicate ingredients such as mushrooms, green peppers, ground beef, and red wine, which appear only in Allrecipes and are not commonly used in traditional versions of the recipe.

Sugar is added mainly by users of Allrecipes, but also appears in Food.com recipes and in one Italian recipe.

Various types of tomatoes are used. Italian recipes show a preference for fresh tomatoes and peeled tomatoes, while Allrecipes tends to use tomato sauce. Some recipes simply state “tomatoes” without specifying the type.

Actionable insights: “If you buy X, you likely need Y.” This can help supermarkets decide which products to place next to each other on the shelf.
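One simple way to put a number on "if you buy X, you likely need Y" is the confidence of the rule X → Y: the share of recipes containing X that also contain Y. The sketch below is only an illustration. The function name and the toy recipes are assumptions, and the numbers are not results from this dataset.

```python
def confidence(recipes, x, y):
    """Share of recipes containing x that also contain y (None if x never appears)."""
    with_x = [set(r) for r in recipes if x in r]
    if not with_x:
        return None
    return sum(1 for r in with_x if y in r) / len(with_x)


# Toy data, invented for illustration only.
toy_recipes = [
    ["spaghetti", "tomato", "garlic", "basil"],
    ["spaghetti", "tomato", "onion"],
    ["spaghetti", "tomato", "garlic", "sugar"],
]
print(confidence(toy_recipes, "garlic", "tomato"))  # every garlic recipe has tomato
print(confidence(toy_recipes, "tomato", "garlic"))  # two of three tomato recipes have garlic
```

The rule is directional: a high confidence for garlic → tomato does not imply the same for tomato → garlic.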

Common ingredients and key differences

The recipes published on these blogs share classic ingredients such as various types of tomatoes, salt, garlic, and onion, but there are also some surprising ingredients, such as ready-made jarred tomato sauce.

Defining what “spaghetti al pomodoro” means

Spaghetti al pomodoro is a classic Italian recipe, simple and inexpensive, and once the sauce is prepared, it can be used as a base for many other delicious dishes. The situation changes when searching for “spaghetti al pomodoro” in the United States. I began my search on Allrecipes.com and noticed that entering the keywords “spaghetti al pomodoro” returned recipes with clams, meatballs, various meats, and even carbonara. In the end, I manually selected only the recipes that matched the idea of “spaghetti with tomato sauce.” I followed the same process on Food.com.

Actionable insights: This illustrates how blog search functions may not return precise results for the keywords used or what the users are looking for. More refined searches are often needed; perhaps the search functions can be improved.

Sugar

Another difference is the use of sugar, which seems to be more common in the recipes from Allrecipes.


Why Sugar?

To understand this recipe, I drew on my knowledge and work experience as a cook. Reading the preparation steps of the recipes on Allrecipes, I realized that none of them includes a step that is fundamental in Italy: browning the onion or garlic in a pan with olive oil before adding the other ingredients. In culinary terms, this browning is the Maillard reaction. In the recipes analyzed, all the ingredients instead go into the pan at the same time, so the Maillard reaction, which is so important for developing the flavor of the ingredients, is omitted.

This could explain why 7 out of 10 recipes then add so many other ingredients for extra flavor, as well as sugar to sweeten the sauce, even though the Maillard reaction achieves the same result naturally.

Actionable insights: Are Allrecipes’ recipes addressed to less experienced cooks?

The Number of Ingredients Used

The number of ingredients present in the recipes ranges from a minimum of 3 to a maximum of 14. A huge difference, if we think about the simplicity of this recipe.

More Ingredients = Better Recipe? Spoiler Alert: No.

Not all the blogs examined have a rating system, so I checked the ratings on Allrecipes, where the recipes have an average of about 130 ratings each. There is no significant positive correlation between the number of ingredients and either the star rating or the number of ratings. In fact, the correlation between the number of ingredients and the number of stars is negative.
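A check of this kind can be sketched with a plain Pearson correlation. The example below is an assumption-laden illustration, not the original calculation: the `pearson` helper is hypothetical, and the ingredient counts and star values are made up to show the mechanics. They are not the Allrecipes data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Made-up numbers for illustration only (not the ratings collected for this post).
n_ingredients = [3, 6, 9, 12, 14]
stars = [4.8, 4.6, 4.5, 4.3, 4.4]
r = pearson(n_ingredients, stars)
print(round(r, 2))  # negative: more ingredients go with fewer stars in this toy data
```

With only about ten recipes, a coefficient like this is a rough indication rather than strong evidence.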

This is worth considering for recipe authors whose target audience is novice cooks. Sauce producers might use it to better understand how to sell their sauces to customers in different regions, and also to market a fresher version of Italian tomato sauce. The finding is consistent with the market trend of customers paying more attention to shorter ingredient lists on labels in order to avoid processed foods.

Among the recipes there are indeed surprising ingredients, such as ready-made tomato sauce from a jar. The recipe in question is “Homemade Spaghetti Sauce with Ground Beef,” which indicates that tomato sauce is used as a base for more complex preparations, such as meat-based ones.

Actionable insights: Could producers offer sauces designed as a base for other preparations, such as fish dishes, meat sauce, lasagna, or pizza?

Reproducibility of the Analysis and Various Uses

Actionable insights: Beyond spaghetti, any other recipe category could be analyzed. The data could be enriched with social media sentiment from Instagram or TikTok. This was a one-time analysis, but it could become a constantly updated real-time dashboard that monitors trends and serves as an early warning system for emerging ones.

Conclusion. From Spaghetti al Pomodoro to the Market

In an era of abundance of information and data, the difference lies in the extraction and interpretation of information, from unstructured data to clean data to concise analyses and actionable intelligence. One challenge in writing this post was dealing with the abundance of information gathered, which required me to carefully filter the content and tailor it to a specific audience.

If your company wants to extract insights from media/blogs/reviews, let’s talk about it. LinkedIn, Upwork, socials.
