Recommendation systems are an important component of many applications. They help users discover content and products they may be interested in. In this article, we will develop a Recipe Recommender in Python that suggests recipes based on a user’s dietary restrictions and preferences.
Data Collection:
The first step in developing a recommendation system is to collect data. In this case, we need a dataset of recipes. We can use the Recipe1M+ dataset, which contains more than one million recipes.
Data Preprocessing:
Once we have a dataset, we need to preprocess it to extract the necessary information: the ingredients and the dietary restrictions associated with each recipe, plus any other relevant fields such as the recipe name and instructions.
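As a rough sketch of this step, the snippet below normalizes ingredient strings and derives a simple dietary flag from them. The column names (`name`, `ingredients`, `instructions`), the tiny `MEAT_KEYWORDS` list, and the `is_vegetarian` flag are illustrative assumptions rather than part of any particular dataset; a real system would need a much richer mapping.

```python
import pandas as pd

# Hypothetical keyword list; a real system needs a far more complete one.
MEAT_KEYWORDS = {"chicken", "beef", "pork", "bacon", "fish", "shrimp"}

def clean_ingredients(ingredients):
    """Lowercase and strip each ingredient string, dropping empty entries."""
    return [item.strip().lower() for item in ingredients if item and item.strip()]

def is_vegetarian(ingredients):
    """Return True if no ingredient mentions a keyword from MEAT_KEYWORDS."""
    return not any(word in item for item in ingredients for word in MEAT_KEYWORDS)

# Toy data standing in for the real recipe file (assumed schema).
recipes = pd.DataFrame({
    "name": ["Veggie Stir Fry", "Chicken Soup"],
    "ingredients": [[" Tofu", "Broccoli ", ""], ["Chicken breast", "Carrot"]],
    "instructions": ["Stir fry everything.", "Simmer for 30 minutes."],
})

recipes["ingredients"] = recipes["ingredients"].apply(clean_ingredients)
recipes["is_vegetarian"] = recipes["ingredients"].apply(is_vegetarian)
print(recipes[["name", "ingredients", "is_vegetarian"]])
```

Substring matching like this is crude (it cannot tell "fish" from "starfish"), so treat the flag as a starting point, not a reliable label.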
Feature Engineering:
The next step is to engineer features from the preprocessed data. We can use techniques like word embeddings and TF-IDF to create vectors that represent each recipe. These vectors can then be used to compute the similarity between different recipes.
User Preferences:
We also need to collect information about the user’s dietary restrictions and preferences. This can be done by asking the user to enter their preferences or by using data from social media profiles.
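One simple way to represent user input is sketched below. The `UserPreferences` class, its field names, and the `is_allowed` helper are hypothetical; the sketch treats dietary restrictions as a hard filter on ingredients and leaves likes for later ranking.

```python
from dataclasses import dataclass, field

@dataclass
class UserPreferences:
    """Hypothetical container for what the user tells us."""
    excluded_ingredients: set = field(default_factory=set)  # hard restrictions
    liked_ingredients: set = field(default_factory=set)     # soft preferences

def is_allowed(recipe_ingredients, prefs):
    """A recipe is allowed if it contains none of the excluded ingredients."""
    return prefs.excluded_ingredients.isdisjoint(recipe_ingredients)

prefs = UserPreferences(
    excluded_ingredients={"peanut", "chicken"},
    liked_ingredients={"tofu"},
)

# Toy catalogue mapping recipe names to ingredient lists.
catalogue = {
    "Veggie Stir Fry": ["tofu", "broccoli", "soy sauce"],
    "Chicken Soup": ["chicken", "carrot", "onion"],
    "Peanut Noodles": ["noodles", "peanut", "lime"],
}

allowed = [name for name, items in catalogue.items() if is_allowed(items, prefs)]
print(allowed)
```

Filtering before ranking means a restricted recipe can never appear in the results, however similar it is to the user's tastes.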
Similarity Calculation:
We can use the vectors created in the Feature Engineering step to calculate the similarity between each recipe and the user’s preferences. Cosine similarity is a common choice for comparing two vectors.
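Cosine similarity is defined as cos(u, v) = (u · v) / (‖u‖ ‖v‖): it is 1 when two vectors point in the same direction and 0 when they are orthogonal. The NumPy sketch below implements that formula directly; the example vectors are made up, and returning 0.0 for a zero vector is a convention chosen here, not a universal rule.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denominator = np.linalg.norm(u) * np.linalg.norm(v)
    if denominator == 0:
        return 0.0
    return float(np.dot(u, v) / denominator)

user_vector = np.array([1.0, 0.0, 1.0])
recipe_a = np.array([2.0, 0.0, 2.0])  # same direction as the user vector
recipe_b = np.array([0.0, 3.0, 0.0])  # shares no features with the user vector

print(cosine_similarity(user_vector, recipe_a))
print(cosine_similarity(user_vector, recipe_b))
```

Because the measure ignores vector length, a recipe with many ingredients is not favored over a short one merely for being longer.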
Recommendation Generation:
Based on the similarity scores, we can generate a list of recommended recipes for the user. We can use techniques like collaborative filtering and content-based filtering to generate the recommendations.
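The sketch below illustrates the content-based variant only: it averages the TF-IDF vectors of recipes the user liked into a profile and ranks the remaining recipes by cosine similarity to it. The recipe names, texts, and the `recommend_similar` function are hypothetical, and averaging liked vectors is one of several reasonable ways to build a profile.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue; each text is a space-separated ingredient list (assumed format).
recipe_names = ["Tofu Stir Fry", "Chicken Soup", "Tofu Curry", "Beef Stew"]
recipe_texts = [
    "tofu broccoli soy ginger",
    "chicken carrot onion celery",
    "tofu coconut curry ginger",
    "beef carrot onion potato",
]

def recommend_similar(liked_indices, top_k=2):
    """Rank recipes the user has not liked yet by similarity to their liked ones."""
    vectors = TfidfVectorizer().fit_transform(recipe_texts).toarray()
    # User profile: mean of the liked recipes' vectors, kept 2-D for scikit-learn.
    profile = vectors[liked_indices].mean(axis=0, keepdims=True)
    scores = cosine_similarity(profile, vectors).ravel()
    scores[liked_indices] = -np.inf  # never recommend what the user already liked
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(recipe_names[i], float(scores[i])) for i in ranked]

recommendations = recommend_similar([0])
print(recommendations)
```

Collaborative filtering would instead rely on other users' ratings, which is what the larger script later in this article attempts.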
User Feedback:
Finally, we need to collect feedback from users to improve the recommendation system. We can use techniques like A/B testing to evaluate the performance of the system and make changes accordingly.
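As one possible way to analyze an A/B test, the sketch below compares the share of users who saved a recommended recipe under two variants using a two-proportion z-test. The metric, the counts, and the function name are all invented for illustration; other metrics and tests may suit a real experiment better.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if standard_error == 0:
        raise ValueError("Both variants have identical all-or-nothing outcomes.")
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up counts: variant A is the current recommender, variant B the new one.
z, p_value = two_proportion_z_test(success_a=100, total_a=1000,
                                   success_b=150, total_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone, but it says nothing about whether the difference is large enough to matter.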
Here are the steps and code for doing so:
- Import the necessary libraries, such as pandas, numpy, sklearn, etc.
- Load a dataset of recipes with ingredients and other metadata, such as RecipeNLG or Recipe1M+.
- Preprocess the data by removing duplicates, missing values, outliers, etc. A preprocessing-pipeline tool can help here, such as the recipes package in R or scikit-learn’s Pipeline in Python.
- Encode the categorical features, such as cuisine type, dietary restrictions, etc., using one-hot encoding or label encoding.
- Create a user profile for each user based on their preferences and ratings of recipes. You can use collaborative filtering to find similar users and items based on their interactions.
- Train a model to learn the embeddings of recipes and users based on their features and ratings. You can use a neural network or matrix factorization approach for this task.
- Use the trained model to generate recommendations for each user by finding the recipes that have the highest similarity or score with their profile.
- Evaluate the performance of the model using metrics such as precision, recall, F1-score, etc.
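Here is an example of code that implements these steps using Python. It assumes a recipes.json file with id, name, ingredients, cuisine, and dietary_restrictions fields, and a user_ratings.json file with user_id, recipe_id, and rating fields: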
# Import libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Flatten
from keras.optimizers import Adam
# Load data
df = pd.read_json("recipes.json")
df.head()
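# Preprocess data
# Deduplicate on the id column: list-valued columns such as "ingredients"
# are unhashable, so a whole-row drop_duplicates() would raise a TypeError
df = df.drop_duplicates(subset="id")
df = df.dropna()
df = df[df["ingredients"].apply(len) > 0]
# Reset the index so row labels match row positions in the feature matrix
df = df.reset_index(drop=True)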
# Encode categorical features
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(df[["cuisine", "dietary_restrictions"]])
X_cat = encoder.transform(df[["cuisine", "dietary_restrictions"]]).toarray()
# Create user profile
user_df = pd.read_json("user_ratings.json")
user_df.head()
user_profile = user_df.groupby("user_id").agg({"recipe_id": list, "rating": list})
user_profile.head()
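# Build one training example per (user, recipe, rating) interaction
id_to_row = dict(zip(df["id"], range(len(df))))  # recipe id -> row position in X_cat
ratings = user_df[user_df["recipe_id"].isin(df["id"])]
# Contiguous user indices (positions in user_profile.index) for the embedding layer
user_idx = user_profile.index.get_indexer(ratings["user_id"]).reshape(-1, 1)
recipe_rows = ratings["recipe_id"].map(id_to_row).to_numpy()
y = ratings["rating"].to_numpy(dtype="float32")
# Split the interactions into train and test sets
train_idx, test_idx = train_test_split(np.arange(len(ratings)), test_size=0.2,
                                       random_state=42)
X_train_user, X_test_user = user_idx[train_idx], user_idx[test_idx]
X_train_cat = X_cat[recipe_rows[train_idx]]
X_test_cat = X_cat[recipe_rows[test_idx]]
y_train, y_test = y[train_idx], y[test_idx]
# Define model parameters
n_users = len(user_profile)
n_factors = 10  # number of latent factors
# Define model inputs: a user index and a recipe's one-hot feature vector
user_input = Input(shape=(1,))
recipe_input = Input(shape=(X_cat.shape[1],))
# Embed users into the latent space
user_embedding = Embedding(n_users,
                           n_factors,
                           embeddings_initializer="he_normal",
                           embeddings_regularizer="l2")(user_input)
user_embedding = Flatten()(user_embedding)
# Project recipe features into the same latent space. An Embedding layer expects
# integer indices, not one-hot vectors, so a Dense layer is used here instead.
from keras.layers import Dense
recipe_embedding = Dense(n_factors, kernel_regularizer="l2")(recipe_input)
# The dot product of the two latent vectors is the predicted rating
output = Dot(axes=1)([user_embedding, recipe_embedding])
# Define model
model = Model(inputs=[user_input, recipe_input], outputs=output)
# Compile model (recent Keras versions use learning_rate rather than lr)
model.compile(loss="mean_squared_error",
              optimizer=Adam(learning_rate=0.001))
# Fit model
model.fit([X_train_user, X_train_cat],
          y_train,
          batch_size=64,
          epochs=10,
          validation_split=0.1)
# Evaluate model on the held-out interactions
model.evaluate([X_test_user, X_test_cat], y_test)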
# Generate recommendations
def recommend(user_id):
    # Get the recipes the user has already rated
    rated_recipes = set(user_profile.loc[user_id, "recipe_id"])
    # The model expects a contiguous user index: the position in user_profile.index
    user_idx = user_profile.index.get_loc(user_id)
    # Get the row positions of all unrated recipes
    rows = np.flatnonzero(~df["id"].isin(rated_recipes).to_numpy())
    # Predict ratings for all unrated recipes in one batch
    user_ids = np.full((len(rows), 1), user_idx)
    scores = model.predict([user_ids, X_cat[rows]], verbose=0).ravel()
    # Sort predictions by rating and keep the top 10
    top = np.argsort(scores)[::-1][:10]
    # Print recommendations
    print(f"Recommendations for user {user_id}:")
    for i in top:
        recipe_name = df["name"].iloc[rows[i]]
        print(f"{recipe_name} ({scores[i]:.2f})")
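The evaluation step listed earlier mentions precision and recall, which the script above does not compute. A minimal sketch of precision and recall at k for one user's recommendation list follows; the function name, the recipe ids, and the idea of treating highly rated held-out recipes as "relevant" are assumptions made for illustration.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a ranked list against a set of relevant items."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and the recipes the user actually liked.
recommended = ["r1", "r2", "r3", "r4", "r5"]
relevant = {"r2", "r5", "r9"}

precision, recall = precision_recall_at_k(recommended, relevant, k=5)
print(f"precision@5 = {precision:.2f}, recall@5 = {recall:.2f}")
```

Averaging these values over all test users gives a single score that can be tracked as the model changes.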