Building a Multimodal Recipe Recommendation System

A walkthrough of building a recipe recommendation system with Qdrant, LlamaIndex, and Gemini, using text and key frames extracted from YouTube recipe videos.

7/9/2024 · 3 min read

worm's-eye view photography of concrete building

Creating a DataFrame with the Recipes

The first step in building our recipe recommendation system is to extract information from a YouTube playlist. I used a Food Network playlist whose video descriptions contain the ingredients for each recipe. From it, I created a DataFrame with the URL, Title, and Description of each recipe video.

import time

import pandas as pd
import scrapetube
import yt_dlp

# Initialize lists to store data
vid_title_list = []
vid_description_list = []
url_list = []

# Get videos from the playlist
videos = scrapetube.get_playlist("PLpfv1AIjenVMmT7iRx6Nwu6uG6A9gSD0j")

# Extract all URLs
for video in videos:
    video_link = "https://www.youtube.com/watch?v=" + str(video['videoId'])
    url_list.append(video_link)

# Extract video details (first three videos for the demo)
for i in url_list[:3]:
    try:
        ydl_opts = {}
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info_dict = ydl.extract_info(i, download=False)
            vid_title = info_dict.get('title')
            vid_description = info_dict.get('description')

            vid_title_list.append(vid_title)
            vid_description_list.append(vid_description)
    except Exception as e:
        print(f"Error fetching details for {i}: {e}")
    time.sleep(1)  # Add a delay to avoid rate limiting

# Create the data dictionary
data = {
    "URL": url_list[:3],
    "Title": vid_title_list,
    "Description": vid_description_list
}

# Create the DataFrame
df = pd.DataFrame(data)

Once the DataFrame has been created, I generated a text file for each description. These files will later be ingested into the text collection in the Qdrant cluster.

import os

# Define the output folder
output_folder = 'output_folder'

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Define the file name and path
    file_name = f"Recipe {row['Title']}.txt"
    file_path = os.path.join(output_folder, file_name)

    # Write the description to the text file
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(row['Description'])

print("Files have been saved successfully.")

Extracting Key Frames from Recipe Videos

To enhance our recipe recommendation system with visual information, I extracted key frames from each video. This allows users to visualize the steps in the recipe preparation process. I set the frame extraction rate to 0.03 frames per second (fps), which means it captures one image approximately every 33 seconds. This rate provides a good balance between capturing important steps and managing storage efficiently.
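As a quick sanity check on that rate (my own back-of-the-envelope numbers, not from the original post), the arithmetic works out as follows:

```python
# At 0.03 fps, one frame is captured roughly every 33 seconds.
fps = 0.03
seconds_per_frame = 1 / fps           # ~33.3 s between captured frames
frames_per_10_min = round(600 * fps)  # ~18 frames for a 10-minute video
```

For a typical 5–10 minute recipe video this yields roughly 9–18 frames, enough to cover the main preparation steps without flooding the image collection.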

import os

import yt_dlp
from moviepy.editor import VideoFileClip

# Create a directory for the downloaded videos
download_folder = 'downloaded_videos'
os.makedirs(download_folder, exist_ok=True)

# Function to download a video
def download_video(url, output_path):
    ydl_opts = {
        'outtmpl': output_path
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

# Function to check if frames already exist for a video
def images_exist(title):
    return any(fname.startswith(title) for fname in os.listdir(output_folder))

# Iterate over the URLs in the DataFrame and download each video
for idx, row in df.iterrows():
    url = row['URL']
    title = row['Title']
    video_filename = os.path.join(download_folder, f"{title}.mp4.webm")

    try:
        # Download the video only if it does not exist yet
        if not os.path.exists(video_filename):
            download_video(url, video_filename)
            print(f"Downloaded {title}")
        else:
            print(f"Video {title} already exists. Skipping download.")

        # Extract frames only if they do not exist yet
        if not images_exist(title):
            # Load the video and save frames as images
            clip = VideoFileClip(video_filename)
            clip.write_images_sequence(
                os.path.join(output_folder, f"{title}_frame_%04d.png"), fps=0.03)
            print(f"Images saved for {title}")
        else:
            print(f"Images for {title} already exist. Skipping image extraction.")

    except Exception as e:
        print(f"Error processing {title}: {e}")

print("Process completed.")

Creating Collections and Vector Store for Multimodal Recipe Data

Once our data is stored in the same folder, we need to create two separate collections: one for the text and one for the images. This setup allows fast and accurate similarity searches across both modalities. Fortunately, the MultiModalVectorStoreIndex from LlamaIndex automatically manages the separation of text and image data in a single folder. As the embedding model I used embedding-001, and as the multimodal LLM, gemini-pro-vision.

from llama_index.core import Settings, SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Initialize the Qdrant client
client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

# Create a collection for text data (768-dim Gemini text embeddings)
client.create_collection(
    collection_name="text_collection",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)
)

# Create a collection for image data (512-dim CLIP image embeddings)
client.create_collection(
    collection_name="image_collection",
    vectors_config=VectorParams(size=512, distance=Distance.COSINE)
)

# Initialize the vector stores
text_store = QdrantVectorStore(
    client=client, collection_name="text_collection"
)

image_store = QdrantVectorStore(
    client=client, collection_name="image_collection"
)

# Set global settings
embed_model = GeminiEmbedding(
    model_name="models/embedding-001", api_key=GOOGLE_API_KEY)

llm = GeminiMultiModal(
    model_name="models/gemini-pro-vision", api_key=GOOGLE_API_KEY)

Settings.llm = llm
Settings.embed_model = embed_model

# Create the multimodal index
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

documents = SimpleDirectoryReader("/content/output_folder").load_data()

index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context)

Get Recipes!

Finally, we can retrieve our recipes and images, visualizing the ingredients and some of the preparation steps. This lets us judge whether a recipe is what we are looking for or whether we should search for a different one. The code retrieves the top 1 text result and the top 4 images based on similarity, but it can be customized to return more results. It plots each photo's name and frame number, since I saved every frame with the video title and frame number so they can be recognized properly.
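The original retrieval code is not shown, so here is a minimal sketch of the pieces it would need. The `parse_frame_filename` helper is hypothetical: it simply inverts the `"{title}_frame_%04d.png"` naming scheme used above to build a label for each plotted image. The commented retriever call assumes the `index` built earlier plus live Qdrant and Gemini credentials, so it is not executed here.

```python
import os
import re

def parse_frame_filename(path):
    """Recover (title, frame_number) from a file named '<Title>_frame_<NNNN>.png'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.match(r"(?P<title>.+)_frame_(?P<num>\d+)$", stem)
    if match is None:
        raise ValueError(f"Unexpected frame filename: {path!r}")
    return match.group("title"), int(match.group("num"))

# Example: build a plot label for a retrieved frame image
title, frame = parse_frame_filename("output_folder/Gnocchi_frame_0004.png")
label = f"{title} (frame {frame})"

# Retrieval sketch (not run here; requires the index built above):
# retriever = index.as_retriever(similarity_top_k=1, image_similarity_top_k=4)
# results = retriever.retrieve("a creamy pasta recipe")
```

Labeling retrieved frames this way makes it easy to jump back to the right spot in the source video when a frame looks promising.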