a diagram of a RAG application deployed on GKE
Scalable RAG with GKE and Qdrant

Have you ever struggled to locate that perfect piece of code you wrote months ago? In this article, I guide you through creating an LLM application with LlamaIndex and Qdrant that lets you interact with your GitHub repositories, making it easier than ever to find forgotten code snippets. We’ll deploy the application on Google Kubernetes Engine (GKE) with Docker and FastAPI and provide an intuitive Streamlit UI for sending queries.
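
As a rough illustration of the serving layer described above, here is a minimal sketch of a FastAPI endpoint querying a Qdrant-backed LlamaIndex index; the Qdrant URL, collection name, and route are illustrative assumptions, not the article’s exact code:

```python
# Minimal sketch of the serving layer: a FastAPI endpoint backed by a
# LlamaIndex query engine over an existing Qdrant collection.
# The Qdrant URL, collection name, and route are illustrative assumptions.
from fastapi import FastAPI
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

app = FastAPI()

client = QdrantClient(url="http://qdrant:6333")  # in-cluster Qdrant service
vector_store = QdrantVectorStore(client=client, collection_name="github-code")
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()  # uses the configured LLM to answer

@app.get("/query")
def query(q: str):
    # Retrieve relevant snippets from Qdrant and synthesize an answer.
    response = query_engine.query(q)
    return {"answer": str(response)}
```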

This article was featured in the GKE Newsletter (This week in GKE, ISSUE #19, 12 July 2024) and the LlamaIndex Newsletter.

a diagram of a RAG application deployed on AWS
LLM App with AWS Lambda and Qdrant

In this post, I explain how to build a serverless application that performs semantic search over academic papers using AWS Lambda and Qdrant. I used LangChain and OpenAI’s embeddings to create vector representations of document chunks and store them in Qdrant. A simple shell script builds and pushes the Docker image to AWS ECR and deploys it as an AWS Lambda function. After testing the Lambda function, I created an API Gateway endpoint and built a Streamlit application to interact with it.
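
To make the flow concrete, here is a minimal sketch of what such a Lambda handler could look like; the collection name, environment variables, and event shape are assumptions for illustration:

```python
# Sketch of the Lambda handler: embed the incoming query with OpenAI
# embeddings via LangChain, then search the Qdrant collection.
# The collection name, env vars, and event shape are assumptions.
import json
import os

from langchain_openai import OpenAIEmbeddings
from qdrant_client import QdrantClient

embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ.get("QDRANT_API_KEY"),
)

def handler(event, context):
    query = json.loads(event["body"])["query"]
    vector = embeddings.embed_query(query)
    hits = client.search(collection_name="papers", query_vector=vector, limit=5)
    return {
        "statusCode": 200,
        "body": json.dumps([hit.payload for hit in hits]),
    }
```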

Articles

List of published articles

a photo of a multimodal RAG application
Multimodal LLM with Qdrant and Gemini

Are you feeling hungry and craving your favorite recipes? Imagine having a YouTube playlist filled with your top recipe videos, complete with image frames and detailed descriptions. In this article, I guide you through the process of extracting videos and descriptions from a playlist, capturing images as frames, and storing everything in a Qdrant cluster. You’ll have two separate collections: one for text and one for images. It’s a perfect blend of technology and culinary delight!
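
As a sketch of the storage setup, creating the two collections in Qdrant could look like this; the collection names and vector sizes are illustrative and depend on the text and image embedding models you choose:

```python
# Sketch: one Qdrant collection for recipe descriptions (text embeddings)
# and one for video frames (image embeddings). Collection names and vector
# sizes are illustrative and depend on the embedding models used.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="recipe_texts",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
)
client.create_collection(
    collection_name="recipe_frames",
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
)
```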

This article was featured in the LlamaIndex Newsletter.

RAG App with AWS CDK, Qdrant and LlamaIndex

Infrastructure as Code (IaC) is a modern technique for managing and provisioning infrastructure resources through code. Rather than manually configuring these resources, you specify them in machine-readable configuration files. AWS offers two IaC tools: AWS CloudFormation and AWS CDK. CloudFormation provisions AWS resources using templates written in JSON or YAML, while the AWS CDK allows you to provision resources using familiar programming languages like Python. The CDK acts as an abstraction layer that simplifies the creation of CloudFormation templates.
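
To illustrate that abstraction, here is a minimal CDK v2 stack in Python; the stack and bucket names are illustrative, and `cdk synth` expands these few lines into a full CloudFormation template:

```python
# Sketch: a minimal AWS CDK v2 stack in Python. Running "cdk synth" expands
# these few lines into a full CloudFormation template. The stack and bucket
# names are illustrative.
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class RagStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # One high-level construct; CDK generates the underlying resource,
        # default policies, and template boilerplate for you.
        s3.Bucket(self, "DocumentsBucket", versioned=True)

app = App()
RagStack(app, "RagStack")
app.synth()
```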

This article was featured in the LlamaIndex Newsletter.

Agentic RAG Using Claude, LlamaIndex, and Milvus

As AI systems continue to evolve rapidly, relying solely on large language models (LLMs) is no longer sufficient to meet the diverse needs of today’s industries. These increasing challenges require the development of more complex architectures that can solve problems more efficiently and effectively. At the Unstructured Data Meetup hosted by Zilliz, Bill Zhang, Director of Engineering at Zilliz, introduced the concept of Compound AI Systems, which was featured in the Berkeley AI Research (BAIR) blog. This modular approach integrates multiple components to handle various tasks rather than relying on a single AI model, delivering more tailored and efficient results. You can watch Bill’s presentation on the Zilliz YouTube channel.

RAG with Milvus, LlamaIndex and PII Modules

In the fast-paced field of artificial intelligence, Retrieval Augmented Generation (RAG) has emerged as a powerful approach to enhancing the capabilities of generative models such as OpenAI’s GPT series and Google’s Gemini. However, with great potential comes significant responsibility, particularly when it comes to safeguarding sensitive data and ensuring compliance with privacy regulations. As organizations increasingly rely on AI-driven solutions, understanding the security implications of these technologies is crucial. Implementing strong security measures that not only protect data but also build user trust is essential for production-ready RAG applications.

Multimodal Bill Scan System with AWS Services

Scanning documents and extracting key information can now be accomplished with high accuracy using multimodal models like Claude 3 Sonnet. In this article, I will walk you through a simple system where a user uploads a scanned bill to an AWS S3 bucket. Several AWS Lambda functions are then triggered in sequence, performing tasks from text extraction to data insertion into DynamoDB. Finally, the user receives an email notification once the data is available in the table.
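
For orientation, here is a simplified sketch of the final Lambda in such a chain, persisting fields to DynamoDB and notifying the user through an SNS topic with an email subscription; the table name, topic ARN, and the stubbed extraction helper are hypothetical placeholders, not the article’s code:

```python
# Simplified sketch of the final Lambda in the chain: read the S3 event,
# persist the extracted bill fields to DynamoDB, and notify the user via SNS.
# The table name, topic ARN, and extract_fields() stub are hypothetical.
import os
import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

def extract_fields(bucket: str, key: str) -> dict:
    # Placeholder for the multimodal extraction step (Claude 3 Sonnet in the
    # article's pipeline); returns dummy values here.
    return {"vendor": "unknown", "total": "0.00"}

def handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    fields = extract_fields(bucket, key)

    table = dynamodb.Table(os.environ["BILLS_TABLE"])
    table.put_item(Item={"bill_id": key, **fields})

    sns.publish(TopicArn=os.environ["TOPIC_ARN"], Message=f"Bill {key} processed.")
```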

Building a Multimodal LLM Application with PyMuPDF4LLM

Extracting text from PDFs is a crucial and often challenging step in many AI and Large Language Model (LLM) applications. High-quality text extraction plays a key role in improving downstream processes, such as tokenization, embedding creation, or indexing in a vector database, enhancing the overall performance of the application. PyMuPDF is a popular library for this task due to its simplicity, high speed, and reliable text extraction quality.

In this blog, we will explore a recently launched free library by Artifex (the creators of PyMuPDF) called PyMuPDF4LLM. This new library is designed to simplify text extraction from PDFs and is specifically developed for LLM and Retrieval-Augmented Generation (RAG) applications.
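
Using the library is deliberately simple; a minimal sketch (with an illustrative file name) looks like this:

```python
# Minimal sketch: convert a PDF to Markdown with PyMuPDF4LLM.
# The file name is illustrative.
import pymupdf4llm

md_text = pymupdf4llm.to_markdown("paper.pdf")  # Markdown string for the whole PDF
print(md_text[:500])
```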

This article was featured in the LlamaIndex Newsletter.

Evaluating Safety & Alignment of LLM in Specific Domains

Recent advancements in AI have given rise to sophisticated Large Language Models (LLMs) with potentially transformative impacts across high-stakes domains such as healthcare, financial services, and the legal industry. Although these models provide significant advantages, their use in critical decision-making requires thorough evaluation to ensure safety, accuracy, and adherence to ethical standards. Serious concerns surrounding these models' accuracy, security, and fairness must be addressed before fully embracing AI in such sensitive environments.

Challenges in Structured Document Data Extraction at Scale with LLMs

One of the core challenges in applying large language models (LLMs) lies in the data processing phase, particularly when handling diverse data formats. You can typically classify data into three types: structured, semi-structured, or unstructured. Since LLMs primarily work with text, they are often applied to unstructured data unless the data is already organized into a relational table.

In a recent webinar, Tim Spann, Principal Developer Advocate at Zilliz, introduced Unstract, an open-source platform designed to streamline the extraction of unstructured data and transform it into structured formats. This tool aims to simplify data management by automating the structuring process.

In this blog, we’ll dive into the primary challenges of structured document data extraction, as outlined by Shuveb Hussain, Co-founder and CEO of Unstract. We'll also explore how Unstract tackles various scenarios, including its integration with vector databases like Milvus, to bring structure to previously unmanageable data.

Balancing Accuracy and Speed with Qdrant Hyperparameters, Hybrid Search and Semantic Caching

When you set up a vector database, plenty of default parameters are working behind the scenes, ready for fine-tuning if you need them. In this blog, I explain how to configure some of these options, such as HNSW and quantization settings, and explore techniques like hybrid search (combining sparse and dense vectors) and semantic caching (storing previous questions and answers), the latter of which saves both time and money by matching new queries against earlier ones. With these tools, you’ll be able to strike the right balance between speed and accuracy while keeping costs optimized.

The topic is divided into two parts: the current blog, which explains the theoretical concepts, and a second blog with a practical example you can use to fine-tune the parameters.
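
As a preview of the practical part, here is a minimal sketch of overriding HNSW and scalar quantization defaults when creating a Qdrant collection; the specific values are illustrative starting points, not recommendations:

```python
# Sketch: overriding HNSW and quantization defaults at collection creation.
# The parameter values shown are illustrative starting points.
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="tuned_collection",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    hnsw_config=models.HnswConfigDiff(
        m=16,              # graph connectivity: higher = better recall, more memory
        ef_construct=200,  # build-time search depth: higher = better index, slower build
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,  # compress float32 vectors to int8
            quantile=0.99,                # clip outliers when computing the range
            always_ram=True,              # keep quantized vectors in RAM for speed
        )
    ),
)
```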

Learn Llama 3.2 and How to Build a RAG Pipeline with Llama and Milvus

In recent months, Meta has made impressive advances in the open-source community, releasing a series of powerful models—Llama 3, Llama 3.1, and Llama 3.2—in just six months. By providing high-performance models to the public, Meta is narrowing the gap between proprietary and open-source tools, offering developers valuable resources to push the boundaries of their projects. This dedication to openness is a game-changer for innovation in AI.

At a recent Unstructured Data Meetup hosted by Zilliz, Amit Sangani, Senior Director of AI Partner Engineering at Meta, discussed the rapid evolution of the Llama models since 2023, recent advancements in open-source AI, and the architecture of these models. He highlighted not only the benefits for developers but also how these models serve Meta and the modern AI applications powered by them, such as Retrieval Augmented Generation (RAG).

Unlocking the Power of Vector Quantization: Techniques for Efficient Data Compression and Retrieval

In today’s data-driven world, managing massive datasets, particularly high-dimensional data like vector embeddings used in various AI applications, has presented significant storage, computation, transmission, and retrieval challenges. Data compression has become a critical strategy for reducing information size without sacrificing utility, enabling faster processing and cost-effective storage.

One compression technique that has proven invaluable in this area is Vector Quantization (VQ). At its core, vector quantization clusters similar data points in a high-dimensional space and represents each cluster with a centroid vector. This approach significantly reduces data size while preserving essential features, making it ideal for applications like image compression, audio processing, and machine learning. In this blog, we’ll explore the principles and techniques of vector quantization and how it enables smarter data compression and efficient retrieval in modern AI-driven systems.
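
To make the idea concrete, here is a small self-contained sketch of vector quantization via k-means on synthetic data; each vector is replaced by the index of its nearest centroid, and the dataset shape and codebook size are arbitrary choices for illustration:

```python
# Sketch: vector quantization with k-means. Each vector is replaced by the id
# of its nearest centroid, so storage drops from d floats to one small integer
# per vector (plus the shared codebook). Data here is synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
vectors = rng.normal(size=(10_000, 128)).astype(np.float32)  # original data

k = 256  # codebook size: each vector compresses to a single uint8 code
kmeans = KMeans(n_clusters=k, n_init="auto", random_state=42).fit(vectors)

codes = kmeans.predict(vectors).astype(np.uint8)  # compressed representation
codebook = kmeans.cluster_centers_                # k centroids, shape (256, 128)
reconstructed = codebook[codes]                   # lossy decompression

print("compression ratio:", vectors.nbytes / codes.nbytes)  # ~512x before codebook overhead
```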