Civil drones have seen increasing demand in recent years, with applications ranging from warehouse inspection and film-making to aircraft maintenance. This research aims to develop an intuitive, real-time hand gesture-based navigation system for indoor drone operations, focusing on the visual inspection of aircraft airframes within hangars. The proposed system uses a custom Convolutional Neural Network (CNN) model to interpret hand gestures captured from a live video feed and translate them into drone control commands. The training dataset was generated from videos of volunteers demonstrating predetermined hand gestures, yielding 1,000 images distributed across 10 hand gesture classes. The custom CNN model, comprising five convolutional layers and three dense layers, was trained on this dataset using the Google Colaboratory platform to accelerate training. The model achieved an accuracy of 95% with an inference time of 23 ms, demonstrating its suitability for real-time applications. Integrating the model into the Python-based drone navigation system demonstrated seamless real-time performance with 24 ms latency.
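The five-convolutional-layer, three-dense-layer architecture described above can be illustrated with a shape-tracing sketch. Only the layer counts and the 10 gesture classes come from the abstract; the 64x64 grayscale input, the filter widths, and the dense-layer sizes below are illustrative assumptions, as are the 'same'-padded 3x3 convolutions and 2x2 max-pooling steps.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after a 'same'-padded 3x3 convolution (unchanged at stride 1)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after 2x2 max pooling."""
    return size // window

side, channels = 64, 1                   # assumed 64x64 grayscale input
conv_filters = [16, 32, 64, 128, 128]    # assumed widths for the 5 conv layers
dense_units = [128, 64, 10]              # 3 dense layers; 10 gesture classes

shapes = [(side, side, channels)]
for f in conv_filters:
    side = pool_out(conv_out(side))      # conv ('same') followed by 2x2 max-pool
    shapes.append((side, side, f))

flat = side * side * conv_filters[-1]    # features flattened into the dense head
print("feature maps:", shapes)
print("flattened features:", flat)
print("dense head:", dense_units)
```

Under these assumed widths, the spatial resolution halves at each of the five stages (64 → 32 → 16 → 8 → 4 → 2), leaving a 512-element vector for the dense classifier head that outputs scores for the 10 gesture classes.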