Explore my cutting-edge computer vision solutions addressing real-world challenges in smart industries such as autonomous systems, healthcare, agriculture, and more.

Computer Vision Real-World Applications

Classification Stack: CNN, ViT

CIFAR-10 Image Classification

View on GitHub

Using Convolutional Neural Networks (CNNs) to classify images of different types of objects from the CIFAR-10 dataset.

Object Detection Stack: 2D, 3D

2D Object Detection with YOLOv5

View on GitHub

Multimodality (VLMs: GANs, VAEs, Diffusers …)

AI Assisted Image Editing and Manipulation

View on GitHub Demo on HF Spaces

This is an AI Image Editing and Manipulation tool for Image creation, editing and manipulation.

Multimodal AI Storyteller

View on GitHub

Generate text and audio stories from image.

Motion & Video Analysis: Tracking & Flow

2D Object Tracking with OpenCV and Deep Learning View on GitHub

Perception

2D Object Detection with YOLO for Autonomous Vehicles

View on GitHub

Apply the YOLO (You Only Look Once) model series to detect and classify objects such as pedestrians, vehicles, and traffic signs in real-time. It uses real-world camera and LIDAR data from Lyft 3D Object Detection dataset for autonomous vehicles.

Road Segmentation with Fully Convolutional Networks (FCN)

View on GitHub

Implement an Fully Convolutional Networks (FCN) model to perform pixel-wise classification, enabling the vehicle to distinguish drivable road areas from obstacles. It gets real-world visual data from KITTI Dataset.

Realtime Multi-Object Tracking with DeepSORT

View on GitHub

Integrate the DeepSORT model to track the trajectory of detected objects across video frames, maintaining consistent identification.

3D Object Detection with SFA3D and KITTI

View on GitHub

Utilize the SFA3D model to detect objects in 3D space using LiDAR data from KITTI dataset, crucial for understanding the vehicle’s surroundings.

3D Data Visualization and Homogeneous Transformations

View on GitHub

Visualize and manipulate 3D point cloud data from LiDAR sensors (KITTI dataset), applying homogeneous transformations to align data from multiple sensors.

Camera to Bird’s Eye View Projection with UNetXST

View on GitHub

Develop a model to transform camera images into a bird’s eye view, aiding in better spatial understanding for navigation.

Multi-Task Learning with Multi-Task Attention Network (MTAN)

View on GitHub

Implement a Multi-Task Attention Network (MTAN) on CityScapes Dataset, to simultaneously perform tasks like road segmentation and object detection, improving computational efficiency.

Multimodal Sensor Fusion with GPS, IMU, and LiDAR for Vehicule Localization.

View on GitHub

This project involved integrating data from multiple sensors to accurately determine a vehicle’s position and motion on the roadway. The system uses techniques such as Kalman filtering to combine inputs from GPS, IMU, and LiDAR, enhancing the precision of state estimation critical for autonomous driving applications.

Additional Resources: Linear/Non Linear KF Implementation.

Depth Perception for Obstacle Detection on the Road

View on GitHub

Implemented stereo depth estimation using Python and OpenCV on CARLA simulator images to calculate collision distances in a driving scenario.

Additional Resources: Learn core concepts here.

Visual Odometry (VO) for Self-Driving Car Location

View on GitHub

This is a visual odometry system that estimates the vehicle’s trajectory using realtime visual data captured by its (monocular) camera.

Edge AI

Edge AI involves processing data locally on devices, reducing inference cost, offering faster decision-making, and enhanced security.

Real-time Segmentation Deployment with Qualcomm AI Hub

View on GitHub Open notebook in Colab

This case study demonstrates how to deploy a semantic segmentation model optimized for edge devices using Qualcomm AI Hub. The example leverages FFNet, a model tailored for efficient edge-based semantic segmentation, tested on the Cityscapes dataset.

Industry Applications: Autonomous Driving, Augmented Reality, and Mobile Robotics

Case Studies

Autonomous Systems (Self-Driving Cars/ADAS, Robotics, UAVs)

Self-Driving Car Environment Perception

View on GitHub

Self-Driving Car foundational perception stack, which extracts useful information from its surroudings and perform complex tasks in order to drive safely through the world

End-to-End Self-Driving Car Behavioral Cloning

View on GitHub

End-to-End self-driving car behavioral cloning implementation based on NVIDIA End-to-End Learning paper using computer vision, deep learning, and realtime visual data from Udacity Self-Driving Car simulator.

Smart Cities

⚡ SmartMeterSim: IoT Smart Meter Simulation

View on GitHub Demo on HF Spaces

⚡ SmartMeterSim is a production-ready IoT solution for real-time energy monitoring and optimization for smart grids, buildings, and Edge-to-Cloud applications.

Edge AI Resources