Explore my cutting-edge computer vision solutions addressing real-world challenges across industries such as autonomous systems, healthcare, agriculture, and more.

Computer Vision Real-World Applications

Computer Vision Challenge 🏆

View on GitHub Demo on HF Spaces

A hands-on collection of computer vision projects designed for all skill levels—beginners to experts.

Challenge Levels

  • Level 0 - Beginner: Basics of image processing and OpenCV.
  • Level 1 - Intermediate: Build deep learning models with PyTorch/TensorFlow.
  • Level 2 - Hero: Explore image generation and inpainting with LVMs.
  • Level 3 - Advanced (ongoing): Benchmark video models for action recognition.
  • Level 4 - Expert (ongoing): Fine-tune VLMs and LVMs for custom datasets.
  • Level 5 - Master (ongoing): Build multimodal AI systems combining vision, text, and more.

Perception

2D Object Detection with YOLO for Autonomous Vehicles

View on GitHub

Apply the YOLO (You Only Look Once) model series to detect and classify objects such as pedestrians, vehicles, and traffic signs in real time. It uses real-world camera and LiDAR data from the Lyft 3D Object Detection for Autonomous Vehicles dataset.
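
A minimal detection sketch, assuming the `ultralytics` package and a pretrained `yolov8n.pt` checkpoint (both assumptions; the repository may use a different YOLO version and the Lyft camera frames rather than a single sample image):

```python
# Minimal sketch: run a pretrained YOLO detector on a single camera frame.
# Assumes the `ultralytics` package; the project may use a different YOLO
# version and Lyft camera frames instead of this sample image.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")               # small pretrained checkpoint
results = model("camera_frame.jpg")      # path to an image or a numpy array

for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]
    conf = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{cls_name} ({conf:.2f}): [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
```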

Road Segmentation with Fully Convolutional Networks (FCN)

View on GitHub

Implement a Fully Convolutional Network (FCN) model to perform pixel-wise classification, enabling the vehicle to distinguish drivable road areas from obstacles. It uses real-world visual data from the KITTI dataset.
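
A minimal sketch of pixel-wise segmentation using torchvision's pretrained `fcn_resnet50` as a stand-in (an assumption; the repository's own model and KITTI preprocessing may differ):

```python
# Minimal sketch: pixel-wise segmentation with a pretrained FCN backbone.
# Uses torchvision's fcn_resnet50 as a stand-in; the repository's own
# model and KITTI preprocessing pipeline may differ.
import torch
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from PIL import Image

weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("kitti_frame.png").convert("RGB")
batch = preprocess(image).unsqueeze(0)           # [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]                 # [1, num_classes, H, W]
mask = logits.argmax(dim=1).squeeze(0)           # per-pixel class indices
```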

Real-time Multi-Object Tracking with DeepSORT

View on GitHub

Integrate the DeepSORT model to track the trajectory of detected objects across video frames, maintaining consistent identification.
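
A hedged sketch using the `deep-sort-realtime` package (an assumption; the repository may wrap a different DeepSORT implementation), feeding per-frame detections into the tracker to keep stable track IDs:

```python
# Sketch only: feed per-frame detections into a DeepSORT tracker to keep
# consistent track IDs across frames. Assumes the `deep-sort-realtime`
# package; the repository may use a different DeepSORT implementation.
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=30)
frame = cv2.imread("frame_000.jpg")      # current BGR video frame (placeholder path)

# detections: list of ([left, top, width, height], confidence, class_name)
detections = [([100, 150, 60, 120], 0.91, "pedestrian"),
              ([400, 200, 180, 90], 0.88, "car")]

tracks = tracker.update_tracks(detections, frame=frame)
for track in tracks:
    if track.is_confirmed():
        print(track.track_id, track.to_ltrb())   # stable ID + box across frames
```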

3D Object Detection with SFA3D and KITTI

View on GitHub

Utilize the SFA3D model to detect objects in 3D space using LiDAR data from the KITTI dataset, crucial for understanding the vehicle’s surroundings.

3D Data Visualization and Homogeneous Transformations

View on GitHub

Visualize and manipulate 3D point cloud data from LiDAR sensors (KITTI dataset), applying homogeneous transformations to align data from multiple sensors.
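
A short NumPy sketch of applying a 4x4 homogeneous transform (rotation plus translation) to a LiDAR point cloud, for example to express points in a camera frame; the rotation and translation values below are placeholders, whereas the real calibration comes from the KITTI calib files:

```python
# Sketch: apply a 4x4 homogeneous transform to a LiDAR point cloud, e.g. to
# express points in the camera frame. The rotation/translation here are
# placeholders; in practice they come from the KITTI calibration files.
import numpy as np

points = np.random.rand(1000, 3)                 # N x 3 LiDAR points (x, y, z)

T = np.eye(4)                                    # homogeneous transform (4 x 4)
theta = np.deg2rad(90)
T[:3, :3] = [[np.cos(theta), -np.sin(theta), 0], # rotation about the z-axis
             [np.sin(theta),  np.cos(theta), 0],
             [0,              0,             1]]
T[:3, 3] = [0.27, 0.0, -0.08]                    # translation (metres)

points_h = np.hstack([points, np.ones((points.shape[0], 1))])   # N x 4
points_cam = (T @ points_h.T).T[:, :3]           # transformed N x 3 points
```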

Camera to Bird’s Eye View Projection with UNetXST

View on GitHub

Develop a model to transform camera images into a bird’s eye view, aiding in better spatial understanding for navigation.
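
For intuition, here is a classical inverse perspective mapping (IPM) baseline; UNetXST learns this camera-to-BEV mapping instead, and the source/destination points below are illustrative placeholders rather than the project's calibration:

```python
# Sketch: classical inverse perspective mapping (IPM) as a simple baseline for
# camera-to-bird's-eye-view projection. UNetXST learns this mapping instead;
# the four source/destination points below are illustrative placeholders.
import cv2
import numpy as np

image = cv2.imread("front_camera.png")           # placeholder path
h, w = image.shape[:2]

# Four points on the road plane in the camera image (a trapezoid) ...
src = np.float32([[w * 0.45, h * 0.6], [w * 0.55, h * 0.6], [w * 0.9, h], [w * 0.1, h]])
# ... mapped to a rectangle in the bird's-eye-view image.
dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

H = cv2.getPerspectiveTransform(src, dst)
bev = cv2.warpPerspective(image, H, (400, 600))  # top-down view of the road
```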

Multi-Task Learning with Multi-Task Attention Network (MTAN)

View on GitHub

Implement a Multi-Task Attention Network (MTAN) on the Cityscapes dataset to simultaneously perform tasks such as road segmentation and object detection, improving computational efficiency.
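
A minimal sketch of the shared-encoder, task-specific-head pattern that multi-task networks such as MTAN build on (the real MTAN adds per-task attention modules on top of the shared features; layer sizes and class counts here are illustrative):

```python
# Sketch: shared backbone with task-specific heads, the pattern MTAN extends
# with per-task attention modules. Shapes and class counts are illustrative.
import torch
import torch.nn as nn

class TinyMultiTaskNet(nn.Module):
    def __init__(self, num_seg_classes=19, num_det_outputs=5):
        super().__init__()
        self.backbone = nn.Sequential(                   # shared feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)    # pixel-wise logits
        self.det_head = nn.Conv2d(64, num_det_outputs, 1)    # coarse detection map

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.det_head(feats)

model = TinyMultiTaskNet()
seg_logits, det_map = model(torch.randn(1, 3, 256, 512))
```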

Multimodal Sensor Fusion with GPS, IMU, and LiDAR for Vehicle Localization

View on GitHub

This project integrates data from multiple sensors to accurately determine a vehicle’s position and motion on the roadway. The system uses techniques such as Kalman filtering to combine inputs from GPS, IMU, and LiDAR, enhancing the precision of state estimation critical for autonomous driving applications.

Additional Resources: Linear/Non-Linear KF Implementation.
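
A minimal sketch of one predict/update cycle of a linear Kalman filter fusing a noisy GPS-style position measurement; the deployed system extends this to the full vehicle state with IMU and LiDAR updates:

```python
# Sketch: one predict/update cycle of a linear Kalman filter for a 1-D
# position/velocity state, fusing a noisy GPS-style position measurement.
import numpy as np

dt = 0.1
x = np.array([[0.0], [0.0]])              # state: [position, velocity]
P = np.eye(2)                             # state covariance
F = np.array([[1, dt], [0, 1]])           # constant-velocity motion model
Q = 0.01 * np.eye(2)                      # process noise
H = np.array([[1.0, 0.0]])                # we only measure position
R = np.array([[0.5]])                     # measurement noise (GPS)

# Predict
x = F @ x
P = F @ P @ F.T + Q

# Update with a GPS position measurement
z = np.array([[1.2]])
y = z - H @ x                             # innovation
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
x = x + K @ y
P = (np.eye(2) - K @ H) @ P
```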

Depth Perception for Obstacle Detection on the Road

View on GitHub

Implement stereo depth estimation using Python and OpenCV on CARLA simulator images to calculate collision distances in a driving scenario.

Additional Resources: Learn core concepts here.
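
A minimal OpenCV sketch of the approach: compute a disparity map with semi-global block matching, then convert it to depth using the camera focal length and baseline (placeholder values below; the real numbers come from the CARLA camera setup):

```python
# Sketch: stereo disparity with OpenCV's semi-global block matching, then
# depth = focal_length * baseline / disparity. Focal length and baseline
# below are placeholders for the CARLA camera parameters.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=11)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # SGBM fixed-point scale

focal_length = 640.0        # pixels (placeholder)
baseline = 0.5              # metres (placeholder)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length * baseline / disparity[valid]   # metres per pixel
```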

Visual Odometry (VO) for Self-Driving Car Location

View on GitHub

A visual odometry system that estimates the vehicle’s trajectory from real-time visual data captured by its monocular camera.
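
A minimal sketch of one monocular VO step with OpenCV (an assumed approach: ORB feature matching plus essential-matrix pose recovery; the intrinsics below are placeholders, and scale remains ambiguous with a single camera):

```python
# Sketch: one monocular visual odometry step - match ORB features between
# consecutive frames, estimate the essential matrix, and recover the relative
# camera rotation/translation. K is a placeholder intrinsic matrix.
import cv2
import numpy as np

K = np.array([[718.86, 0, 607.19],
              [0, 718.86, 185.22],
              [0, 0, 1]])                        # example KITTI-like intrinsics

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(3000)
kp1, des1 = orb.detectAndCompute(prev, None)
kp2, des2 = orb.detectAndCompute(curr, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

E, mask = cv2.findEssentialMat(pts2, pts1, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts2, pts1, K)   # relative pose between frames
```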

Edge AI

Edge AI processes data locally on devices, reducing inference cost, enabling faster decision-making, and enhancing security.

Real-time Segmentation Deployment with Qualcomm AI Hub

View on GitHub Open notebook in Colab

This case study demonstrates how to deploy a semantic segmentation model optimized for edge devices using Qualcomm AI Hub. The example leverages FFNet, a model tailored for efficient edge-based semantic segmentation, tested on the Cityscapes dataset.

Industry Applications: Autonomous Driving, Augmented Reality, and Mobile Robotics

⚡ SmartMeterSim: Edge-to-Cloud <=> Cloud-to-Edge Energy Monitoring

View on GitHub Demo on HF Spaces

⚡ SmartMeterSim is a production-ready IoT solution for real-time energy monitoring and optimization across smart grids, buildings, and Edge-to-Cloud applications.

Case Studies

Self-Driving Car Environment Perception

View on GitHub

A foundational self-driving car perception stack that extracts useful information from the vehicle’s surroundings and performs complex tasks in order to drive safely through the world.

End-to-End Self-Driving Car Behavioral Cloning

View on GitHub

An end-to-end self-driving car behavioral cloning implementation based on NVIDIA’s End-to-End Learning paper, using computer vision, deep learning, and real-time visual data from the Udacity Self-Driving Car simulator.
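
A hedged PyTorch sketch of a PilotNet-style network following the NVIDIA paper, mapping a 66x200 RGB frame to a steering angle; the repository's exact layers and framework may differ:

```python
# Sketch: a PilotNet-style CNN (after the NVIDIA end-to-end learning paper)
# mapping a 66x200 RGB camera frame to a single steering angle.
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU())
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1))                      # predicted steering angle

    def forward(self, x):                          # x: [N, 3, 66, 200]
        return self.regressor(self.features(x))

steering = PilotNet()(torch.randn(1, 3, 66, 200))
```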

More Projects & Solutions

  • Explore more projects & solutions here.

Need a custom Edge AI solution?