Portfolio
Explore my cutting-edge computer vision solutions addressing real-world challenges across domains such as autonomous systems, healthcare, agriculture, and more.
Computer Vision Real-World Applications
Computer Vision Challenge 🏆
A hands-on collection of computer vision projects designed for all skill levels—beginners to experts.
Challenge Levels
- Level 0 - Beginner: Basics of image processing and OpenCV.
- Level 1 - Intermediate: Build deep learning models with PyTorch/TensorFlow.
- Level 2 - Hero: Explore image generation and inpainting with LVMs.
- Level 3 - Advanced (ongoing): Benchmark video models for action recognition.
- Level 4 - Expert (ongoing): Fine-tune VLMs and LVMs for custom datasets.
- Level 5 - Master (ongoing): Build multimodal AI systems combining vision, text, and more.
Perception
2D Object Detection with YOLO for Autonomous Vehicles
Apply the YOLO (You Only Look Once) model family to detect and classify objects such as pedestrians, vehicles, and traffic signs in real time. The project uses real-world camera and LiDAR data from the Lyft 3D Object Detection for Autonomous Vehicles dataset.
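A minimal inference sketch with the Ultralytics YOLO package (the weights file and image path below are illustrative placeholders, not this project's actual configuration):

```python
# Minimal YOLO inference sketch (assumes the ultralytics package is installed;
# weights file and image path are illustrative placeholders).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pretrained weights (placeholder)
results = model("street_scene.jpg")   # run detection on a single frame

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        conf = float(box.conf)
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{cls_name} {conf:.2f} at ({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```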
Road Segmentation with Fully Convolutional Networks (FCN)
Implement a Fully Convolutional Network (FCN) to perform pixel-wise classification, enabling the vehicle to distinguish drivable road areas from obstacles. The project uses real-world visual data from the KITTI dataset.
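A minimal inference sketch using torchvision's pretrained FCN (the project's own model is trained on KITTI; the generic torchvision weights and image path here are assumptions):

```python
# Sketch of pixel-wise segmentation inference with torchvision's FCN
# (generic pretrained weights; the actual project model and classes may differ).
import torch
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from PIL import Image

weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights).eval()

img = Image.open("kitti_frame.png").convert("RGB")   # placeholder path
batch = weights.transforms()(img).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"]           # shape: (1, num_classes, H, W)
pred = logits.argmax(dim=1).squeeze(0)     # per-pixel class map
```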
Real-time Multi-Object Tracking with DeepSORT
Integrate the DeepSORT tracker to follow the trajectories of detected objects across video frames while maintaining consistent object identities.
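A minimal tracking-loop sketch assuming the `deep_sort_realtime` package and a detector that returns boxes as `([left, top, width, height], confidence, class)`; `run_detector` is a hypothetical stand-in for the detection step:

```python
# Sketch of frame-by-frame tracking with the deep_sort_realtime package
# (detection format and parameters are assumptions, not this project's exact code).
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort

tracker = DeepSort(max_age=30)
cap = cv2.VideoCapture("driving_clip.mp4")    # placeholder video

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # detections: list of ([left, top, width, height], confidence, class_name)
    detections = run_detector(frame)          # hypothetical detector call
    tracks = tracker.update_tracks(detections, frame=frame)
    for t in tracks:
        if t.is_confirmed():
            print(t.track_id, t.to_ltrb())    # stable ID + bounding box
```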
3D Object Detection with SFA3D and KITTI
Utilize the SFA3D model to detect objects in 3D space using LiDAR data from the KITTI dataset, which is crucial for understanding the vehicle’s surroundings.
3D Data Visualization and Homogeneous Transformations
Visualize and manipulate 3D point cloud data from LiDAR sensors (KITTI dataset), applying homogeneous transformations to align data from multiple sensors.
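A minimal sketch of the core idea, assuming a raw KITTI Velodyne `.bin` scan; the 4x4 transform below is a placeholder, as real extrinsics come from the KITTI calibration files:

```python
# Sketch: load a KITTI Velodyne scan and apply a homogeneous transform
# (the 4x4 matrix is a placeholder; real calibration comes from KITTI calib files).
import numpy as np

# KITTI .bin scans store float32 rows of (x, y, z, reflectance)
points = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)
xyz = points[:, :3]

# Homogeneous coordinates: append a column of ones
xyz_h = np.hstack([xyz, np.ones((xyz.shape[0], 1))])

# Example rigid transform (rotation block left as identity, illustrative translation)
T = np.eye(4)
T[:3, 3] = [0.27, 0.0, -0.08]        # illustrative sensor offset in meters

xyz_in_target_frame = (T @ xyz_h.T).T[:, :3]
```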
Camera to Bird’s Eye View Projection with UNetXST
Develop a model to transform camera images into a bird’s eye view, aiding in better spatial understanding for navigation.
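For intuition, a classical baseline of the underlying geometry: warping the road plane to a top-down view with a homography in OpenCV. The source/destination points are illustrative, and UNetXST itself is a learned model rather than this fixed warp:

```python
# Classical baseline for camera-to-BEV warping via a planar homography
# (illustrates the geometric idea only; UNetXST learns this mapping).
import cv2
import numpy as np

img = cv2.imread("front_camera.png")                      # placeholder image
h, w = img.shape[:2]

# Four points on the road plane in the image and their targets in the BEV image
src = np.float32([[w*0.45, h*0.62], [w*0.55, h*0.62], [w*0.95, h*0.95], [w*0.05, h*0.95]])
dst = np.float32([[w*0.30, 0], [w*0.70, 0], [w*0.70, h], [w*0.30, h]])

H = cv2.getPerspectiveTransform(src, dst)
bev = cv2.warpPerspective(img, H, (w, h))                 # top-down view of the road plane
```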
Multi-Task Learning with Multi-Task Attention Network (MTAN)
Implement a Multi-Task Attention Network (MTAN) on the Cityscapes dataset to perform tasks such as road segmentation and object detection simultaneously, improving computational efficiency.
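A simplified sketch of the shared-encoder, multi-head idea behind multi-task learning (MTAN additionally adds per-task attention modules; the heads and layer sizes below are illustrative, not the project's architecture):

```python
# Simplified multi-task sketch: one shared backbone, two task-specific heads.
import torch
import torch.nn as nn

class TwoTaskNet(nn.Module):
    def __init__(self, num_seg_classes=19):
        super().__init__()
        self.backbone = nn.Sequential(               # shared feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)  # segmentation logits
        self.aux_head = nn.Conv2d(64, 1, 1)                # second dense task (illustrative)

    def forward(self, x):
        feats = self.backbone(x)                     # computed once, shared by both heads
        return self.seg_head(feats), self.aux_head(feats)

model = TwoTaskNet()
seg_logits, aux_out = model(torch.randn(1, 3, 128, 256))
```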
Multimodal Sensor Fusion with GPS, IMU, and LiDAR for Vehicle Localization
Integrate data from multiple sensors to accurately estimate a vehicle’s position and motion on the roadway. The system uses Kalman filtering to combine GPS, IMU, and LiDAR inputs, improving the precision of state estimation that is critical for autonomous driving.
Additional Resources: Linear/Non-Linear KF Implementation.
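A minimal 1-D constant-velocity Kalman filter sketch of the predict/update cycle; the real system fuses GPS, IMU, and LiDAR with larger state vectors and tuned noise models:

```python
# Minimal 1-D constant-velocity Kalman filter (illustrative only).
import numpy as np

dt = 0.1
F = np.array([[1, dt], [0, 1]])        # state transition (position, velocity)
H = np.array([[1, 0]])                 # we only measure position (e.g. GPS)
Q = np.eye(2) * 0.01                   # process noise covariance
R = np.array([[0.5]])                  # measurement noise covariance

x = np.zeros((2, 1))                   # initial state estimate
P = np.eye(2)                          # initial state covariance

for z in [0.9, 2.1, 2.9, 4.2]:         # fake position measurements
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x        # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
```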
Depth Perception for Obstacle Detection on the Road
Implement stereo depth estimation with Python and OpenCV on CARLA simulator images to estimate distances to obstacles in a driving scenario.
Additional Resources: Learn core concepts here.
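A minimal sketch of the disparity-to-depth computation with OpenCV's SGBM matcher; the focal length and baseline are assumed values, not the CARLA camera's actual calibration:

```python
# Sketch of stereo disparity + depth with OpenCV's SGBM matcher.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=11)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

f = 640.0      # focal length in pixels (assumed)
B = 0.4        # stereo baseline in meters (assumed)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]   # depth = focal_length * baseline / disparity
```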
Visual Odometry (VO) for Self-Driving Car Location
A visual odometry system that estimates the vehicle’s trajectory in real time from images captured by a monocular camera.
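A sketch of a single monocular VO step, assuming KITTI-like camera intrinsics: match ORB features between consecutive frames, estimate the essential matrix, and recover the relative pose (translation is only recovered up to scale with one camera):

```python
# One monocular visual-odometry step: ORB matching -> essential matrix -> relative pose.
import cv2
import numpy as np

K = np.array([[718.8, 0, 607.2],       # assumed KITTI-like intrinsics
              [0, 718.8, 185.2],
              [0, 0, 1]])

orb = cv2.ORB_create(3000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)   # rotation and unit-scale translation
```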
Edge AI
Edge AI processes data locally on devices, reducing inference cost, enabling faster decision-making, and enhancing security.
- Further Reading: The Next AI Frontier is at the Edge
- Learning Resources: Curated Edge AI Technical Guides
Real-time Segmentation Deployment with Qualcomm AI Hub
This case study demonstrates how to deploy a semantic segmentation model optimized for edge devices using Qualcomm AI Hub. The example leverages FFNet, a model tailored for efficient edge-based semantic segmentation, tested on the Cityscapes dataset.
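As an illustration of a common precursor step for edge toolchains, here is a sketch of exporting a PyTorch segmentation model to ONNX; a torchvision model stands in for FFNet, and the Qualcomm AI Hub upload/compile calls themselves are not shown:

```python
# Sketch: export a segmentation model to ONNX before handing it to an edge toolchain.
import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

class SegWrapper(torch.nn.Module):
    """Unwraps torchvision's dict output so the exported graph returns a plain tensor."""
    def __init__(self, model):
        super().__init__()
        self.model = model
    def forward(self, x):
        return self.model(x)["out"]

model = SegWrapper(lraspp_mobilenet_v3_large(weights=None)).eval()
dummy = torch.randn(1, 3, 512, 1024)          # Cityscapes-like input resolution

torch.onnx.export(
    model, dummy, "segmentation.onnx",
    input_names=["image"], output_names=["logits"],
    opset_version=17,
)
```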
Industry Applications: Autonomous Driving, Augmented Reality, and Mobile Robotics
⚡ SmartMeterSim: Edge-to-Cloud <=> Cloud-to-Edge Energy Monitoring
SmartMeterSim is a production-ready IoT solution for real-time energy monitoring and optimization in smart grids, buildings, and Edge-to-Cloud applications.
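A minimal sketch of the edge-to-cloud direction, posting a meter reading to a cloud API; the endpoint URL and payload schema are illustrative, not SmartMeterSim's actual interface:

```python
# Sketch: edge device pushes a meter reading to a hypothetical cloud endpoint.
import time
import requests

reading = {"meter_id": "edge-001", "kwh": 12.47, "timestamp": time.time()}
resp = requests.post("https://example.com/api/readings", json=reading, timeout=5)
resp.raise_for_status()   # fail loudly if the cloud side rejects the reading
```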
Case Studies
Self-Driving Car Environment Perception
A foundational self-driving car perception stack that extracts useful information from the vehicle’s surroundings and performs complex tasks in order to drive safely through the world.
End-to-End Self-Driving Car Behavioral Cloning
An end-to-end behavioral cloning implementation based on NVIDIA’s End-to-End Learning for Self-Driving Cars paper, using computer vision, deep learning, and real-time visual data from the Udacity Self-Driving Car Simulator.
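A PyTorch sketch of a PilotNet-style network whose layer sizes follow the NVIDIA paper (the project's own implementation and preprocessing may differ):

```python
# PilotNet-style steering regressor from NVIDIA's End-to-End Learning paper.
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),   # 1x18 feature map at 66x200 input
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                          # predicted steering angle
        )

    def forward(self, x):                              # x: (N, 3, 66, 200) normalized image
        return self.regressor(self.features(x))

steering = PilotNet()(torch.randn(1, 3, 66, 200))
```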
More Projects & Solutions
- Explore more projects & solutions here.
Need a custom Edge AI solution?