Computer Vision Engineer

Build systems that enable machines to see and understand the visual world: detecting objects, recognising faces, reading documents, and inspecting manufactured parts, so that autonomous vehicles, medical imaging AI, and industrial robots can act on what they see.

Highly Competitive | High demand | Global career | Entrepreneurial | Can work remotely


A Computer Vision Engineer builds AI-powered systems that extract meaning from images and video. These systems let machines detect, classify, segment, track, and interpret visual content, on many well-defined tasks at or beyond human accuracy.

Computer vision (CV) is one of the most mature and commercially deployed branches of applied AI. It powers:
- quality control inspection in manufacturing
- medical image analysis in radiology
- face recognition in security systems
- number plate recognition in traffic systems
- object detection in autonomous vehicles
- crop disease detection in agriculture
- document digitisation across industries

The field is built on deep learning, primarily Convolutional Neural Networks (CNNs) and, increasingly, Vision Transformers (ViTs), trained on large labelled image datasets. The dominant technical stack is Python with PyTorch or TensorFlow for model development and OpenCV for classical image processing. Models are deployed via ONNX, TensorRT, or TFLite for edge and cloud inference.

Sri Lanka has significant computer vision demand across several sectors:
- The apparel manufacturing sector (MAS Holdings, Brandix, Hirdaramani) uses CV for fabric defect detection, garment measurement, and quality inspection.
- The agriculture sector is exploring CV for tea leaf quality grading, coconut disease detection, and fishery catch monitoring.
- The banking and financial sector uses CV for document digitisation (KYC document processing), signature verification, and cheque processing automation.
- The Department of Motor Traffic uses ANPR (Automatic Number Plate Recognition).
- The healthcare sector is beginning to explore medical imaging AI, such as chest X-ray analysis and fundus photograph grading for diabetic retinopathy.

Globally, computer vision engineers are among the most in-demand AI specialists. Gartner and LinkedIn both list CV as a top-5 AI skill in global job postings.
Sri Lankan CV engineers at local companies earn LKR 180,000–350,000/month. Those working remotely for international clients or employed abroad earn USD 90,000–160,000 per year.

What a Computer Vision Engineer does daily

Image classification and object detection model development â€” training and fine-tuning deep learning models; image classification with ResNet, EfficientNet, ViT (Vision Transformer); object detection with YOLO (v8/v9/v10, the most widely deployed detection architecture), DETR, RT-DETR; instance and semantic segmentation with Mask R-CNN, SAM (Segment Anything Model); selecting the right architecture for the task's accuracy/speed/size trade-off
Dataset preparation and annotation â€” computer vision is heavily data-dependent; sourcing, cleaning, and annotating image datasets using tools like Label Studio, CVAT, Roboflow, or V7 Darwin; defining annotation schemas (bounding boxes, polygons, keypoints, semantic masks); calculating inter-annotator agreement; data augmentation pipelines (Albumentations, torchvision.transforms) to expand dataset diversity without additional annotation cost
Transfer learning and fine-tuning â€” adapting pre-trained CV models to specific domains and tasks; fine-tuning ImageNet-pretrained CNNs on domain-specific data (medical images, satellite imagery, manufacturing defects); understanding when to freeze layers vs train end-to-end; few-shot learning for scenarios with very limited labelled data; the standard production approach since training from scratch is rarely justified
Classical image processing â€” OpenCV for pre-processing, filtering, and feature extraction; Gaussian blur, morphological operations (erosion, dilation, opening, closing); edge detection (Canny, Sobel); contour detection; perspective transformation and image warping; colour space conversion (RGB, HSV, LAB, grayscale); classical methods remain important for preprocessing pipelines and for tasks where deep learning is over-engineered
Model deployment and inference optimisation â€” converting trained PyTorch/TensorFlow models to deployment formats (ONNX, TensorRT for NVIDIA GPU acceleration, TFLite for mobile/edge, CoreML for Apple devices, OpenVINO for Intel CPUs); inference server deployment (Triton Inference Server, TorchServe, FastAPI-based REST API); optimisation techniques (quantisation, pruning, knowledge distillation) to reduce model size and increase inference speed for edge and real-time applications
Video analysis and multi-object tracking â€” processing video streams frame-by-frame; temporal consistency challenges; multi-object tracking (DeepSORT, ByteTrack, BoT-SORT) for tracking multiple objects across frames; optical flow (RAFT, Farneback) for motion estimation; action recognition; anomaly detection in surveillance video; real-time processing pipeline design with OpenCV and GStreamer
Medical imaging (for healthcare CV) â€” DICOM format processing (pydicom); working with CT, MRI, X-ray, and fundus photography modalities; adapting deep learning models to 3D volumetric data (3D U-Net); class imbalance handling for rare pathology detection; FDA/regulatory compliance requirements for medical AI; privacy and de-identification of medical images (important for Sri Lankan hospital AI projects)
3D computer vision â€” stereo vision (depth from two cameras); LiDAR point cloud processing (Open3D, PCL); depth estimation from monocular images (MiDaS, DepthAnything); 3D object detection; NeRF (Neural Radiance Fields) for 3D scene reconstruction; photogrammetry and structure-from-motion; relevant for robotics, autonomous vehicles, and AR/VR depth sensing
MLOps for computer vision â€” managing CV model lifecycle in production; data versioning (DVC); experiment tracking (MLflow, Weights & Biases); model registry; continuous model evaluation on production data; monitoring for data drift (distribution shift in incoming images); retraining pipelines triggered by performance degradation; the production infrastructure that separates research-grade from production-grade CV systems
Industrial and edge CV deployment â€” deploying CV models on edge hardware (NVIDIA Jetson Nano/Orin, Coral Dev Board with Google Edge TPU, Intel Neural Compute Stick); camera hardware selection (industrial GigE cameras vs consumer USB webcams; rolling vs global shutter trade-offs; near-infrared for low-light inspection); integration with PLCs and factory control systems for real-time quality control
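Object detectors such as YOLO usually output many overlapping candidate boxes for the same object. These are filtered with non-maximum suppression (NMS), which compares boxes by intersection-over-union (IoU).

The sketch below is a minimal numpy version of greedy NMS. It assumes boxes in (x1, y1, x2, y2) pixel coordinates and an illustrative IoU threshold of 0.5. The function names are hypothetical; in practice you would normally call a library routine such as torchvision.ops.nms.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle with each box
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Non-overlapping boxes give a negative width/height, clipped to zero
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: indices of the boxes to keep, highest score first."""
    order = np.argsort(scores)[::-1]  # candidate indices by descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard remaining candidates that overlap the kept box too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 50, 50],
                  [12, 12, 52, 52],      # near-duplicate of the first box
                  [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

Greedy NMS keeps the highest-scoring box, discards its heavy overlaps, and repeats. A lower threshold suppresses duplicates more aggressively but can also merge genuinely adjacent objects, such as touching garments on a conveyor.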
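When two annotators each assign one class label per image, inter-annotator agreement is commonly summarised with Cohen's kappa:

kappa = (p_o - p_e) / (1 - p_e)

Here p_o is the observed agreement and p_e is the agreement expected by chance.

A minimal standard-library sketch follows. The defect/ok labels are hypothetical. Other measures may suit other annotation schemas better, for example Fleiss' kappa for more than two annotators or mask IoU for segmentation. scikit-learn offers sklearn.metrics.cohen_kappa_score for real projects.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both annotators used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)

annotator_a = ["defect", "ok", "ok", "defect", "ok", "ok"]
annotator_b = ["defect", "ok", "defect", "defect", "ok", "ok"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # 0.667
```

The annotators agree on 5 of 6 images (p_o ≈ 0.83). However, half that agreement would be expected by chance (p_e = 0.5), so kappa is only about 0.67.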
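Edge detection is a good example of classical image processing that reduces to simple array arithmetic. The sketch below applies the 3x3 Sobel kernels with a plain numpy sliding window to a hypothetical synthetic image, with no OpenCV needed. Real pipelines would use cv2.Sobel or cv2.Canny instead, since they are much faster and handle image borders.

The same sliding-window operation is what a CNN's convolutional layers compute. The difference is that a CNN learns its kernels, and first-layer kernels often end up resembling edge detectors like these.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation, which deep learning libraries call convolution."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the kernel-sized window at this position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(gray):
    gx = convolve2d(gray, SOBEL_X)  # left-to-right intensity change (vertical edges)
    gy = convolve2d(gray, SOBEL_Y)  # top-to-bottom intensity change (horizontal edges)
    return np.hypot(gx, gy)         # gradient magnitude per pixel

# Hypothetical 6x6 greyscale image: dark left half, bright right half
img = np.zeros((6, 6))
img[:, 3:] = 255
print(sobel_magnitude(img))  # each row: 0 away from the edge, 1020 in the two columns straddling it
```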
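Quantisation stores weights as 8-bit integers instead of 32-bit floats. This makes them four times smaller and enables integer arithmetic on hardware that supports it.

Below is a minimal numpy sketch of per-tensor affine (asymmetric) int8 quantisation, assuming a single scale and zero point for the whole tensor. Real toolchains such as TensorRT, TFLite, and OpenVINO differ in the details, for example symmetric ranges, per-channel scales, and calibration on sample data. Treat this only as an illustration of the arithmetic.

```python
import numpy as np

def quantize_int8(weights):
    """Per-tensor affine quantisation of float weights to int8."""
    # Extend the range to include 0 so that zero is represented exactly
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # 256 int8 levels; avoid zero scale
    zero_point = int(round(-128 - w_min / scale))  # int8 value that represents 0.0
    q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate float32 weights."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0], dtype=np.float32)
q, scale, zp = quantize_int8(w)
error = np.abs(dequantize(q, scale, zp) - w).max()
print(q, zp, error)  # error stays within about half a quantisation step (scale / 2)
```

Each weight moves by at most about half a quantisation step. This is why well-calibrated int8 models often lose little accuracy, while outlier weights that stretch the range hurt precision for everything else.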
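Multi-object trackers give each object a persistent ID by associating the current frame's detections with existing tracks. SORT-family trackers (including DeepSORT and ByteTrack) do this with motion prediction (Kalman filters), optimal assignment, and, in DeepSORT's case, appearance features.

The hypothetical GreedyIoUTracker below keeps only the core idea: match detections to the previous frame's boxes by IoU, greedily from the best match down. It has no motion model and no memory of lost tracks, so it is a teaching sketch, not a substitute for those trackers.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class GreedyIoUTracker:
    """Assigns IDs by matching each frame's detections to the previous frame's boxes."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track id -> box from the previous frame
        self.next_id = 0

    def update(self, detections):
        # Score every (track, detection) pair, then match from the highest IoU down
        pairs = sorted(
            ((iou(box, det), tid, d)
             for tid, box in self.tracks.items()
             for d, det in enumerate(detections)),
            reverse=True,
        )
        assigned, used_tracks, used_dets = {}, set(), set()
        for score, tid, d in pairs:
            if score < self.iou_threshold:
                break  # all remaining pairs overlap too little
            if tid in used_tracks or d in used_dets:
                continue
            assigned[d] = tid
            used_tracks.add(tid)
            used_dets.add(d)
        # Unmatched detections start new tracks; unmatched tracks are dropped at once
        new_tracks, ids = {}, []
        for d, det in enumerate(detections):
            tid = assigned.get(d)
            if tid is None:
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = det
            ids.append(tid)
        self.tracks = new_tracks
        return ids

tracker = GreedyIoUTracker()
print(tracker.update([(10, 10, 50, 50), (100, 100, 140, 140)]))  # [0, 1]
print(tracker.update([(105, 102, 145, 142), (12, 11, 52, 51)]))  # [1, 0]: same objects, new order
```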

Why this matters: Computer vision automates visual inspection tasks currently performed by human workers, often at higher speed and lower cost, and with greater consistency and accuracy.

In Sri Lanka's apparel manufacturing sector, a single production line can produce thousands of garments per hour, so manual quality inspection is a bottleneck and a cost centre. Automated fabric defect detection can inspect every garment at line speed, with sub-millimetre accuracy given suitable cameras and lighting.

In agriculture, CV-based crop disease detection from smartphone photographs lets extension officers diagnose tea, coconut, and paddy diseases without sending samples to a laboratory, a process that currently takes weeks.

In banking, CV-powered KYC document processing can eliminate weeks of manual data entry from account opening.

In healthcare, CV-assisted radiology can flag chest X-rays that need urgent radiologist review, helping to prioritise queues in under-resourced Sri Lankan hospitals.

The commercial ROI of industrial computer vision is among the highest of any AI application, and it is directly measurable in reduced scrap rates, lower labour costs, and faster processing times.

Step-by-Step Career Roadmap

What to do

Build Python foundations â€” CS50P (Harvard, free) or Automate the Boring Stuff with Python (automatetheboringstuff.com, free book); Python is the language of computer vision; strong Python is the non-negotiable foundation
Understand what computer vision is — watch 3Blue1Brown's "But what is a neural network?" and "But what is a convolution?", or search "How does face recognition work? CNN explained" on YouTube; the intuition that a CNN learns to detect edges → curves → shapes → objects layer by layer is the most important conceptual model in CV
Mathematics: master fractions, percentages, and basic algebra, the foundation for statistics and linear algebra later; take every mathematics enrichment opportunity available
First image manipulation with Python — install the Pillow (PIL) library and write a Python script that opens an image, converts it to greyscale, and saves it; then extend it to change the brightness and flip the image horizontally; these simple operations build the mental model of images as numerical arrays
Explore fascinating CV applications — Google Lens object recognition; DeepFake detection; autonomous vehicle camera feeds on YouTube; medical imaging AI demonstrations; compelling applications help sustain motivation through difficult learning periods
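The Pillow exercise above is a good moment to see that an image is just an array of numbers. Below is a minimal numpy sketch of the same three operations (greyscale, brightness, horizontal flip) on a tiny hypothetical 2x2 image.

It uses the ITU-R 601-2 luma weights (0.299, 0.587, 0.114) that Pillow documents for convert("L"). Pillow's own results may differ from this sketch by rounding.

```python
import numpy as np

def to_greyscale(rgb):
    """RGB array of shape (H, W, 3) -> greyscale (H, W) using ITU-R 601-2 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    # Weighted sum over the colour channel for every pixel
    return np.round(rgb.astype(float) @ weights).astype(np.uint8)

def adjust_brightness(img, delta):
    """Add a constant to every pixel, clipping to 0-255 instead of wrapping around."""
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)

def flip_horizontal(img):
    """Mirror the image left-to-right by reversing the column order."""
    return img[:, ::-1]

# Hypothetical 2x2 RGB image: red, green on the top row; blue, white on the bottom row
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
grey = to_greyscale(rgb)
print(grey)                          # [[ 76 150]
                                     #  [ 29 255]]
print(adjust_brightness(grey, 100))
print(flip_horizontal(grey))
```

The clip in adjust_brightness matters because numpy uint8 arithmetic wraps around on overflow: without it, a pixel value of 200 plus 100 would become 44 instead of saturating at 255.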

Key subjects

Mathematics, Science, ICT / Computing, English

Skills to build

Python: variables, loops, functions, file I/O, lists, dictionaries
Pillow (PIL): image open/save/convert; basic pixel manipulation
Conceptual: understanding images as 2D arrays of pixel values; RGB vs greyscale
Mathematics: fractions, basic algebra, introduction to coordinates and graphs
English: reading technical documentation comfortably

Suggested activities

CS50P: Chapters 1–5 (free, Harvard)
Python: open an image with Pillow; print its width/height/mode; convert to greyscale; save
Python: write a script that scans all images in a folder and prints their dimensions
YouTube: search "How CNNs work" (3Blue1Brown or Computerphile); watch and summarise in your own words
Explore: Google Lens; try Google Teachable Machine (teachablemachine.withgoogle.com, free) to train a simple image classifier in the browser â€” no code required
Mathematics enrichment: every available extra Mathematics class or book
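The folder-scanning exercise is normally done with Pillow, where Image.open(path).size works for JPEG, PNG, and many other formats. To show that image files are just bytes with a documented layout, here is a standard-library-only sketch limited to PNG, which stores width and height at a fixed position in its IHDR header.

The function names and the "photos" folder in the usage line are hypothetical. The "*.png" pattern is case-sensitive on Linux.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Return (width, height) read from a PNG file's header, or None if it is not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature; 8-11: chunk length; 12-15: b"IHDR"; 16-23: width, height
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])  # two big-endian uint32 values
    return width, height

def scan_folder(folder):
    """Print and return the dimensions of every .png file directly inside a folder."""
    sizes = {}
    for path in sorted(Path(folder).glob("*.png")):
        size = png_size(path)
        if size is not None:
            sizes[path.name] = size
            print(f"{path.name}: {size[0]} x {size[1]}")
    return sizes

# Usage (hypothetical folder): scan_folder("photos")
```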

Important notes

Computer vision is one of the most mathematically demanding AI specialisations. Senior CV engineers use linear algebra (matrices, eigenvalues, SVD), calculus (the chain rule and partial derivatives behind backpropagation), and probability daily. The investment in strong school-level mathematics pays compound returns. Do not treat Mathematics as a hurdle to clear; treat it as the language of your future profession.
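To make the link between the chain rule and backpropagation concrete, here is a minimal pure-Python sketch. It uses a single sigmoid neuron with a squared-error loss, an illustrative choice rather than how real CV models are trained. It computes the gradient with respect to the weight analytically via the chain rule, then checks it against a finite-difference estimate.

The input values are arbitrary. Frameworks such as PyTorch automate exactly this bookkeeping (autograd) across millions of parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of one sigmoid neuron: L = (sigmoid(w*x + b) - y)**2."""
    return (sigmoid(w * x + b) - y) ** 2

def grad_w_chain_rule(w, b, x, y):
    """dL/dw = dL/da * da/dz * dz/dw, the chain rule that backpropagation applies."""
    z = w * x + b
    a = sigmoid(z)
    dL_da = 2 * (a - y)   # derivative of the squared error
    da_dz = a * (1 - a)   # derivative of the sigmoid
    dz_dw = x             # derivative of the linear step w*x + b
    return dL_da * da_dz * dz_dw

def grad_w_numerical(w, b, x, y, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    return (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

w, b, x, y = 0.5, -0.2, 1.5, 1.0  # arbitrary illustrative values
print(grad_w_chain_rule(w, b, x, y), grad_w_numerical(w, b, x, y))  # the two agree closely
```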

💡 Backup / alternative options

Data Science, AI/ML Engineer, Software Engineer, Robotics Engineering

⚠️ Important: Career paths and admission requirements change. Always verify the latest university entrance criteria, professional body requirements, and A/L subject combinations with official sources before making final decisions.