Computer Vision Engineer
Build systems that enable machines to see and understand the visual world — detecting objects, recognising faces, reading documents, inspecting manufactured parts, and enabling autonomous vehicles, medical imaging AI, and industrial robots to act on what they see.
A Computer Vision Engineer builds AI-powered systems that extract meaning from images and video, enabling machines to detect, classify, segment, track, and interpret visual content at or beyond human-level accuracy. Computer vision (CV) is one of the most mature and commercially deployed branches of applied AI: it powers quality control inspection in manufacturing, medical image analysis in radiology, face recognition in security systems, number plate recognition in traffic systems, object detection in autonomous vehicles, crop disease detection in agriculture, and document digitisation across industries. The field is built on deep learning, primarily Convolutional Neural Networks (CNNs) and, increasingly, Vision Transformers (ViTs), trained on large labelled image datasets. The dominant technical stack is Python with PyTorch or TensorFlow for model development, OpenCV for classical image processing, and deployment via ONNX, TensorRT, or TFLite for edge and cloud inference.
Sri Lanka has significant computer vision demand across several sectors. The apparel manufacturing sector (MAS Holdings, Brandix, Hirdaramani) uses CV for fabric defect detection, garment measurement, and quality inspection. The agriculture sector is exploring CV for tea leaf quality grading, coconut disease detection, and fishery catch monitoring. The banking and financial sector uses CV for document digitisation (KYC document processing), signature verification, and cheque processing automation. The Department of Motor Traffic uses ANPR (Automatic Number Plate Recognition). The healthcare sector is beginning to explore medical imaging AI (chest X-ray analysis, fundus photography grading for diabetic retinopathy).
Globally, computer vision engineers are among the most in-demand AI specialists: Gartner and LinkedIn both list CV as a top-5 AI skill in global job postings.
Sri Lankan CV engineers at local companies earn LKR 180,000–350,000/month; those working remotely for international clients or employed abroad earn USD 90,000–160,000 per year.
What a Computer Vision Engineer does daily
- Image classification and object detection model development — training and fine-tuning deep learning models; image classification with ResNet, EfficientNet, ViT (Vision Transformer); object detection with YOLO (v8/v9/v10, the most widely deployed detection architecture), DETR, RT-DETR; instance and semantic segmentation with Mask R-CNN, SAM (Segment Anything Model); selecting the right architecture for the task's accuracy/speed/size trade-off
- Dataset preparation and annotation — computer vision is heavily data-dependent; sourcing, cleaning, and annotating image datasets using tools like Label Studio, CVAT, Roboflow, or V7 Darwin; defining annotation schemas (bounding boxes, polygons, keypoints, semantic masks); calculating inter-annotator agreement; data augmentation pipelines (Albumentations, torchvision.transforms) to expand dataset diversity without additional annotation cost
- Transfer learning and fine-tuning — adapting pre-trained CV models to specific domains and tasks; fine-tuning ImageNet-pretrained CNNs on domain-specific data (medical images, satellite imagery, manufacturing defects); understanding when to freeze layers vs train end-to-end; few-shot learning for scenarios with very limited labelled data; the standard production approach since training from scratch is rarely justified
- Classical image processing: OpenCV for pre-processing, filtering, and feature extraction; Gaussian blur, morphological operations (erosion, dilation, opening, closing); edge detection (Canny, Sobel); contour detection; perspective transformation and image warping; colour space conversion (RGB, HSV, LAB, greyscale); classical methods remain important for pre-processing pipelines and for tasks where deep learning is over-engineered
- Model deployment and inference optimisation — converting trained PyTorch/TensorFlow models to deployment formats (ONNX, TensorRT for NVIDIA GPU acceleration, TFLite for mobile/edge, CoreML for Apple devices, OpenVINO for Intel CPUs); inference server deployment (Triton Inference Server, TorchServe, FastAPI-based REST API); optimisation techniques (quantisation, pruning, knowledge distillation) to reduce model size and increase inference speed for edge and real-time applications
- Video analysis and multi-object tracking — processing video streams frame-by-frame; temporal consistency challenges; multi-object tracking (DeepSORT, ByteTrack, BoT-SORT) for tracking multiple objects across frames; optical flow (RAFT, Farneback) for motion estimation; action recognition; anomaly detection in surveillance video; real-time processing pipeline design with OpenCV and GStreamer
- Medical imaging (for healthcare CV) — DICOM format processing (pydicom); working with CT, MRI, X-ray, and fundus photography modalities; adapting deep learning models to 3D volumetric data (3D U-Net); class imbalance handling for rare pathology detection; FDA/regulatory compliance requirements for medical AI; privacy and de-identification of medical images (important for Sri Lankan hospital AI projects)
- 3D computer vision — stereo vision (depth from two cameras); LiDAR point cloud processing (Open3D, PCL); depth estimation from monocular images (MiDaS, DepthAnything); 3D object detection; NeRF (Neural Radiance Fields) for 3D scene reconstruction; photogrammetry and structure-from-motion; relevant for robotics, autonomous vehicles, and AR/VR depth sensing
- MLOps for computer vision — managing CV model lifecycle in production; data versioning (DVC); experiment tracking (MLflow, Weights & Biases); model registry; continuous model evaluation on production data; monitoring for data drift (distribution shift in incoming images); retraining pipelines triggered by performance degradation; the production infrastructure that separates research-grade from production-grade CV systems
- Industrial and edge CV deployment — deploying CV models on edge hardware (NVIDIA Jetson Nano/Orin, Coral Dev Board with Google Edge TPU, Intel Neural Compute Stick); camera hardware selection (industrial GigE cameras vs consumer USB webcams; rolling vs global shutter trade-offs; near-infrared for low-light inspection); integration with PLCs and factory control systems for real-time quality control
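As one concrete illustration of the dataset work listed above, inter-annotator agreement on image-level labels is often summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal pure-Python version under stated assumptions: two annotators, one categorical label per image, and made-up `defect`/`ok` labels that are purely hypothetical. Agreement on bounding boxes or masks usually needs an overlap-based measure instead, which this sketch does not cover.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same images."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of images")

    n = len(labels_a)

    # Observed agreement: fraction of images given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same class if each
    # annotator labelled at random according to their own class frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical class for every image.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight fabric images (illustrative data only).
annotator_a = ["defect", "ok", "ok", "defect", "ok", "ok", "defect", "ok"]
annotator_b = ["defect", "ok", "defect", "defect", "ok", "ok", "ok", "ok"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

A low kappa on a pilot batch is usually a sign that the annotation schema or guidelines need tightening before the full dataset is labelled.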
Step-by-Step Career Roadmap
- Build Python foundations — CS50P (Harvard, free) or Automate the Boring Stuff with Python (automatetheboringstuff.com, free book); Python is the language of computer vision; strong Python is the non-negotiable foundation
- Understand what computer vision is — watch "How Computer Vision Works" by 3Blue1Brown or search "How does face recognition work? CNN explained" on YouTube; the intuition that a CNN learns to detect edges → curves → shapes → objects layer by layer is the most important conceptual model in CV
- Mathematics: fractions, percentages, and basic algebra mastery — the foundation for statistics and linear algebra later; take every mathematics enrichment opportunity available
- First image manipulation with Python: install the Pillow (PIL) library; write a Python script that opens an image, converts it to greyscale, and saves the result; then extend it to apply a brightness change and to flip the image horizontally; these simple operations build a mental model of images as numerical arrays
- Explore fascinating CV applications: Google Lens object recognition; deepfake detection; autonomous vehicle camera feeds on YouTube; medical imaging AI demonstrations; grounding your motivation in compelling applications sustains you through difficult learning periods
- CS50P: Chapters 1–5 (free, Harvard)
- Python: open an image with Pillow; print its width/height/mode; convert to greyscale; save
- Python: write a script that scans all images in a folder and prints their dimensions
- YouTube: "How CNNs work" — 3Blue1Brown or Computerphile; watch and summarise in own words
- Explore: Google Lens; try Google Teachable Machine (teachablemachine.withgoogle.com, free) to train a simple image classifier in the browser — no code required
- Mathematics enrichment: every available extra Mathematics class or book
- Computer vision is one of the most mathematically demanding AI specialisations — linear algebra (matrices, eigenvalues, SVD), calculus (chain rule, partial derivatives for backpropagation), and probability are used daily by senior CV engineers; the investment in strong school-level mathematics pays compound returns; do not treat Mathematics as a hurdle to clear — treat it as the language of your future profession
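The first Pillow exercises in the roadmap above (open an image, print its width, height, and mode, convert it to greyscale, change the brightness, flip it, and scan a folder for image dimensions) could be sketched roughly as follows. This is a minimal sketch, assuming Pillow is installed (`pip install Pillow`); the function names, output file names, brightness factor of 1.5, and the tiny generated `sample.png` are all illustrative choices, not part of any prescribed course material.

```python
import tempfile
from pathlib import Path

from PIL import Image, ImageEnhance, ImageOps

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def transform_image(path, out_dir):
    """Open one image, record its basic properties, and save three variants."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with Image.open(path) as img:
        info = (img.width, img.height, img.mode)
        grey = img.convert("L")  # "L" is single-channel 8-bit greyscale
        brighter = ImageEnhance.Brightness(img).enhance(1.5)  # factor > 1 brightens
        flipped = ImageOps.mirror(img)  # left-to-right (horizontal) flip

    grey.save(out_dir / "grey.png")
    brighter.save(out_dir / "brighter.png")
    flipped.save(out_dir / "flipped.png")
    return info, grey, brighter, flipped


def scan_folder(folder):
    """Return (file name, width, height) for each image directly inside folder."""
    results = []
    for p in sorted(Path(folder).iterdir()):
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
            with Image.open(p) as img:
                results.append((p.name, img.width, img.height))
    return results


# Demo on a tiny generated image so the script runs without any input files.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    demo = Image.new("RGB", (4, 2), (100, 50, 200))  # 4 px wide, 2 px tall
    demo.putpixel((0, 0), (255, 0, 0))  # mark the top-left pixel red
    demo.save(sample)

    info, grey, brighter, flipped = transform_image(sample, Path(tmp) / "out")
    listing = scan_folder(tmp)

print("width, height, mode:", info)
print("images found:", listing)
```

To try it on real photos, replace the generated sample with the path to an image of your own and point `scan_folder` at a folder of pictures. After the flip, the red marker pixel moves from the top-left to the top-right corner, which is a quick way to confirm that an image really is just a grid of numbers.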
