This document summarizes a proposed system for tracking and counting humans in a visual surveillance system. The system first performs background subtraction on input video frames to detect foreground objects. It applies this process to both grayscale and binary image formats, and selects the better-performing format. Features are then extracted from detected blobs, including size, centroid, and color. Tracking rectangles are drawn around whole detected objects to group separated body parts. Counting is done by detecting when pixel values change from background to foreground. Experimental results on videos with various challenges like shadows, illumination changes, and occlusions showed the system could accurately track and count humans, with near-perfect accuracy even for occluded or broken objects, though it introduced some minor errors.