We propose a technique for automatically learning layers of "flexible sprites" -- probabilistic 2-dimensional appearance maps and masks of moving, occluding objects. The model explains each input image as a layered composition of exible sprites. A variational expectation maximization algorithm is used to learn a mixture of sprites from a video sequence. For each input image, probabilistic inference is used to infer the sprite class, translation, mask values and pixel intensities (including obstructed pixels) in each layer. Exact inference is intractable, but we show how a variational inference technique can be used to process 320 × 240 images at 1 frame/second. The only inputs to the learning algorithm are the video sequence, the number of layers and the number of flexible sprites. We give results on several tasks, including summarizing a video sequence with sprites, point-and-click video stabilization, and point-and-click object removal.