We built and trained a hand segmentation model in PyTorch, using a U-Net-like architecture. One challenge of this task is that we must segment not only hands but also arms, with or without sleeves, and few existing datasets satisfy this specific need. To provide enough training data, we constructed our own dataset by capturing hands against a green or white screen and compositing the extracted hand regions onto random backgrounds. The resulting model is small, only 435 KB, so it can be efficiently deployed and run on mobile devices.
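The paper does not specify the exact layer configuration, so the following is only a minimal sketch of what a small U-Net-like segmentation network in PyTorch could look like; the channel widths (`base=8`) and depth are illustrative assumptions chosen to keep the parameter count in the same ballpark as a few-hundred-KB model, not the authors' architecture.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Two 3x3 convs with BatchNorm and ReLU: the standard U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Deliberately small U-Net-like encoder-decoder for binary
    hand/arm segmentation (illustrative, not the paper's exact model)."""

    def __init__(self, in_ch=3, base=8):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)  # one logit per pixel

    def forward(self, x):
        e1 = self.enc1(x)                                   # full resolution
        e2 = self.enc2(self.pool(e1))                       # 1/2 resolution
        b = self.bottleneck(self.pool(e2))                  # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # raw logits; pair with BCEWithLogitsLoss


model = TinyUNet()
out = model(torch.randn(1, 3, 64, 64))
```

The output keeps the input's spatial size, so each pixel gets a hand/arm logit that can be thresholded into a mask at inference time.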
Figure 1. Dataset creation procedure: we remove the white screen in the foreground to obtain the mask, then combine the masked foreground with a background to produce the merged image.
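The masking-and-merging step in Figure 1 can be sketched as a simple chroma-key composite. The distance-based thresholding rule and the `tol` value below are illustrative assumptions; the authors' exact screen-removal method is not described in the text.

```python
import numpy as np


def composite(foreground, background, screen_color=(255, 255, 255), tol=40):
    """Mask out screen-colored pixels and paste the rest onto a background.

    foreground, background: H x W x 3 uint8 images of the same size.
    Returns (mask, merged): a binary mask and the merged training image.
    Hypothetical sketch of the Figure 1 pipeline, not the paper's code.
    """
    fg = foreground.astype(np.int16)
    # L1 distance from the screen color; pixels far from it are foreground
    dist = np.abs(fg - np.array(screen_color, dtype=np.int16)).sum(axis=-1)
    mask = (dist > tol).astype(np.uint8)
    # Keep foreground where mask == 1, fall back to background elsewhere
    merged = np.where(mask[..., None] == 1, foreground, background)
    return mask, merged


# Toy example: one non-white "skin" pixel on a white screen, black background
fg = np.full((4, 4, 3), 255, dtype=np.uint8)
fg[1, 2] = (200, 30, 30)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
mask, merged = composite(fg, bg)
```

The mask doubles as the ground-truth segmentation label for the merged image, which is what makes this synthesis pipeline attractive: every generated sample comes with a pixel-accurate label for free.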