The interaction with distant displays often demands complex, multi-modal inputs which need to be achieved with a very simple hardware solution so that users can perform rich inputs wherever they encounter a distant display. We present Simo, a novel approach, that transforms a regular smartphone into a highly-expressive user motion tracking device and controller for distant displays. Both the front and back cameras of the smartphone are used simultaneously to track the user's hand as well as the head, and body movements in real-world space and scale. In this work, we first define the possibilities for simultaneous face- and world-tracking using current off-the-shelf smartphones. Next, we present the implementation of a smartphone app enabling hand, head, and body motion tracking. Finally, we present a technical analysis outlining the possible tracking range.