Purpose: Cone-beam CT (CBCT) of the extremities provides high spatial resolution, but its quantitative accuracy may be compromised by involuntary sub-mm patient motion that cannot be eliminated by simple external immobilization. We investigate a two-step iterative motion compensation method based on a multi-component metric of image sharpness. Methods: Motion is modeled as locally rigid within a region of interest, and the method can be applied to multiple locally rigid regions. Motion is estimated by maximizing an objective function with three components: a gradient metric encouraging image sharpness, an entropy term that favors high contrast and penalizes streaks, and a penalty term encouraging smooth motion. Compensation proceeds in two steps: an initial coarse estimation of gross motion, followed by estimation of fine-scale displacements using high-resolution reconstructions. The method was evaluated in simulations with synthetic motion (1-4 mm) applied to a wrist volume obtained on a CMOS-based CBCT testbench. The structural similarity index (SSIM) quantified the agreement between motion-compensated and static data. The algorithm was also tested on a motion-contaminated patient scan from a dedicated extremities CBCT system. Results: Excellent correction was achieved for the investigated range of displacements, as indicated by good visual agreement with the static data. SSIM improved by 10-15% for 2-4 mm motions. The compensation was robust against increasing motion (4% decrease in SSIM across the investigated range, compared to 14% with no compensation). Consistent performance was achieved across a range of noise levels. Artifacts were substantially reduced in the patient data. Conclusion: The results indicate the feasibility of image-based motion correction in extremities CBCT without the need for a priori motion models, external trackers, or fiducials.
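
Illustration: The three-component objective described in Methods could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the weights `w_entropy` and `w_smooth`, the sign convention (entropy and motion roughness are subtracted so that the total is maximized), and the specific gradient, entropy, and smoothness definitions are all assumptions made for this example.

```python
import numpy as np

def gradient_sharpness(vol):
    """Mean squared finite-difference gradient, summed over axes (assumed sharpness metric)."""
    vol = np.asarray(vol, dtype=float)
    grads = np.gradient(vol)
    if vol.ndim == 1:
        grads = [grads]  # np.gradient returns a bare array for 1-D input
    return float(sum(np.mean(g ** 2) for g in grads))

def histogram_entropy(vol, bins=64):
    """Shannon entropy (natural log) of the intensity histogram (assumed entropy term)."""
    counts, _ = np.histogram(np.asarray(vol, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def motion_roughness(traj):
    """Sum of squared differences between motion parameters of consecutive views."""
    traj = np.asarray(traj, dtype=float)
    return float(np.sum(np.diff(traj, axis=0) ** 2))

def objective(vol, traj, w_entropy=0.1, w_smooth=0.01):
    """Hypothetical combined objective to be maximized over the motion trajectory."""
    return (gradient_sharpness(vol)
            - w_entropy * histogram_entropy(vol)
            - w_smooth * motion_roughness(traj))

# Toy comparison: a sharp block versus the same block smeared along one axis,
# which stands in for a reconstruction degraded by motion.
sharp = np.zeros((32, 32))
sharp[8:24, 8:24] = 1.0
blurred = np.mean([np.roll(sharp, s, axis=0) for s in range(-3, 4)], axis=0)
traj = np.linspace(0.0, 1.0, 10)[:, None]  # one motion parameter per view (assumed layout)

print(objective(sharp, traj) > objective(blurred, traj))
```

In an actual compensation loop, `vol` would be a reconstruction of the region of interest computed with the candidate trajectory `traj`, and an optimizer would update `traj` to increase the objective. The reconstruction and optimizer are outside the scope of this sketch.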