Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm which partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm includes a generalized pooling scheme used to construct robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset which serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method demonstrates improved performance of pose estimation on this new dataset compared with other state-of-the-art methods.