Deep learning has shown great promise in classifying brain disorders because of its powerful ability to learn optimal features through nonlinear transformations. However, given the high dimensionality of neuroimaging data, it remains difficult to jointly exploit complementary information from multimodal neuroimaging data in deep learning. In this paper, we propose a novel multilevel convolutional neural network (CNN) fusion method that can effectively combine different types of neuroimage-derived features. Importantly, we incorporate sequential feature selection into the CNN model to improve feature interpretability. To evaluate our method, we classified two symptom-related brain disorders using large-sample, multi-site data from 335 schizophrenia (SZ) patients and 380 autism spectrum disorder (ASD) patients within a cross-validation procedure. Brain functional networks, functional network connectivity, and brain structural morphology were employed as candidate features. As expected, our fusion method outperformed the CNN model using only a single type of features: it yielded higher classification accuracy (mean accuracy >85%) and was more reliable across multiple runs in differentiating the two groups. We found that the default mode, cognitive control, and subcortical regions contributed most to distinguishing the two disorders. Taken together, our method provides an effective means of fusing multimodal features for the diagnosis of different psychiatric and neurological disorders.