Accurate classification of rock sizes is essential in geotechnical engineering, mining, and
resource management, where precise size estimation affects operational efficiency and safety. In this paper,
we propose an enhanced deep learning model based on the ConvNeXt architecture, augmented with both
self-attention and channel attention mechanisms. Our model, termed CNSCA, retains the ConvNeXt backbone
and adds self-attention to capture long-range spatial dependencies and channel attention to emphasize
informative feature channels. This hybrid design enables the model to
capture both fine-grained local patterns and broader contextual relationships within rock imagery, leading
to improved classification accuracy and robustness. We evaluate our model on a rock size classification
dataset and compare it against three strong baselines. The results demonstrate that incorporating
attention mechanisms substantially improves the model's ability to perform fine-grained classification
of natural textures such as rock.
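To make the two attention components concrete, the following is a minimal NumPy sketch under stated assumptions, not the CNSCA implementation. It assumes a squeeze-and-excitation-style channel attention (global average pooling, a ReLU bottleneck with reduction ratio `r`, and a sigmoid gate) and single-head scaled dot-product self-attention over the H×W spatial positions of a feature map such as a ConvNeXt stage might produce. The function names, the ordering (channel attention first, then self-attention with a residual connection), the reduction ratio, and the random weights are all hypothetical choices for illustration.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Hypothetical SE-style channel attention on a (C, H, W) feature map.
    w1: (C // r, C) bottleneck weights; w2: (C, C // r) expansion weights."""
    squeeze = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1) -> (C,)
    return x * gate[:, None, None]                 # rescale each channel

def spatial_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the H*W positions."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                 # (N, C) with N = H * W
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv  # each (N, d)
    scores = q @ k.T / np.sqrt(q.shape[1])         # (N, N) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # row-wise softmax
    out = attn @ v                                 # (N, d) mixed over all positions
    return out.T.reshape(-1, h, w), attn

def hybrid_attention_block(x, params):
    """Hypothetical ordering: channel attention, then self-attention plus a residual."""
    y = channel_attention(x, params["w1"], params["w2"])
    attended, _ = spatial_self_attention(y, params["wq"], params["wk"], params["wv"])
    return y + attended                            # requires d == C for the residual

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
x = rng.standard_normal((C, H, W))
params = {
    "w1": rng.standard_normal((C // r, C)) * 0.1,
    "w2": rng.standard_normal((C, C // r)) * 0.1,
    "wq": rng.standard_normal((C, C)) * 0.1,
    "wk": rng.standard_normal((C, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
}
out = hybrid_attention_block(x, params)
print(out.shape)  # (16, 8, 8)
```

In this sketch, the channel gate can only scale each channel down (its values lie in (0, 1)), while the self-attention lets every spatial position draw on every other one. This reflects the split described above between emphasizing informative channels and capturing long-range spatial context.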