This study proposes an urban fire risk evaluation framework based on street-view images to address escalating fire hazards in rapidly urbanizing megacities. Using Jiangning District, Nanjing, China, as a case study, it analyzes historical fire incident data from 2022 to 2023. The primary innovation is a novel image-based scoring metric (SIE2.5KM), whose extracted features enable the assessment of vertical fire risk across the three-dimensional built environment. This metric captures fine-grained, dynamic risk factors that are often overlooked in simple geospatial analysis. The study integrates SIE2.5KM with geospatial data, including population density, building characteristics, and infrastructure accessibility, to model fire risk across 1,775 grid cells of 500 m × 500 m. Using threefold cross-validation, the study compares the performance of four spatial econometric models and three machine learning models. The empirical results show that XGBoost performs best, with R² consistently above 0.8 and lower RMSE and MAE than the other models. Key findings identify SIE2.5KM, distance to fire stations, and mixed land use as critical drivers of urban fire risk, while population density exhibits a complex, nonlinear relationship with fire risk, suggesting that the impact of population on fires can be alleviated by diverse fire management measures. Residual analysis indicates where the framework could be improved, notably at urban-forest interfaces, for example by incorporating information on combustible vegetation in more in-depth analysis.
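The threefold cross-validation and the three reported metrics (R², RMSE, MAE) can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the single predictor stands in for a feature such as SIE2.5KM, and a one-variable least-squares fit stands in for XGBoost and the spatial econometric models. All function names here are hypothetical.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares (stand-in for the real models)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE for one held-out fold."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k=3, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        intercept, slope = fit_simple_ols(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        preds = [intercept + slope * xs[i] for i in test_idx]
        scores.append(regression_metrics([ys[i] for i in test_idx], preds))
    return scores


# Synthetic example: one predictor per grid cell and a noisy linear response.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(90)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
fold_scores = cross_validate(xs, ys, k=3)
for fold, score in enumerate(fold_scores, start=1):
    print(fold, {name: round(value, 3) for name, value in score.items()})
```

Reporting the metrics per fold rather than only their average makes it possible to check a claim such as "R² above 0.8 in every fold" directly. A plain random split like this one ignores spatial autocorrelation between neighbouring grid cells; whether the study used random or spatially blocked folds is not stated here.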