Abstract: In previous studies of power theft user identification, most researchers focused on improving the recognition algorithm. In this paper, we use big data technology to extract electricity consumption features from different perspectives and study how different features affect recognition. First, the raw data are cleaned. To capture the key information for identifying power theft users, features are extracted from three aspects: basic attribute features, statistical features at different time scales, and similarity features at different time scales. We then run experiments with different combinations of these feature sets under the K-nearest neighbors (KNN), random forest (RF), and XGBoost models. The results show that the BF+SF+PF feature set performs clearly better than the other two feature sets in all three classifiers. We therefore conclude that the choice of features has a clear effect on the recognition results.
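The cleaning and multi-time-scale feature extraction steps can be illustrated with a minimal sketch, assuming each user is represented by a list of daily consumption readings. Several choices here are assumptions for illustration, not the paper's definitions. The time scales are day, week, and a 30-day month. The cleaning rule replaces missing or negative readings with the mean of the valid readings. The similarity feature is the Pearson correlation against a reference profile. The helpers `clean`, `aggregate`, `pearson`, and `extract_features` are hypothetical names.

```python
import math
from statistics import mean, pstdev

# Hypothetical time scales, expressed as window lengths in days.
SCALES = {"day": 1, "week": 7, "month": 30}


def clean(series):
    """Hypothetical cleaning rule: treat None or negative readings as missing
    and replace them with the mean of the valid readings (0.0 if none are valid)."""
    valid = [x for x in series if x is not None and x >= 0]
    fill = mean(valid) if valid else 0.0
    return [x if x is not None and x >= 0 else fill for x in series]


def aggregate(series, window):
    """Sum non-overlapping windows of `window` days; a trailing partial window is dropped."""
    n = len(series) // window
    return [sum(series[i * window:(i + 1) * window]) for i in range(n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def extract_features(user, reference):
    """Statistical features (mean, std, max, min) and an assumed similarity feature
    (correlation with `reference`) at each time scale."""
    feats = {}
    for name, w in SCALES.items():
        u = aggregate(user, w)
        r = aggregate(reference, w)
        if len(u) < 2:  # skip scales with too few windows to be informative
            continue
        feats[f"{name}_mean"] = mean(u)
        feats[f"{name}_std"] = pstdev(u)
        feats[f"{name}_max"] = max(u)
        feats[f"{name}_min"] = min(u)
        feats[f"{name}_sim"] = pearson(u, r)
    return feats
```

The resulting per-user feature dictionaries could then be joined with basic attribute features taken from user metadata and assembled into a feature matrix for classifiers such as KNN, RF, or XGBoost. Comparing feature-set combinations then amounts to selecting different subsets of these columns.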