Machine learning models predict which apps will be removed from the Google Play Store

Fadi Mohsen

Photo: Fadi Mohsen, assistant professor in the Information Systems group at the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence at the University of Groningen. He is the first author of a paper on the development of the first data-driven models that predict the removal of mobile apps.

Credit: University of Groningen

A large percentage of new apps in the Google Play Store are removed for violating the store's guidelines. This is inconvenient for the users of these apps, who may lose the data stored in them. Computer scientists from the University of Groningen have created two machine learning models that can predict the chances of a new app being removed, both before and after it is uploaded to the Store. These models can help developers and users alike. Details of the project are described in a paper published in the journal Systems and Soft Computing on 29 September.

The Google Play Store has rules and requirements that developers must adhere to. After submission, apps are published in the Store immediately, but it takes Google some time to check them and remove those found to violate the guidelines. Developers whose apps have been removed more than once may be banned from the Store.


“My research interest lies in digital privacy and security,” says Fadi Mohsen, assistant professor in the Information Systems group at the Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence at the University of Groningen. Given the consequences of app removal for both developers and users, he wanted to create a system that could predict whether new apps would be removed.

“There have already been attempts to do this, but they usually focus on specific types of apps that were removed for specific reasons, for example because they contain malware,” Mohsen explains. “We wanted to develop a general model that predicts the chances of an app being removed, regardless of the type of app or the reason for removal.” Moreover, previous attempts focused only on users, while Mohsen also wanted to help developers who violated the guidelines by accident.

Source code

The first step was to collect a large data set of removed and non-removed apps. “We collected metadata, including the descriptions that developers provided to the Store, from nearly two million apps. Next, we downloaded the source code for half of these apps.” Mohsen and his colleagues then tracked the status of these apps in the Store for six months to see which ones were removed. “In our selection, that was the case for 56 percent of them.” It took them 26 months to finalize the data set that was used to build the machine learning models.


The algorithm they used is called Extreme Gradient Boosting. “It's the best machine learning algorithm for these kinds of problems,” Mohsen explains. The algorithm was used to create two predictive models: one for developers and one for users. The user model relied on 47 features, and on the test data set it predicted whether an app would be removed with an accuracy of 79.2 percent. Because some of these features, such as ratings in the Store, are not available before an app is submitted, the developer model relied on only 37 features, and its accuracy was slightly lower as a result: 76.2 percent.
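The article gives no implementation details, so the following is only a rough, hypothetical illustration of the two-model idea. It uses plain first-order gradient boosting of decision stumps on the logistic loss. This is a simpler relative of Extreme Gradient Boosting, which additionally uses second-order updates, deeper trees and regularization. One model is trained on all features ("user model"), the other only on features known before submission ("developer model"). The feature names, the synthetic data and the hyperparameters (`rounds`, `lr`) are all assumptions, not taken from the paper.

```python
import math
import random

# Hypothetical features for illustration only; the study's 47/37 features are not listed here.
FEATURES = ["num_permissions", "description_length", "avg_rating"]
PRE_SUBMISSION = [0, 1]  # known before upload (developer model)
ALL_FEATURES = [0, 1, 2]  # the rating only exists after release (user model)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, features):
    """Least-squares regression stump: one feature, one threshold, two leaf values."""
    mean = sum(residuals) / len(residuals)
    # Fallback if no valid split exists: a constant prediction (threshold +inf).
    best = (float("inf"), features[0], float("inf"), mean, mean)
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best[0]:
                best = (err, f, t, lv, rv)
    return best[1:]  # (feature, threshold, left value, right value)


def train(X, y, features, rounds=20, lr=0.3):
    """First-order gradient boosting of stumps on the logistic (log) loss."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # start from the log-odds of the removal rate
    scores = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the log loss with respect to the score: y - sigmoid(score).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        f, t, lv, rv = fit_stump(X, residuals, features)
        stumps.append((f, t, lv, rv))
        scores = [s + lr * (lv if row[f] <= t else rv) for s, row in zip(scores, X)]
    return base, lr, stumps


def predict_removed(model, row):
    base, lr, stumps = model
    score = base + sum(lr * (lv if row[f] <= t else rv) for f, t, lv, rv in stumps)
    return sigmoid(score) >= 0.5


def accuracy(model, X, y):
    return sum(predict_removed(model, row) == (yi == 1) for row, yi in zip(X, y)) / len(y)


def make_apps(n, rng):
    """Synthetic apps; by assumption, removal depends only on the post-release rating."""
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 15), rng.randint(50, 4000), rng.uniform(1.0, 5.0)]
        X.append(row)
        y.append(1 if row[2] < 3.0 else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_apps(120, rng)
X_test, y_test = make_apps(100, rng)

user_model = train(X_train, y_train, ALL_FEATURES)
developer_model = train(X_train, y_train, PRE_SUBMISSION)
user_acc = accuracy(user_model, X_test, y_test)
developer_acc = accuracy(developer_model, X_test, y_test)
print(f"user model accuracy: {user_acc:.3f}")
print(f"developer model accuracy: {developer_acc:.3f}")
```

Because the synthetic labels depend only on the rating, the gap between the two models is exaggerated by construction. The sketch only shows why losing post-submission features can cost accuracy. It says nothing about the 79.2 and 76.2 percent reported in the paper.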

“We can now predict the fate of an app with reasonable accuracy,” Mohsen says. The next step is to develop an interface through which developers and users can assess apps for their risk of removal. “This is important for developers, as they can be banned from the Google Play Store if they repeatedly violate the guidelines, but also for users, as they create data in their apps, which they will lose if the app is suddenly withdrawn,” Mohsen says.

Data set

Other researchers can also benefit from this research. “The rich data set we created for our research paper has been released to the public,” and it is available from the Dutch repository Dataverse.nl. This means that anyone can try to improve on the results obtained by Mohsen and his colleagues. “We are looking forward to the competition, to see if others can beat us. That is in the interest of users and developers.”

Reference: Fadi Mohsen, Dimka Karastoyanova and George Azzopardi: Early detection of violating mobile apps: a data-driven predictive model approach. Systems and Soft Computing, 29 September 2022.

