LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data

2021-09-06

L Huang, Y Yin, Z Fu, S Zhang, H Deng, D Liu. LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data. PLoS ONE 15(4): e0230706

Abstract

Intensive care data are valuable for improving health care, policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in different data silos. Sharing data among these sources is a major challenge for regulatory, operational and security reasons. One potential solution is federated machine learning, a method that sends machine learning algorithms simultaneously to all data sources, trains a model at each source and aggregates the learned models. This strategy allows valuable data to be utilized without moving them. One challenge in applying federated machine learning is that data from diverse sources may follow different distributions. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated learning performance under IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.

Introduction

Health data from intensive care units can be used by medical practitioners to provide health care and by researchers to build machine learning models that improve clinical services and make health predictions. However, such data are mostly stored in a distributed manner, on mobile devices or in different hospitals, because of their large volume and privacy concerns, so traditional learning approaches that require centralized data may not be viable. Federated learning, which avoids data collection and central storage, therefore becomes necessary, and significant progress has been made to date.

In 2005, Rehak et al. [1] established CORDRA, a framework that provided standards for an interoperable repository infrastructure in which data repositories were clustered into community federations and their data were retrieved by a global federation using each community federation's metadata. In 2011, Barcelos et al. [2] created an agent-based federated catalog of learning objects (the AgCAT system) to facilitate access to distributed educational resources. Although little machine learning was involved in these two models, their practice of distributed data management and retrieval served as a reference for the development of federated learning algorithms.

In 2012, Balcan et al. [3] implemented probably approximately correct (PAC) learning in a federated manner and reported upper and lower bounds on the amount of communication required to obtain desirable learning outcomes. In 2013, Richtárik et al. [4] proposed a distributed coordinate descent method named HYbriD for solving loss minimization problems with big data. Their work provided bounds on the number of communication rounds needed for convergence and presented experimental results with the LASSO algorithm on 3 TB of data. In 2014, Fercoq et al. [5] designed an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions and demonstrated that their method was extendable to a LASSO optimization problem with 50 billion variables. In 2015, Konecny et al. [6] introduced a federated optimization algorithm suitable for training on massively distributed, non-independent and identically distributed (non-IID) and unbalanced datasets.

In 2016, McMahan et al. [7] developed the FederatedAveraging (FedAvg) algorithm, which fitted a global model using training data kept locally on distributed devices (known as clients). The method started by initializing the weights of a neural network model at a central server, then distributed the weights to the clients to train local models, and stopped after a certain number of iterations (also known as global rounds). In each global round, the data held on a client were split into several batches according to a predefined batch size; each batch was passed as a whole to train the local model; and an epoch was completed once every batch had been used for learning. Typically, a client trained for multiple epochs and then sent its weights to the server, which computed the average of the weights from all clients and distributed it back to them. Experimental results showed that FedAvg performed satisfactorily on both IID and non-IID data and was robust to various datasets.
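To make the averaging loop above concrete, the following minimal Python sketch simulates FedAvg with plain NumPy. The toy least-squares local update, the equal-sized client shards and the epoch and round budgets are illustrative assumptions of ours, not the configuration used in [7].

import numpy as np

# Minimal FedAvg sketch (illustrative only, not the implementation from [7]).
# Each model is a flat weight vector; local_update stands in for E epochs of
# minibatch training on one client's data shard.
rng = np.random.default_rng(0)

def local_update(weights, data, epochs=5, batch_size=10, lr=0.1):
    # Toy local training: least-squares gradient steps over minibatches.
    X, y = data
    w = weights.copy()
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad
    return w

# Synthetic clients: each holds its own (features, labels) shard.
clients = [(rng.normal(size=(100, 5)), rng.normal(size=100)) for _ in range(4)]
global_w = np.zeros(5)  # server initializes the model weights

for round_ in range(10):  # global rounds
    local_weights = [local_update(global_w, shard) for shard in clients]
    global_w = np.mean(local_weights, axis=0)  # server averages client weights

print("final global weights:", global_w)

Because the synthetic shards are equal in size, the plain mean coincides with the example-count-weighted average that FedAvg uses when clients hold different amounts of data.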

More recently, Konecny et al. [8] modified the global model update of FedAvg in two ways, namely structured updates and sketched updates. In the former, each client sent its weights in a pre-specified form such as a low-rank or sparse matrix, whereas in the latter the weights were approximated or encoded in a compressed form before being sent to the server. Both approaches aimed at reducing the uplink communication cost, and experiments indicated that the reduction could be two orders of magnitude. In addition, Bonawitz et al. [9] designed the Secure Aggregation protocol to protect the privacy of each client's model gradient in federated learning without sacrificing communication efficiency. Later, Smith et al. [10] devised a systems-aware optimization method named MOCHA that simultaneously considered the issues of high communication cost, stragglers and fault tolerance in multi-task learning. Zhao et al. [11] addressed the non-IID data challenge in federated learning and presented an improved version of FedAvg with a data-sharing strategy, whereby test accuracy could be enhanced significantly with only a small portion of globally shared data among clients. The strategy required the server to prepare a small holdout dataset G (sampled from an IID distribution) and to globally share a random portion α of G with all clients. The size of G relative to the total amount of data held by all clients was denoted by β. There were two trade-offs: first, between test accuracy and α; and second, between test accuracy and β. A rule of thumb was that the larger α or β was, the higher the test accuracy that would be achieved. It is worth mentioning that since G was a dataset separate from the clients' data, sharing it did not constitute a privacy breach. Since no specific name was given to this method in Zhao et al.'s paper [11], we referred to it as "FedAvg with data-sharing" in our study. Bagdasaryan et al. [12] designed a novel model-poisoning technique that used model replacement to backdoor federated learning. Liu et al. used a federated transfer learning strategy to balance global and local learning [13–16].
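As one concrete reading of this data-sharing step, the short NumPy sketch below distributes the same random α-fraction of a holdout set G to every client before local training. The helper name share_global_data and the toy datasets are hypothetical, introduced here only for illustration.

import numpy as np

# Sketch of the data-sharing strategy from [11] as we read it: the server
# holds an IID set G and shares one random alpha-fraction of it with all clients.
rng = np.random.default_rng(1)

def share_global_data(G_x, G_y, client_data, alpha):
    # Augment every client's shard with the same random alpha-fraction of G.
    idx = rng.choice(len(G_x), size=int(alpha * len(G_x)), replace=False)
    shared_x, shared_y = G_x[idx], G_y[idx]
    return [(np.vstack([X, shared_x]), np.concatenate([y, shared_y]))
            for X, y in client_data]

# Toy example: three non-IID clients plus a small IID holdout set G.
clients = [(rng.normal(size=(50, 5)), rng.integers(0, 2, size=50)) for _ in range(3)]
G_x, G_y = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
clients_aug = share_global_data(G_x, G_y, clients, alpha=0.1)
print([len(y) for _, y in clients_aug])  # each client gains 20 shared examples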

Most previously published federated learning methods focused on optimizing a single issue such as test accuracy, privacy, security or communication efficiency; none of them considered the computational load on the clients. This study took into account three issues in federated learning: the client-side computational complexity, the communication cost and the test accuracy. We developed an algorithm named Loss-based Adaptive Boosting FederatedAveraging (LoAdaBoost FedAvg), in which local models with a high cross-entropy loss were further optimized before model averaging on the server. To evaluate the predictive performance of our method, we extracted critical care patients' drug usage and mortality data from the Medical Information Mart for Intensive Care (MIMIC-III) database [17] and the eICU Collaborative Research Database [18]. The data were partitioned into IID and non-IID distributions. In the IID scenario, LoAdaBoost FedAvg was compared with FedAvg by McMahan et al. [7], while in the non-IID scenario our method was complemented by the data-sharing concept before being compared with FedAvg with data-sharing by Zhao et al. [11]. Our primary contributions include the application of federated learning to health data and the development of the straightforward LoAdaBoost FedAvg algorithm, which performed better than the state-of-the-art FedAvg approach.
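The schematic below illustrates the loss-based idea in Python under simplifying assumptions of ours: a logistic model, a fixed local epoch budget, extra training only for clients whose cross-entropy loss exceeds the round's median, and plain weight averaging. It is a sketch of the general principle, not the exact retraining criterion or epoch schedule of LoAdaBoost FedAvg.

import numpy as np

# Schematic of loss-based adaptive boosting in a federated setting: clients
# whose cross-entropy loss stays above the median get extra local training
# before server-side averaging. The loss threshold, epoch budgets and the
# logistic model below are illustrative assumptions, not the paper's exact rule.
rng = np.random.default_rng(2)

def train_and_loss(weights, data, epochs, lr=0.1):
    # Stand-in local trainer: logistic-regression gradient steps, returning
    # the updated weights and the client's cross-entropy loss.
    X, y = data
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-7, 1 - 1e-7)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return w, loss

clients = [(rng.normal(size=(80, 5)), rng.integers(0, 2, size=80)) for _ in range(6)]
global_w = np.zeros(5)

for round_ in range(5):  # global rounds
    results = [train_and_loss(global_w, shard, epochs=3) for shard in clients]
    median_loss = np.median([loss for _, loss in results])
    local_weights = []
    for (w, loss), shard in zip(results, clients):
        if loss > median_loss:  # poorly fitting client: extra local optimization
            w, _ = train_and_loss(w, shard, epochs=2)
        local_weights.append(w)
    global_w = np.mean(local_weights, axis=0)  # FedAvg-style averaging

print("final global weights:", global_w)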