TY - GEN

T1 - How to use expert advice

AU - Cesa-Bianchi, Nicolò

AU - Freund, Yoav

AU - Helmbold, David P.

AU - Haussler, David

AU - Schapire, Robert E.

AU - Warmuth, Manfred K.

N1 - Funding Information:
This research was done while the first author was partially supported by the CNR "Progetto […] e calcolo parallelo" grant. Haussler, Warmuth, and Freund were supported by ONR grant N00014-91-J-1162 and an NSF grant. Email addresses: cesabian@imiucca.csi.unimi.it, dph@cse.ucsc.edu, {haussler,manfred,yoav}@cse.ucsc.edu, schapire@research
Publisher Copyright:
© 1993 ACM.

PY - 1993/6/1

Y1 - 1993/6/1

N2 - We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We give implications of this result on the performance of batch learning algorithms in a PAC setting which improve on the best results currently known in this context. We also extend our analysis to the case in which log loss is used instead of the expected number of mistakes.

AB - We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We give implications of this result on the performance of batch learning algorithms in a PAC setting which improve on the best results currently known in this context. We also extend our analysis to the case in which log loss is used instead of the expected number of mistakes.

UR - http://www.scopus.com/inward/record.url?scp=0027274446&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027274446&partnerID=8YFLogxK

U2 - 10.1145/167088.167198

DO - 10.1145/167088.167198

M3 - Conference contribution

AN - SCOPUS:0027274446

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 382

EP - 391

BT - Proceedings of the 25th Annual ACM Symposium on Theory of Computing, STOC 1993

PB - Association for Computing Machinery

T2 - 25th Annual ACM Symposium on Theory of Computing, STOC 1993

Y2 - 16 May 1993 through 18 May 1993

ER -