Joseph L. Hellerstein, T. S. Jayram, and Irina Rish, IBM Thomas J. Watson Research Center
Providing good quality of service (e.g., low response times) in distributed computer systems requires measuring end-user perceptions of performance. Unfortunately, in practice such measures are often expensive or impossible to obtain. Herein, we propose a machine learning approach to recognizing end-user transactions consisting of sequences of remote procedure calls (RPCs) received at a server. Two problems are addressed. The first is labeling previously segmented transaction instances with the correct transaction type. This is akin to work done in document classification. The second problem is segmenting RPC sequences into transaction instances. This is a more difficult problem, but it is similar to segmenting sounds into words as in speech understanding. Using Naive Bayes, we tackle the labeling problem with four combinations of feature vectors and probability distributions: RPC occurrences with the Bernoulli distribution, RPC counts with the multinomial distribution, RPC counts with the geometric distribution, and RPC counts with the shifted geometric distribution. Our approach to segmentation searches for sequences of RPCs that have a sufficiently high probability of being a known transaction type, as determined by one of our classifiers. For both problems, good accuracies are obtained, although the labeling problem achieves higher accuracies (85\%) than does segmentation (70\%).
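To make the labeling problem concrete, the following is a minimal sketch of two of the four combinations named above: RPC occurrences with the Bernoulli distribution and RPC counts with the multinomial distribution. It is an illustration under stated assumptions, not the implementation evaluated here. The transaction types, RPC names, training sequences, and the add-one smoothing constant `alpha` are all hypothetical, and every function name is introduced only for this example.

```python
import math
from collections import Counter

# Hypothetical training data: transaction type -> pre-segmented RPC sequences.
TRAIN = {
    "open_mail": [["auth", "list", "read"], ["auth", "list", "list", "read"]],
    "send_mail": [["auth", "compose", "send"], ["auth", "compose", "compose", "send"]],
}
VOCAB = sorted({rpc for seqs in TRAIN.values() for seq in seqs for rpc in seq})


def train_bernoulli(train, vocab, alpha=1.0):
    """Per type: log prior and smoothed P(RPC occurs at least once in an instance)."""
    total = sum(len(seqs) for seqs in train.values())
    model = {}
    for label, seqs in train.items():
        n = len(seqs)
        p = {r: (sum(r in s for s in seqs) + alpha) / (n + 2 * alpha) for r in vocab}
        model[label] = (math.log(n / total), p)
    return model


def score_bernoulli(model, seq, vocab):
    """Log joint score per type, using only presence or absence of each RPC."""
    present = set(seq)
    scores = {}
    for label, (log_prior, p) in model.items():
        score = log_prior
        for r in vocab:
            score += math.log(p[r] if r in present else 1.0 - p[r])
        scores[label] = score
    return scores


def train_multinomial(train, vocab, alpha=1.0):
    """Per type: log prior and smoothed relative frequency of each RPC."""
    total = sum(len(seqs) for seqs in train.values())
    model = {}
    for label, seqs in train.items():
        counts = Counter(r for s in seqs for r in s)
        denom = sum(counts.values()) + alpha * len(vocab)
        theta = {r: (counts[r] + alpha) / denom for r in vocab}
        model[label] = (math.log(len(seqs) / total), theta)
    return model


def score_multinomial(model, seq):
    """Log joint score per type, using RPC counts.

    The multinomial coefficient is omitted because it is identical for every
    type; RPCs never seen in training are ignored (an assumption of this sketch).
    """
    scores = {}
    for label, (log_prior, theta) in model.items():
        scores[label] = log_prior + sum(math.log(theta[r]) for r in seq if r in theta)
    return scores


def best_label(scores):
    """Return the transaction type with the highest score."""
    return max(scores, key=scores.get)


bern = train_bernoulli(TRAIN, VOCAB)
mult = train_multinomial(TRAIN, VOCAB)

print(best_label(score_bernoulli(bern, ["auth", "list", "read"], VOCAB)))
print(best_label(score_multinomial(mult, ["auth", "compose", "send"])))
```

On this toy data the Bernoulli classifier labels the first sequence `open_mail` and the multinomial classifier labels the second `send_mail`. The two variants differ only in the feature vector: the Bernoulli model discards how often an RPC repeats within an instance, whereas the multinomial model retains the counts.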