### BibTeX

```bibtex
@misc{Bayman15,
  author = {Emine Ozgur Bayman and Franklin Dexter and Michael M. Todd},
  title  = {},
  year   = {2015}
}
```

### Abstract

Background: Periodic assessment of performance by anesthesiologists is required by The Joint Commission's Ongoing Professional Practice Evaluation program.

Methods: The metrics used in this study were the measurement of (1) blood pressure and (2) oxygen saturation (SpO2) either before or less than 5 min after anesthesia induction. Noncompliance was defined as no measurement within this interval. The authors assessed the frequency of noncompliance using information from 63,913 cases drawn from the anesthesia information management system. To adjust for differences in patient and procedural characteristics, 135 preoperative variables were analyzed with decision trees. The retained covariate for the blood pressure metric was patient age; the retained covariates for the SpO2 metric were American Society of Anesthesiologists physical status, whether the patient was coming from an intensive care unit, and whether induction occurred within 5 min of the start of the scheduled workday. A Bayesian hierarchical model, designed to identify anesthesiologists as "performance outliers" after adjustment for covariates, was developed and compared with frequentist methods.

Results: The global incidences of noncompliance (with frequentist 95% CI) were 5.35% (5.17 to 5.53%) for the blood pressure metric and 1.22% (1.14 to 1.30%) for the SpO2 metric. Using unadjusted rates and frequentist statistics, up to 43% of anesthesiologists would be deemed noncompliant for the blood pressure metric and 70% for the SpO2 metric. Using Bayesian analyses with covariate adjustment, only 2.44% (1.28 to 3.60%) and 0.00% of the anesthesiologists would be deemed "noncompliant" for blood pressure and SpO2, respectively.

Conclusion: Bayesian hierarchical multivariate methodology with covariate adjustment is better suited to faculty monitoring than the nonhierarchical frequentist approach.
(Anesthesiology 2015; 123:101-15)

What We Already Know about This Topic
• Although periodic assessment of anesthesiologists is required by some regulatory agencies in the world, there are no broadly accepted quality or safety performance metrics in anesthesia

What This Article Tells Us That Is New
• Noncompliance with simple blood pressure and oxyhemoglobin saturation metrics in approximately 70,000 cases at the University of Iowa was present in up to 43 and 70% of anesthesiologists, respectively, by using frequentist statistics, compared with 2.4 and 0%, respectively, by using a Bayesian approach

Supplemental Digital Content is available for this article. This study was presented at the 2013 American Society of Anesthesiologists meeting in San Francisco, California, on October 13, 2013, as one of the best clinical abstracts.

The Joint Commission, a United States-based hospital accreditation organization, requires that all licensed practitioners (e.g., anesthesiologists) undergo periodic Ongoing Professional Practice Evaluation (OPPE). Evaluations must be based, at least in part, on measures of clinical performance and identify providers by name. Results are reported to the hospital. There are two separate issues when comparing the performance of anesthesiologists. One is the determination of the mean risk-adjusted incidence of their noncompliance with a chosen metric. This would be appropriate for making a comparison between groups and hospitals.1,2 The second is the determination of outlying individuals within the same department. The two issues occur at different hierarchical levels. The OPPE requirement addresses the second issue.
Anesthesiologists work in different subspecialty areas and perform widely differing numbers of cases. A metric applicable to one subset of anesthesiologists within a department might be meaningless when applied to another. Not only is finding a "one-size-fits-all" metric difficult, so is finding valid analytic methods that avoid the "spurious outlier" problem inherent in the face of anesthesiologist-by-anesthesiologist, patient, and procedural variation. Using raw measures of noncompliance (e.g., the percentage of a given clinician's anesthetics not meeting the chosen metric) in conjunction with typical frequentist statistics (e.g., chi-square test or funnel plots) may yield misleading comparisons. Ehrenfeld et al.3 took such an approach. We developed a Bayesian hierarchical model to identify "performance outliers" after adjustment for covariates. The inclusion of relevant covariates eliminates the need to exclude large numbers of cases (e.g., pediatrics). The Bayesian approach is more statistically powerful than frequentist methods when the sample sizes for individual participants are heterogeneous and when some are small. The goal of this study was to compare the characteristics and results of different statistical methods used to detect those anesthesiologists who might be judged as "outliers," with and without efforts to take into account different patient and subspecialty characteristics. We evaluated the influence of risk adjustment for each metric on the results and compared our Bayesian methods with Ehrenfeld's frequentist observed percentage with no covariate adjustment.
### Materials and Methods

The Human Subject Research Determination form was submitted to the University of Iowa Institutional Review Board (Iowa City, Iowa), which determined that this retrospective quality assurance project concerns primarily clinical activities and does not meet the regulatory definition of human subjects research (see table 1, Supplemental Digital Content 1, http://links.lww.com/ALN/B145, for the details of the structured query language logic used to create the analyzed dataset). We began with information from 79,327 anesthetics extracted from the University of Iowa's Epic (Epic Systems, Inc., USA) anesthesia information management system (AIMS) (appendix, Anesthesia Medical System Time Stamps), starting shortly after the Epic "Go-Live" (November 16, 2010) and extending through June 30, 2013. We then focused on general anesthetics that were initiated with one of five different agents: propofol, etomidate, sevoflurane, desflurane, and rocuronium. Isoflurane was not included because in no case was it the first agent used (i.e., its use was always preceded by one of the above-noted medications). Desflurane and rocuronium were included because the study included patients who were already intubated and sedated and transported from an intensive care unit (ICU) to the operating rooms (ORs), and these two agents were sometimes the first given in the OR. This resulted in a dataset of 68,220 cases (see table 1, Supplemental Digital Content 1, http://links.lww.com/ALN/B145, which lists the structured query language logic used to obtain the 68,220 cases). Preoperative and procedural information from these cases was used to develop covariates for subsequent analyses (see the Selection of Covariates sections for the blood pressure and pulse oximetry-measured oxygen saturation (SpO2) metrics). For performance assessments, five sequential 6-month periods were used, beginning with January 2011 through June 2011 and ending with January 2013 through June 2013.
The 6-month periods were used because this is an accepted interval for periodic OPPE assessments (assessments must be more frequent than once per year). Our Bayesian method works fully for all providers (i.e., for our local quality improvement, all providers are included). However, in our analyses for this article, in order for an anesthesiologist to be included in the performance assessment for a period, the anesthesiologist being assessed had to be working for the department during that entire 6-month period. We did this so that the scientific results in this article are generalizable to other institutions; the number of anesthesiologists doing hardly any cases in a 6-month period would not apply elsewhere. The effect of this was to reduce the sample size for performance analyses to 63,913 general anesthetics. To assess and compare the performances of anesthesiologists, two metrics were chosen. The criteria for choosing the metrics were that each should (1) apply to all anesthesiologists performing general anesthetics in the department; (2) be measured objectively; and (3) be present in our Epic AIMS to permit ready extraction every 6 months. The metrics chosen were (1) the time of the first recorded arterial or noninvasive blood pressure (NIBP) in relation to the time of induction3,5 and (2) the time of the first recorded SpO2 in relation to induction (see the section Definition of "Time of Induction" for more information). For an anesthesiologist to be deemed compliant, these values needed to be recorded either before or coincident with the first appearance of any of the five aforementioned medications (defined as the start of induction). Because, for more than half of the records, the first induction dose was propofol, and because the time of propofol administration was not recorded automatically, a 5-min window after the recorded induction time was allowed. Ehrenfeld et al.3 also excluded 35% (46 of 128) of anesthesiologists because they performed fewer than 60 general anesthetics during the studied period.
Fewer of our anesthesiologists would have been excluded by that criterion (4 of 56, 7.1%), but because the Bayesian method is not influenced by the number of anesthetics performed, we included all anesthesiologists regardless of the number of general anesthetics performed during the studied 6-month period, as long as the anesthesiologist worked for the department during that entire period. This meets the OPPE mandate.

#### Definition of "Time of Induction"

For intravenous drugs, the time of induction was considered the earliest recorded time of administration after the anesthesia start time. For propofol, etomidate, and/or rocuronium, an induction dose exceeding chosen thresholds was required. The threshold for propofol was 0.125 mg/kg, chosen to be greater than the typical ICU sedation dose of 25 μg kg−1 min−1 × 5 min. As this was approximately 20% of a typical induction dose, the (approximate) corresponding criteria applied were 0.06 mg/kg for etomidate and 0.12 mg/kg for rocuronium. The total dose over 5 min was used when there was more than one dose or an infusion. An example is provided in the appendix (Examples of Calculating the Total Dose of Propofol). For the volatile anesthetics (sevoflurane or desflurane), the time of induction was the first time after the anesthesia start time when the automatically recorded end-tidal percentage concentration exceeded a threshold. Thresholds used were 0.2 times the minimum alveolar concentration, that is, 0.42% for sevoflurane and 1% for desflurane. The reason for using threshold values for end-tidal concentrations was that residual subhypnotic concentrations from previous anesthetics in the OR on the same day frequently "bled over" into a subsequent case, potentially resulting in an erroneous induction time (appendix, Using Thresholds for End-tidal Concentrations).
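The dose-threshold rules above can be sketched in code. This is a minimal illustration, not the study's actual extraction logic: the record layout, the function name, and the rolling-window reading of "the total dose over 5 min" are assumptions of this sketch.

```python
from datetime import datetime, timedelta

# Per-kg dose thresholds from the text: an "induction" dose must exceed
# these to be distinguished from ICU sedation carried into the OR.
THRESHOLD_MG_PER_KG = {"propofol": 0.125, "etomidate": 0.06, "rocuronium": 0.12}

def induction_time(doses, weight_kg, anesthesia_start):
    """Return the earliest time at which the cumulative dose of an
    intravenous induction agent over the preceding 5 min exceeds its
    per-kg threshold, or None. `doses` is a list of
    (time, drug_name, dose_mg) tuples (a hypothetical record layout)."""
    doses = sorted(d for d in doses if d[0] >= anesthesia_start)
    for i, (t, drug, _) in enumerate(doses):
        if drug not in THRESHOLD_MG_PER_KG:
            continue
        # total dose of this drug over the 5 minutes ending at t
        window = [d for d in doses[: i + 1]
                  if d[1] == drug and t - d[0] <= timedelta(minutes=5)]
        if sum(d[2] for d in window) > THRESHOLD_MG_PER_KG[drug] * weight_kg:
            return t
    return None
```

For a 70-kg patient the propofol cutoff is 0.125 mg/kg × 70 kg = 8.75 mg, so a small sedation bolus alone does not register as induction, but a typical induction bolus does.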
#### Blood Pressure Metric: Selection of Covariates

We started with a dataset consisting of all the preoperative characteristics available in the AIMS for the 68,220 patients receiving general anesthesia during the November 2010 through June 2013 period (tables 1-3, Supplemental Digital Content 1, http://links.lww.com/ALN/B145, 135 variables). Classification tree analyses were then performed by using SAS Enterprise Miner 7.1 (SAS Institute, Inc., USA). Models were compared based on the mean squared error. The classification/decision tree analyses created a hierarchy of branches.6 Each variable was divided into as many as three branches. Each variable was used only once in the decision tree. For these analyses, a single dataset including the data from the entire period was used instead of five datasets from each of the 6-month periods. The use of age in the model reduced the mean squared error more than the use of any of the other 134 variables (see tables 2 and 3, Supplemental Digital Content 1, http://links.lww.com/ALN/B145). Once age was included, adding any of the other 134 variables did not meaningfully reduce the mean squared error further (reduction in the mean squared error for each other variable ≤0.25%). The three age categories selected automatically by SAS Enterprise Miner were (1) age less than 7 yr and 3 months; (2) age between 7 yr and 3 months and 12 yr and 9 months; and (3) age 12 yr and 9 months or older. The Bayesian method uses logistic regression models. Knowing the model variable from SAS Enterprise Miner, a Box-Cox transformation was used to determine the best transformation of age as a continuous variable to satisfy the assumption of a linear relation between transformed age and the logit. The transformation used was two times the square root of age. The mean squared error for the blood pressure metric with age as the covariate was 0.0468.
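The screening idea, ranking candidate covariates by how much a split on each reduces the mean squared error, can be sketched in plain NumPy. This uses a single-split "stump" rather than SAS Enterprise Miner's trees, and entirely synthetic data in which noncompliance is driven by young age; all names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# synthetic stand-ins for two preoperative covariates (assumed, not study data)
age = rng.uniform(0, 90, n)                   # years
asa = rng.integers(1, 5, n).astype(float)     # uninformative here by construction
p = np.where(age < 7.25, 0.15, 0.04)          # noncompliance driven by young age
y = rng.binomial(1, p)

baseline = np.mean((y - y.mean()) ** 2)       # no-split mean squared error

def best_split_mse(x, y):
    """MSE of the best single binary split on x: a one-level stand-in for
    the classification-tree screening described in the text."""
    best = np.mean((y - y.mean()) ** 2)
    for cut in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= cut], y[x > cut]
        if len(left) == 0 or len(right) == 0:
            continue
        mse = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2)) / len(y)
        best = min(best, mse)
    return best

mse_age, mse_asa = best_split_mse(age, y), best_split_mse(asa, y)

# the transformation the article reports for the logistic model
age_transformed = 2 * np.sqrt(age)
```

In this toy setting a split on age lowers the MSE well below the baseline, while a split on the uninformative covariate barely moves it, which mirrors how age was retained and the other 134 variables were not.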
To evaluate further whether any of the other 134 variables individually should also be used for covariate adjustment, logistic regression models were fit, and the increases in the area under the curve relative to age alone were calculated. It was verified that the model with age was as good as any model with an additional variable: the increases in the area under the curve after the inclusion of other variables in addition to age were all less than 0.6% in absolute value. Because both the mean squared error from the classification tree analyses and the area under the curve statistics from the logistic regression models indicated that no candidate variable made a meaningful change in the model, patient age (a continuous variable with the square-root transformation) was the only characteristic used in the Bayesian model for absence of checking the blood pressure within 5 min of the induction of general anesthesia. Finally, Bayesian hierarchical generalized linear models were fit, adjusting for patient age within each of the five 6-month periods. In the Bayesian model, patient age was included along with the random anesthesiologist effect.

#### SpO2 Metric: Selection of Covariates

Similar steps were followed for SpO2. Significant covariates detected by the classification tree analyses were the American Society of Anesthesiologists (ASA) physical status score (1 to 3 vs. 4 to 6), whether the patient was coming from the ICU, and whether the case was a first start of the workday, all binary variables. There were 35 patients with ASA physical status 6 (brain dead), and they were included in the analyses. The "From ICU" variable reports whether the location immediately preceding the patient's arrival in an OR was an ICU. Locations considered ICUs were the hospital's cardiovascular ICU, emergency medicine department, medical ICU, neonatal ICU, pediatric ICU, and the surgical and neuroscience ICU.
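The AUC-increment check described above can also be sketched. This is a synthetic illustration, with the logistic fit (iteratively reweighted least squares) and a rank-based AUC written inline rather than taken from a statistics package; the data-generating model and the 0.45 coefficient are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
age = rng.uniform(0, 90, n)
noise = rng.normal(size=n)                     # an uninformative candidate covariate

# outcome generated from transformed age only (synthetic, for illustration)
t_age = 2 * np.sqrt(age)
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 - 0.45 * t_age))))

def fit_logistic(X, y, iters=25):
    """Logistic regression via iteratively reweighted least squares."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ b)))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

def auc(score, y):
    """Rank-based (Mann-Whitney) area under the ROC curve."""
    r = np.argsort(np.argsort(score)) + 1      # ranks (scores are continuous)
    n1 = y.sum()
    return (r[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (len(y) - n1))

X_age = np.column_stack([np.ones(n), t_age])
X_full = np.column_stack([X_age, noise])
auc_age = auc(X_age @ fit_logistic(X_age, y), y)
auc_full = auc(X_full @ fit_logistic(X_full, y), y)
```

Adding the pure-noise covariate changes the AUC only negligibly, which is the pattern (increments under 0.6% absolute) that justified keeping age alone.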
As displayed in the decision tree of figure 3, patients with ASA physical status scores of 4, 5, or 6 appear not to have had their SpO2 checked before or within 5 min after induction more often than patients with lesser ASA scores (5.40 vs. 1.09%, respectively). Similarly, the SpO2 appears not to have been checked before induction more often for those patients coming from an ICU (7.77 vs. 4.43%). The sicker patients (ASA ≥4) coming from the ICU did so on transport monitors, and sometimes the anesthesia provider did not enter the information into the electronic medical record (again, as described in the section Blood Pressure Metric: Selection of Covariates). These patients were included in our study because these are precisely the patients for whom hypotension and/or hypoxemia may influence patient outcome. However, by including these variables in the risk adjustment, each anesthesiologist's behavior was compared with that of other anesthesiologists addressing the same systems-based issues.

#### Frequentist Outlier Detection Methods

The method described in the study by Ehrenfeld et al.3 was used to identify outlier anesthesiologists. Ehrenfeld et al.3 calculated a compliance threshold as the upper 95% confidence limit of the overall incidence of noncompliance and considered providers whose performance was beyond this limit to be "noncompliant." We calculated this frequentist threshold based on a single dataset including the data from all 2.5 yr and then applied it to each of the five 6-month intervals.

#### Bayesian Outlier Detection Methods

The method described in the studies by Chaloner and Brant7 and Bayman et al.8 was used to identify outliers. The method8 was developed to detect outliers among centers in multicenter clinical trials. In the current study, it is applied to detect anesthesiologists with outlier behavior. Each anesthetic was attributed to the single anesthesiologist assigned in the electronic medical record at the time of the induction drug administration.
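The frequentist threshold rule described above can be sketched as a small function. The normal-approximation confidence limit used here is an assumption of this sketch (the original may have computed the interval differently), and the counts in the usage example are invented.

```python
import numpy as np

def frequentist_outliers(noncompliant, cases, z=1.645):
    """Flag providers whose raw noncompliance rate exceeds the upper
    one-sided 95% confidence limit of the pooled departmental incidence
    (the Ehrenfeld-style rule described in the text), with no covariate
    or multiple-comparison adjustment."""
    noncompliant = np.asarray(noncompliant, dtype=float)
    cases = np.asarray(cases, dtype=float)
    p = noncompliant.sum() / cases.sum()
    upper = p + z * np.sqrt(p * (1 - p) / cases.sum())
    return (noncompliant / cases) > upper, upper

# hypothetical department: five providers, 1,000 cases each
flags, upper = frequentist_outliers([50, 50, 50, 50, 120], [1000] * 5)
```

Because every provider is compared with a limit computed from the pooled data, providers whose case mix differs from the department's (e.g., mostly pediatric cases) are flagged even when their behavior is typical for that case mix, which is the "spurious outlier" problem discussed in the introduction.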
The hierarchical model is especially appropriate here because the anesthetics are nested within the anesthesiologists, and the model takes into account patient and procedure characteristics. Bayesian hierarchical generalized linear models were fit for each metric. Anesthesiologists were assumed to be performing similarly, but not identically, to one another: exchangeable.9 In statistical modeling terms, this means that it was assumed that the anesthesiologists' performances were randomly sampled from the same normal distribution. Details of the model are given in the appendix (The Bayesian Model). In Bayesian analysis, unknown parameters are random variables, and, therefore, prior probability distributions must be defined. The Bayesian model combines the prior distribution with the data and produces a posterior distribution. Inferences are made from the posterior distribution. Two different prior probabilities were examined for an anesthesiologist having a significantly greater incidence of blood pressure (or SpO2) noncompliance than the other anesthesiologists during each of the five 6-month periods. (1) The prior probability of each anesthesiologist having a significantly greater incidence of blood pressure (or SpO2) noncompliance than the other anesthesiologists was set to 0.05 (appendix, Individual Prior Probability). (2) The prior probability of at least one anesthesiologist in the department during each studied 6-month period having a significantly greater incidence of blood pressure (or SpO2) noncompliance than the other anesthesiologists was set to 0.05. Under the second setting, for the 57 anesthesiologists in the department January 2013 through June 2013, the prior probability of each anesthesiologist being an outlier becomes 0.0009 (i.e., 1 − 0.95^(1/57) ≈ 0.0009; appendix, Overall Prior Probability).
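A toy version of this hierarchical model, exchangeable provider effects on the logit, estimated by Markov chain Monte Carlo, can be sketched in plain NumPy with a Metropolis-within-Gibbs sampler. This stands in for, and is not, the study's WinBUGS model: the data are synthetic, the proposal scales and priors are assumptions, and the sign convention is flipped relative to the article (here a positive δk flags greater noncompliance, whereas the article's negative δk did).

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic department (assumed data, not the study's): K anesthesiologists,
# unequal caseloads, ~5% baseline noncompliance, one genuine outlier.
K = 15
n = rng.integers(100, 1500, K)
n[0] = 1200
delta_true = rng.normal(0.0, 0.25, K)
delta_true[0] = 1.6                            # provider 0 truly noncompliant
y = rng.binomial(n, 1.0 / (1.0 + np.exp(-(-2.9 + delta_true))))

def binom_loglik(eta):
    # binomial log-likelihood up to a constant, in a numerically stable form
    return y * eta - n * np.logaddexp(0.0, eta)

def lp_log_tau(lt, delta):
    # terms of the log posterior involving log(tau): N(0, tau^2) random
    # effects plus a weakly informative N(0, 10^2) prior on log(tau)
    return -0.5 * np.sum((delta / np.exp(lt)) ** 2) - K * lt - lt**2 / 200

# Metropolis-within-Gibbs for the hierarchical logistic model:
#   y_k ~ Binomial(n_k, inv_logit(mu + delta_k)),  delta_k ~ N(0, tau^2)
mu, log_tau, delta = -3.0, np.log(0.3), np.zeros(K)
draws = []
for it in range(20000):
    tau = np.exp(log_tau)
    # vectorized per-provider Metropolis update of the random effects
    prop = delta + rng.normal(0.0, 0.25, K)
    lp_new = binom_loglik(mu + prop) - 0.5 * (prop / tau) ** 2
    lp_old = binom_loglik(mu + delta) - 0.5 * (delta / tau) ** 2
    delta = np.where(np.log(rng.uniform(size=K)) < lp_new - lp_old, prop, delta)
    # scalar Metropolis update of the overall mean (weak N(0, 10^2) prior)
    mu2 = mu + rng.normal(0.0, 0.1)
    if np.log(rng.uniform()) < (binom_loglik(mu2 + delta).sum() - mu2**2 / 200
                                - binom_loglik(mu + delta).sum() + mu**2 / 200):
        mu = mu2
    # scalar Metropolis update of log(tau)
    lt2 = log_tau + rng.normal(0.0, 0.2)
    if np.log(rng.uniform()) < lp_log_tau(lt2, delta) - lp_log_tau(log_tau, delta):
        log_tau = lt2
    if it >= 5000:                             # discard burn-in
        draws.append(delta.copy())

post_delta = np.mean(draws, axis=0)            # posterior mean effect per provider
```

The posterior means show the shrinkage discussed later: providers near the departmental mean are pulled toward zero, while the genuinely outlying provider retains a clearly elevated effect.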
Prior distributions for the overall mean and the coefficients for the fixed effects (i.e., patient age and whether the patient is coming from an ICU) were assumed to follow normal distributions, as is typical for these types of analyses. The parameters of these prior distributions were set to be weakly informative (very large SDs)10 (for statistical details and explanations of the WinBUGS model, see the appendix, Adjusted WinBUGS Model for Blood Pressure Outcome: Individual Probability). Analyses were repeated using different prior distributions as sensitivity analyses. Random prior distributions were defined for each anesthesiologist. For each anesthesiologist, posterior probabilities of having a significantly greater incidence of blood pressure (or SpO2) noncompliance than the other anesthesiologists were calculated, and the strength of evidence was quantified by the Bayes factor (BF).9 The BF is the ratio of the posterior odds in favor of the null to the prior odds of the null.9 Kass and Raftery11 recommend a more conservative interpretation of the BF: BFs less than 0.33, 0.05, and 0.0067 are classified as "positive," "strong," and "very strong" evidence against the null hypothesis, respectively.11 With both scales, a BF greater than 1 provides evidence for the null hypothesis. A BF less than 0.1 indicates "strong" evidence according to the Jeffreys scale9 and was used as the criterion for an outlier in our study. The direction of the outlier (e.g., greater or lesser incidence of blood pressure [or SpO2] noncompliance than the other anesthesiologists) was determined by the sign of the random effect term corresponding to the anesthesiologist. A negative δk indicated that the kth anesthesiologist had a greater incidence of noncompliance relative to the other anesthesiologists.
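The odds-ratio definition of the BF and the cutoffs quoted above can be made concrete. This is a sketch of the arithmetic only; the study obtained the posterior probabilities themselves from the WinBUGS output, and the 0.90 posterior in the example is an invented number.

```python
def bayes_factor_null(prior_outlier, posterior_outlier):
    """BF in favor of the null ("not an outlier"), as defined in the text:
    posterior odds of the null divided by prior odds of the null."""
    post_odds = (1 - posterior_outlier) / posterior_outlier
    prior_odds = (1 - prior_outlier) / prior_outlier
    return post_odds / prior_odds

def evidence_against_null(bf):
    """Kass-Raftery-style bands quoted in the text."""
    if bf < 0.0067:
        return "very strong"
    if bf < 0.05:
        return "strong"
    if bf < 0.33:
        return "positive"
    return "favors null" if bf > 1 else "weak"

# the departmental prior from the text: P(at least one of 57 is an outlier)
# = 0.05 implies a per-anesthesiologist prior of 1 - 0.95**(1/57), ~0.0009
each_prior = 1 - 0.95 ** (1 / 57)

# e.g., an individual prior of 0.05 with a posterior of 0.90 gives a BF
# around 0.006, well under the study's outlier criterion of BF < 0.1
bf = bayes_factor_null(0.05, 0.90)
```

Note how strongly the prior odds enter: with the much smaller departmental prior of 0.0009, a far higher posterior probability is needed before the BF crosses the same 0.1 criterion.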
Overall standard errors (SEs) of the incidences of blood pressure (or SpO2) first checked 5 min or more after induction were calculated treating each of the periods as a point estimate, because the same anesthesiologists were tested among periods.12 The numerator and denominator for each period were used in the Freeman-Tukey transformation.12 The Student t distribution was used to calculate the CI and P value of the transformed values.12 By using the harmonic mean number of anesthesiologists per period, incidences were back-transformed to percentage incidences. Basic data analyses were performed by using SAS software 9.3 (SAS Institute, Inc.). Classification tree analyses were performed by using SAS Enterprise Miner software 7.1. Plots were created using SigmaPlot version 12.5 (Systat Software, USA). Bayesian analyses were performed by using the WinBUGS 1.4.3 software (Imperial College and Medical Research Council, United Kingdom).14 WinBUGS uses Markov chain Monte Carlo methods. To explore the extreme regions of the parameter space, three parallel chains of equal lengths with dispersed initial values were used in the WinBUGS analyses. Convergence was judged by Brooks, Gelman, and Rubin diagnostic plots,15 density and history plots, and autocorrelations. Bayesian results were based on 5,000 iterations after a burn-in period of 5,000 iterations in each chain. The Reporting Of Bayes Used in clinical STudies (ROBUST) guidelines were followed to report the Bayesian analyses in this study.

### Results

Descriptive statistics are provided in table 1 for those variables used in one or more models for the 68,220 general anesthetics. Descriptive statistics for those variables not used in any model are provided in the Supplemental Digital Content (http://links.lww.com/ALN/B145).
#### Blood Pressure Outcome

The unadjusted overall incidence of noncompliance for all 63,913 cases from all five periods for the blood pressure metric was 5.35% (95% CI, 5.17 to 5.52%). Summary results, incidences of noncompliance, and the number of anesthesiologists for each of the five 6-month periods are given in table 2 for the blood pressure metric. As summarized in the introduction, Ehrenfeld et al.'s3 frequentist method uses the raw observed percentages for each anesthesiologist, without any covariate adjustment. Following their method, the upper 95% one-sided confidence limit (5.49%) was calculated for the overall (departmental) incidence of noncompliance. Based on this, 28.52% (18.68 to 38.36%) of the anesthesiologists in our department would be identified as noncompliant outliers (23 of 53, 14 of 56, 13 of 55, 16 of 59, and 14 of 57 in each of the five periods, respectively). Among the nine anesthesiologists each with at least 50% of their cases being pediatric (age <13 yr), all were frequentist outliers for at least four of the five periods, and seven were frequentist outliers for all five periods. Applying the Bayesian analyses without adjustment for patients' ages, 4.24% (3.18 to 5.30%) of anesthesiologists were outliers. For example, anesthesiologist 3 (the anesthesiologist with the third greatest number of cases during the whole 2.5 yr) was detected as having a significantly greater incidence of blood pressure noncompliance than the other anesthesiologists in all five periods (table 2). The random chance of detecting this anesthesiologist as an outlier in all five periods was minuscule (1.82 × 10−9). The fact that the same anesthesiologist was detected during different periods suggests the reliability of the Bayesian method.

#### SpO2 Outcome

The overall incidence of noncompliance for the SpO2 metric was 1.22% (95% CI, 1.14 to 1.30%) (table 3).
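The across-period summary described in Materials and Methods (Freeman-Tukey double-arcsine transformation of each period's incidence, a Student t interval on the transformed scale, then back-transformation) can be sketched as follows. The per-period counts are hypothetical, and the simple sin²(t/2) inverse is an approximation standing in for the harmonic-mean back-transformation the article cites.

```python
import numpy as np

# hypothetical per-period counts (not the study's data)
x = np.array([310, 295, 330, 301, 318])        # noncompliant cases per period
m = np.array([6400, 6300, 6500, 6350, 6450])   # total cases per period

# Freeman-Tukey double-arcsine transformation of each period's incidence
t = np.arcsin(np.sqrt(x / (m + 1))) + np.arcsin(np.sqrt((x + 1) / (m + 1)))

k = len(t)
tbar, s = t.mean(), t.std(ddof=1)
tcrit = 2.776                                  # Student t, 0.975 quantile, df = 4
half = tcrit * s / np.sqrt(k)

def back(v):
    # approximate inverse of the double-arcsine transform: p = sin(v/2)^2
    return float(np.sin(v / 2) ** 2)

estimate = back(tbar)
ci = (back(tbar - half), back(tbar + half))
```

Treating the five period incidences as the observations, rather than the individual cases, is what accounts for the same anesthesiologists appearing in every period.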
If an anesthesiologist had an incidence of no SpO2 measurement within 5 min of induction that was greater than the upper 95% confidence limit of the overall incidence of noncompliance (1.29%), the anesthesiologist was judged "noncompliant" by the method of Ehrenfeld et al.3

#### Sensitivity Analyses

For the results presented up to this point, the prior probability of each anesthesiologist having a significantly greater incidence of blood pressure or SpO2 noncompliance than the other anesthesiologists was set to 5.0%. A sensitivity analysis was performed with the prior probability of at least one anesthesiologist in the department during the studied 6-month period having a significantly greater incidence of blood pressure (or SpO2) first checked 5 min or more after induction than the other anesthesiologists set to 5.0% (tables 4 and 5, Supplemental Digital Content 1, http://links.lww.com/ALN/B145). For the blood pressure metric, fewer anesthesiologists were detected as outliers (pairwise differences 2.05% [−0.61 to 4.72%] for the unadjusted model and 2.10% [1.22 to 2.98%] for the adjusted model). For the SpO2 metric, more were detected (0.11% [0.07 to 0.15%] for both unadjusted and adjusted models).

### Discussion

Bayesian hierarchical outlier detection methods that take into account patient and practice characteristics provided more reliable and valid performance assessments for OPPE compared with methods assessing the raw incidence of compliance.

#### Comparison with the Frequentist Approach

The use of SAS Enterprise Miner enabled us to screen 135 potential covariates to learn what was important for our department. The methodology was an effective screening tool. However, the decision trees for our department are unlikely to be suitable for other departments. In other words, the "result" is the process, not the decision tree itself.
We expected age to be a covariate for the blood pressure metric (e.g., sevoflurane induction in a child with ASA physical status 1 for myringotomy tube placement, followed by placement of the blood pressure cuff). However, the fact that no other variable was an important covariate was a surprise. When we first used the decision trees, thresholds were not applied to the end-tidal concentrations (e.g., >1% desflurane). The results were implausible clinically, and, from this, we recognized that there were residual subhypnotic amounts of agents present from the preceding case. Because we were analyzing 135 variables, 68,220 records, and hundreds of minutes of records, identifying this uncommon effect would otherwise have been challenging. The two particular endpoints of our study used a threshold of 5 min. This choice is conservative, as it indicates that the patient did not have an SpO2 checked until at least 5 min after the recorded induction. This may also be a charting problem, but if so, it still reflects a responsibility of the supervising anesthesiologist. Because OPPE by definition is to be applied to all anesthesiologists in a department, we need methods for risk adjustment that function without excluding classes of patients and/or anesthesiologists (see the first section of Materials and Methods; compare with the exclusions of Ehrenfeld et al.3). We used a random effects model to represent heterogeneity among anesthesiologists. Because of the assumption in the Bayesian hierarchical model that anesthesiologists perform similarly to one another, those anesthesiologists with a much greater or lesser incidence of noncompliance than the others were shrunk toward the overall mean.
This implies that, if an anesthesiologist was detected as an outlier by the Bayesian analysis, the anesthesiologist's performance should truly be outlying. Because of the shrinkage toward the overall mean, it is hard to detect an anesthesiologist with few cases as an outlier with the Bayesian method. However, with Ehrenfeld et al.'s3 approach, the anesthesiologists with few cases are not even analyzed. Ehrenfeld et al.3 used frequentist outlier detection without covariate adjustment; we used the decision tree for the blood pressure outcome. Mathematically, it may seem a substantial leap in complexity to go from Ehrenfeld et al.'s3 simple method (based on the observed incidence of noncompliance with no covariate adjustment and no multiple-comparisons adjustment) to a Bayesian logistic regression model (with adjusted covariates chosen using data mining). However, there are no methods that mathematically are "in between," and the methods that are available are not simpler to perform. In sequence, covariates need to be chosen, with interactions, while incorporating the incidences of the covariates. The logical alternative approach would be neural networks; they are (most assuredly) not simpler to use. Next, we used a risk-adjusted Bayesian model.

Table 2 legend: The unadjusted overall incidence of noncompliance for all 63,913 patients from all five periods for the blood pressure metric was 5.35% (95% CI, 5.17 to 5.52%) (3,417 of 63,913). Anesthesiologists were labeled according to their number of anesthetics during the whole 2.5-yr period; for example, anesthesiologist 1 performed the most anesthetics and anesthesiologist 2 the second most. The adjusted model includes the patient's age. In this table, the prior probability of each anesthesiologist having a significantly greater incidence of blood pressure noncompliance than the other anesthesiologists was set to 0.05.
A frequentist logistic regression model could be used instead, but that would not be the logistic regression of basic statistical packages, because such packages do not include the random anesthesiologist effect.

#### Sensitivity Analyses

The incidence of overall noncompliance for the SpO2 metric (1.22 ± 0.04%) was much lower than for the blood pressure metric (5.35 ± 0.09%). Because compliance rates for each anesthesiologist were high and closer to one another for the SpO2 metric, the variances of the random anesthesiologist effects for the SpO2 metric were smaller than those for the blood pressure metric, for both the adjusted and unadjusted models. As explained in the Materials and Methods section, setting the individual prior probability to 0.05 for each anesthesiologist is a greater prior probability than that implied by setting the overall departmental probability to 0.05. The smaller variance for the SpO2 metric in the data, together with the smaller per-anesthesiologist probability under the overall departmental prior, led to a smaller variance for the overall departmental prior distribution compared with the individual probability. Therefore, slightly more (0.11% [0.07 to 0.15%]) individuals were detected as outliers for the SpO2 metric when the overall departmental probabilities were used.

Figure legend: The blue hexagons represent the anesthesiologists with a significantly greater incidence of SpO2 first checked ≥5 min after induction than the other anesthesiologists based on the criteria of Ehrenfeld et al.,3 without covariate and multiple-comparison adjustment. The green triangle shows the single anesthesiologist who was an outlier when the Bayesian method was applied without covariate adjustment. This anesthesiologist is also a frequentist outlier, which is why the symbol also includes some blue.
None of the anesthesiologists was detected as having a significantly greater incidence of SpO2 first checked ≥5 min after induction than the other anesthesiologists when the Bayesian method was applied with covariate adjustment. Unadj = unadjusted.

Anesthesiologists identified as performance outliers (SpO2 metric), by 6-month period:
  Frequentist:                                        n = 37  n = 19  n = 25  n = 18  n = 13
  Bayesian unadjusted (anesthesiologist identifier):
  Bayesian adjusted:                                  n = 0   n = 0   n = 0   n = 0   n = 0

Table legend (SpO2 metric): The overall incidence of noncompliance for the SpO2 metric was 1.22% (95% CI, 1.14 to 1.30%) (779 of 63,913). Anesthesiologists were labeled according to their number of anesthetics during the whole 2.5-yr period; for example, anesthesiologist 1 performed the most anesthetics and anesthesiologist 2 the second most. The adjusted model includes covariates: ASA is 1 when the ASA physical status score is ≥4 and 0 otherwise; "Start of the day" is a binary variable indicating whether the time from the start of the surgical day to induction was ≤5 vs. >5 min; the "From ICU" variable reports whether the patient's preceding location before the OR was an ICU. For interpretation, see the legend of the corresponding table.

PERIOPERATIVE MEDICINE

Limitations

Our study has limitations. First, the outcomes we chose for the demonstration of our Bayesian methods may not reflect the actual performance levels of anesthesiologists. For example, an anesthesiologist may check blood pressure on a transport monitor without it being recorded in the electronic medical record within 5 min after induction. This is a systems-based problem: although charting is part of medical care, a missing chart entry does not mean that the blood pressure was not measured. Second, we analyzed the anesthesiologist, but anesthesia at our hospital is delivered by a team including anesthesiology resident(s) and/or Certified Registered Nurse Anesthetists.
Third, we excluded those anesthesiologists (rotating fellows and locum workers) who did not work for the department for each 6-month period that was studied. Our methods could have included these groups of anesthesia providers. As explained in the "Comparison with the Frequentist Approach" section, due to the shrinkage toward the overall mean, the method will appropriately be unlikely to detect an anesthesiologist with too small a sample size as a Bayesian outlier. Therefore, including anesthesiologists who worked for the department for less than 6 months would not change the results of our Bayesian analyses for our department, but it would have made the proportion of anesthesiologists flagged as outliers artificially smaller.

Another fundamental issue is the lack of any "definitive standard" for performance. OPPE only requires a comparison of providers within a department. Therefore, our approach compares providers with their peers, not against some external standard. It is possible that the overall departmental incidences are inappropriate (e.g., the greater incidence of unreported SpO2 before induction of anesthesia among patients from ICUs). We cannot claim that the "lower" incidence of outliers seen with our approach is "correct," whereas the unadjusted frequentist approach is "incorrect." Nevertheless, the suggestion (based on unadjusted frequentist methods) that nearly half of all of our anesthesiologists are "outliers" at some point in time seems unlikely. Neither our approach nor the frequentist method can detect "global" noncompliance problems (e.g., everyone within a department failing to measure blood pressure before induction). However, any such global standard (and compliance with it) would still need to be established via some kind of covariate adjustment process, similar (we believe) to that described in our study.
Conclusions

Because a Bayesian hierarchical multivariate methodology takes into account patient and practice characteristics, it better represents differences in the case numbers and case mix of the anesthesiologists than a nonhierarchical frequentist approach. Therefore, Bayesian hierarchical methods may be preferable for mandated monitoring of the performance of anesthesiologists, instead of methods assessing the raw incidence of compliance.

Acknowledgments

The authors thank David Griffiths, B.S., and Gregory Hopson, B.A., M.I.S., both from the Department of Anesthesia, University of Iowa, Iowa City, Iowa, for their assistance with extracting the data from the Epic anesthesia information management system for this study.

Competing Interests

The authors declare no competing interests.

Correspondence

Address correspondence to Dr. Bayman: University of Iowa Hospitals and Clinics, 6439 JCP, 200 Hawkins Drive, Iowa City, Iowa 52242. emine-bayman@uiowa.edu. Information on purchasing reprints may be found at www.anesthesiology.org or on the masthead page at the beginning of this issue. Anesthesiology's articles are made freely accessible to all readers, for personal use only, 6 months from the cover date of the issue.

Appendix: Details of the Informatics and the Bayesian Model

Anesthesia Medical System Time Stamps

Because there is no rounding on the time stamps of the data,