

    Demographic and socioeconomic data were captured from the East Kent Patient Centre. Independent variables included cohabitation with next of kin, and distance from home to hospital (using postcodes). Postcodes were also used to derive the level of deprivation.1 Student’s t tests and correlation coefficients were used to test continuous and discrete data, and linear regression and decision tree analyses to build the models. These were based on a training set (60% random selection of cases) and tested on the remainder (40%, the test set). They were then tested again on the validation set (the dataset from the fifth unit).
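    The 60%/40% random split described above can be sketched as follows. This is a minimal illustration only: the function name, the seed, and the case identifiers are hypothetical, and the source does not state which software or random procedure the authors used.

```python
import random


def split_train_test(cases, train_fraction=0.6, seed=0):
    """Randomly split cases into a training set and a test set."""
    shuffled = list(cases)
    # A fixed seed makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical case identifiers, not the study data.
case_ids = list(range(100))
train, test = split_train_test(case_ids)
print(len(train), len(test))
```

    Every case lands in exactly one of the two sets, so the test set stays unseen while the model is built.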
    Linear regression is a traditional means of modelling a continuous dependent variable. Decision tree analysis is a more modern computer-intensive method that is used in the context of live business decisions. It is a hierarchical multivariate technique with a graphical structure that shows the importance of, and the inter-relations between, pertinent variables. The outputs and structure can be updated to reflect new knowledge.2 The main challenge when building the tree is to decide which attribute to split on to obtain the “best” data split at each step, and the concept of “information gain” informs this decision. Information gain is the difference between the amount of uncertainty before and after a decision is made. The aim, which is to achieve a perfect classification with a minimal number of decisions, is not always possible because of noise or inconsistencies in the data. The variables listed in Table 1 were all potential inputs.
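    As an illustration of information gain, the sketch below measures uncertainty as Shannon entropy and computes the reduction achieved by a candidate split. Entropy is one common choice and is an assumption here, as the source does not name the impurity measure used. The labels and splits are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    after = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - after


# Hypothetical outcomes: 4 short stays and 4 long stays.
labels = ["short"] * 4 + ["long"] * 4

# A split that separates the classes perfectly removes all uncertainty.
perfect = information_gain(labels, [["short"] * 4, ["long"] * 4])

# A split that leaves both groups evenly mixed gains nothing.
useless = information_gain(labels, [["short", "short", "long", "long"]] * 2)

print(perfect, useless)
```

    At each node the tree-building procedure would prefer the attribute whose split gives the larger gain.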
    The length of hospital stay, defined as the date of operation to the date of discharge or death, was the output.
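    The output variable can be derived from two dates, as in this minimal sketch. The function name and the dates are hypothetical.

```python
from datetime import date


def length_of_stay(operation_date, end_date):
    """Days from the date of operation to the date of discharge or death."""
    return (end_date - operation_date).days


# Hypothetical patient: operated on 2 March, discharged on 14 March.
example_stay = length_of_stay(date(2015, 3, 2), date(2015, 3, 14))
print(example_stay)
```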
    The initial modelling of the length of stay was done with a linear model and a regression tree. A stay of more than 50 days was rare, and accounted for less than 5% of the data. This was chosen to define outlier status, a SD of approximately 1 from the mean, and 24 patients who stayed for more than 50 days were removed (Site 1: 5/131; Site 2: 6/180; Site 3:
    Linear regression modelling of the independent factors showed age, intake of alcohol, T classification, performance status, tracheostomy, high-risk operation, and complexity of surgery as independent predictors of an increased duration of stay. However, the model suggested a poor fit, as the residuals were related to duration of stay (heteroskedasticity), and
    Attempts to improve this by transforming the duration of stay with a Poisson function were unhelpful (MSE 158, multiple R² not available).
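    The two fit statistics quoted here, the mean squared error (MSE) and R², can be computed from observed and predicted lengths of stay as sketched below. The numbers are invented for illustration and are not the study data.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical lengths of stay in days.
observed = [3, 5, 7, 9]
predicted = [4, 5, 6, 10]

mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(mse, r2)
```

    A lower MSE and an R² closer to 1 indicate a better fit, which is how the two linear models in this paper are compared.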
    D. Tighe et al. / British Journal of Oral and Maxillofacial Surgery xxx (2019) xxx–xxx
    Table 1
    Variables by hospital.
    Mean (range) age (years)
    721 Performance status:
    728 Tracheostomy:
    722 Scale of operation:
    730 T classification:
    716 N classification:
    Table 1 (Continued)
    Previous radiotherapy:
    Table 2
    Frequency of extended hospital stay (more than 50 days) by hospital.
    Columns: Site; No. of patients; Maximum; No. of patients; No. of patients with a stay of over 50 days; Mean (SD); 95% CI; Median (95% CI)
    Review of Fig. 1 showed that the discrimination of the model degraded after a stay of 10-15 days, so a new approach was tested. We resisted the inclusion of complication data, as these would obscure our testing of the duration of stay as a proxy indicator for the quality of surgical care. We set a cut-off of hospital stay at less than 15 days, and applied decision-tree methods to model a short compared with a long stay. The data were again split into a training set (60%) and a test set (40%), and the following attributes were included: age, T size, performance status, tracheostomy indicator, high-risk indicator, scale of operation, and alcohol. A total of 607 patients had complete datasets for these variables; 350 had stays of less than 15 days, and 257 had stays of 15 days or more. The decision tree correctly assigned a short compared with a long stay in 79% (sensitivity = 0.8, specificity = 0.78, positive predictive value = 0.73, and negative predictive value = 0.84) (Fig. 2). The model performed well on the (external) validation set (sensitivity 0.8, and specificity 0.74).
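    The reported figures can be checked for internal consistency with a short calculation. The sketch below rests on two assumptions that the source does not state: that a long stay is the "positive" class, and that the proportion of long stays in the test set matches the overall proportion (257 of 607). Under those assumptions the predictive values and overall accuracy follow from the sensitivity and specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) for a binary classifier."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    accuracy = true_pos + true_neg
    return ppv, npv, accuracy


# Assumption: long stay is the positive class, with prevalence 257/607.
prevalence = 257 / 607
ppv, npv, accuracy = predictive_values(0.8, 0.78, prevalence)
print(round(ppv, 2), round(npv, 2), round(accuracy, 2))
```

    Rounded to two decimal places, the results agree with the positive predictive value (0.73), negative predictive value (0.84), and 79% correct assignment reported above.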
    Using this model we graphically illustrated the proportion of the hospital’s case mix that would stay less than 15 days when the model predicted they would stay longer, the expected length of stay, and durations that were longer than expected (Fig. 3).
    The linear regression method was used on the subset of patients who were predicted to be in the short-stay group (as defined by the decision tree), split into a training set (60%) and a test set (40%), and the model again identified age, alcohol, T classification, performance status, tracheostomy, high-risk status, and complexity of operation. We then re-tested the linear regression for patients with stays of less than 15 days. The performance of the model improved greatly (MSE 4.99, adjusted R² 0.64) on the test set and was deemed reliable enough for the risk adjustment of data on the external validation dataset. Fig. 4 shows the risk-adjusted length of stay for