Abstract
Bone union is the most commonly reported primary outcome in fracture treatment trials, yet no universally accepted radiographic definition exists. The widely taught criterion of “bridging callus on 3 of 4 cortices on anteroposterior and lateral radiographs” has no clearly identifiable primary source in the indexed literature. This narrative review traces the historical origins of radiographic bone union assessment, documents the heterogeneity of definitions used in clinical studies, and provides a comparative analysis of the standardized scoring systems developed to address this problem. A systematic PubMed search using six prespecified strategies, from database inception to March 2026, supplemented by hand-searching and citation tracking, identified 2,380 records. After screening, 359 articles on long-bone fractures were included. The “3 of 4 cortices” criterion appears most plausibly to derive from Panjabi’s 1985 finding that cortical continuity was the strongest radiographic predictor of fracture strength (r=0.80), but no traceable validation study was identified despite citation tracking through successive Cochrane reviews (CD008579, pub2‒pub4). In their 2008 study, Corrales and colleagues documented 11 different radiographic criteria across 123 studies, finding that ‘3 cortices’ was used in only 27%. Five standardized scoring systems (Radiographic Union Score for Tibial fractures [RUST], modified RUST [mRUST], Radiographic Union Score for Hip [RUSH], Radiographic Union Score for Humeral fractures [RUSHU], and Radiographic Humerus Union Measurement [RHUM]) have improved interobserver reliability within specific anatomical settings but remain fragmented by site and limited to secondary bone healing. A 2024 analysis by Bax and his team further illustrated that this inconsistency is not limited to fractures, documenting 13 different criteria and nine classification systems within the osteotomy literature. 
The most widely used radiographic union criterion likely emerged through clinical teaching rather than formal validation. A minimum reporting framework is proposed to improve standardization in future studies. Consensus definitions, cross-site validation, and more objective assessment strategies are needed to resolve this four-decade-old problem.
Keywords: Bone union, Fracture healing, Radiographic assessment, Radiography, RUST
Introduction
Every orthopedic surgeon knows when a fracture has healed. Or at least, every orthopedic surgeon believes they know. The determination that a fracture has achieved union—that the bone has regained sufficient structural integrity to bear physiological loads—is one of the most consequential clinical decisions in musculoskeletal medicine. It dictates when a patient may return to weight-bearing, when hardware may be removed, and when a fracture should be deemed a nonunion requiring further intervention. It is also the single most common primary or secondary outcome measure in fracture treatment trials.
Yet this foundational assessment rests on remarkably unstable ground. There is no universally accepted radiographic definition of bone union. The criterion most commonly taught—“bridging callus on 3 of 4 cortices visible on anteroposterior and lateral radiographs”—has no clearly identifiable primary source in the indexed literature. It appears in textbooks, Cochrane reviews, and institutional protocols worldwide, yet systematic citation tracking failed to locate a definitive original study. It appears to be an educational convention that emerged through clinical teaching rather than formal validation.
The consequences of this definitional vacuum are not merely academic. When Corrales et al. [1] systematically reviewed 123 clinical studies in 2008, they found 11 different radiographic criteria for union, with “3 of 4 cortices” used in only 27%. A decade and a half later, Bax et al. [2], reviewing osteotomy studies, identified 13 criteria and nine classification systems, with nearly half of reviewed studies failing to define union at all, confirming that the definitional problem extends beyond fracture healing. The problem has not been solved; it has proliferated.
In response, the period from 2010 to 2020 saw the development of five structured scoring systems—Radiographic Union Score for Tibial fractures (RUST), modified RUST (mRUST), Radiographic Union Score for Hip (RUSH), Radiographic Union Score for Humeral fractures (RUSHU), and Radiographic Humerus Union Measurement (RHUM)—each designed to standardize radiographic union assessment for a specific anatomic site. These tools represent genuine progress, yet their proliferation by anatomic site, combined with the persistent failure of clinical trials to adopt them, means that the fundamental problem remains.
The purpose of this narrative review is threefold: to trace the historical origins of radiographic bone union assessment and identify how the “3 of 4 cortices” teaching convention emerged without a published source; to document the evolution from ad hoc definitions to structured scoring systems; and to provide a comparative analysis of the RUST, mRUST, RUSH, RUSHU, and RHUM systems, evaluating their strengths, limitations, and the gaps that remain.
Methods
This narrative review was informed by a systematic literature search strategy. Six prespecified search strings (S1–S6) were executed in PubMed, covering the period from database inception through March 2026, targeting definitions and criteria for fracture union (S1), cortical bridging terminology (S2), radiographic scoring systems (S3), observer variability and reliability (S4), nonunion definitions (S5), and historical perspectives (S6). Full search strings are provided in Supplementary S1. Three additional seminal publications predating electronic indexing [3-5] were identified through hand searching of reference lists, and six further anchor papers not captured by the database searches were identified through backward and forward citation tracking, yielding a total of nine additional records from non-database sources.
After automated deduplication, 2,380 records were screened by title (Fig. 1). The scope was limited to long bone fractures of the appendicular skeleton (femur, tibia, humerus, radius, and ulna), including methodology and definition papers applicable across sites. Prespecified exclusion criteria (detailed in Supplementary S1) removed spine, pediatric, pelvic, tumor, drug intervention, biomechanical modeling, arthrodesis, periprosthetic, and small bone studies, among others. After screening, 359 articles were retained. Article selection for the narrative synthesis was guided by 12 anchor papers (Table 1) identified a priori based on their seminal contribution to the development, validation, or critique of radiographic union assessment methods; of these, nine were identified through hand searching and citation tracking (as above), and three were captured by the database searches [1-12]. These were supplemented by 18 Tier 1 articles that scored ≥12 points on a predefined relevance rubric.
This is a narrative review, not a systematic review. No formal data extraction or risk-of-bias assessment was performed. The heterogeneity of definitions and reporting styles across the included studies, combined with the conceptual and historical nature of the research question, supported a narrative rather than quantitative synthesis. The systematic search strategy was employed to ensure comprehensive identification of relevant literature and transparency regarding article selection. In addition, backward and forward citation tracking was performed to trace the origin of commonly used definitions, including tracking through successive editions of the Cochrane Collaboration’s systematic review (CD008579, pub2 through pub4) [13-15]. The search strategy, screening flow diagram, and relevance scoring rubric are provided in Supplementary S1. A PRISMA-style flow diagram was used solely to enhance transparency of the search and screening process, not to imply a systematic review methodology; no duplicate records were identified across the six search strategies after automated deduplication. A limitation of this citation tracking is that non-indexed sources (including early Arbeitsgemeinschaft für Osteosynthesefragen/Association for the Study of Internal Fixation [AO/ASIF] manuals, preelectronic orthopedic textbooks, and institutional teaching handbooks) were not systematically reviewed, and therefore the possibility that the “3 of 4 cortices” criterion originated in such sources cannot be entirely excluded.
The origins of radiographic bone union assessment
The biomechanical foundation: White, Panjabi, and the four-stage model
The modern concept of radiographic union assessment can be traced to rabbit tibial osteotomy experiments conducted at Yale in the late 1970s. In 1977, White et al. [5] published a biomechanical model dividing fracture healing into four sequential stages, each defined by the relationship between radiographic appearance and mechanical stiffness. Stage III was characterized by partial cortical bridging accompanied by a marked increase in torsional rigidity. By stage IV, complete cortical remodeling had restored near-normal mechanical properties.
For the first time, this model linked specific radiographic features—particularly the degree of cortical bridging—to quantifiable mechanical endpoints. The observation that stage III, defined by partial cortical bridging, already conferred substantial mechanical strength planted the conceptual seed for what would eventually become the “3 of 4 cortices” clinical criterion. Although the 1977 study never specified a numerical threshold for cortex count, it established the principle that cortical bridging, rather than callus volume or fracture line obliteration, was the radiographic feature most closely associated with structural integrity.
Cortical continuity as the best single predictor: Panjabi 1985
Eight years later, Panjabi et al. [4] revisited this question with a more rigorous quantitative approach. They performed standardized tibial osteotomies in rabbits, allowed healing for 3 to 8 weeks, obtained orthogonal radiographs, and subjected each specimen to dynamic torsion testing. Radiographs were analyzed for five parameters: cortical continuity, callus thickness, callus diameter, fracture displacement, and callus area.
Cortical continuity emerged as the single best radiographic predictor of fracture strength (r=0.80). At the opposite end, callus area showed a correlation of only r=0.17—the weakest predictor. This finding contained a clinical paradox that remains relevant today: abundant callus formation, which is visually impressive and often intuitively reassuring, is in fact a poor indicator of mechanical strength. Conversely, the subtle finding of cortical continuity across the fracture site provides the most reliable evidence of structural healing.
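One way to read the gap between these two correlations is through the squared coefficient. This is only a rough sketch: it assumes both values are Pearson coefficients from simple linear relationships, which the figures quoted here do not confirm. Under that assumption, the squared value approximates the share of variance in fracture strength associated with each radiographic parameter:

```latex
r_{\text{continuity}}^{2} = 0.80^{2} = 0.64,
\qquad
r_{\text{callus area}}^{2} = 0.17^{2} \approx 0.03
```

On this reading, cortical continuity would account for roughly two thirds of the variance in strength, and callus area for about 3%.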
If one accepts that two orthogonal radiographic views each display two cortices, and that cortical continuity is the dominant predictor of strength, then requiring bridging on “3 of 4 cortices” represents a clinical translation of Panjabi’s finding. However, these experiments were performed on rabbit tibiae with standardized transverse osteotomies, and neither White nor Panjabi ever stated a specific cortex count threshold. The extrapolation from r=0.80 to “3 out of 4 cortices” was a clinical simplification that occurred without formal validation.
The limits of radiographic assessment: Hammer 1985
In the same year, Hammer et al. [3] published a sobering counterpoint. They evaluated 127 tibial fractures treated conservatively, with seven experienced surgeons independently assessing the radiographs for evidence of union. The overall accuracy was approximately 50%, essentially no better than a coin toss. Even more concerning, 55% of fractures that were mechanically unstable on subsequent clinical testing had been judged as “united.” Interobserver agreement was poor.
Hammer’s findings exposed a fundamental contradiction: the specialty had adopted plain radiography as its primary tool for determining when a fracture had healed, yet the reliability of this determination was no better than chance. This should have triggered immediate efforts toward standardization. Instead, it took another 25 years before the first validated scoring system was published.
The era of ad hoc definitions (1990s–2000s)
Corrales 2008: quantifying the problem
The extent of definitional inconsistency was first systematically documented by Corrales et al. [1] in 2008. In a landmark systematic review of 123 clinical studies published in three major orthopedic journals between 1996 and 2006, they found 11 distinct radiographic criteria. The most commonly cited criterion was bridging by callus, bone, or trabeculae (53%), followed by bridging at three cortices (27%) and fracture line obliteration or cortical continuity (18%).
The quality reporting was equally troubling. Seventy-four percent of studies did not identify who had assessed the radiographs. Among the 26% that did, the assessors were orthopedic surgeons (19%), radiologists (2%), or both (4%). In three of the five studies that included both surgeons and radiologists, the two groups disagreed—with radiologists consistently reporting later times to union and lower union rates. Only two of the 123 studies reported any quantitative measure of interobserver reliability for their union assessment. This meant that the vast majority of fracture treatment conclusions were based on an outcome that had neither a standard definition nor a demonstrated reliability.
The “3 of 4 cortices” origin: a citation-free teaching convention
The Corrales data revealed an uncomfortable truth: the criterion most orthopedic surgeons consider “standard” was used in only 27% of studies and was not even the most common criterion. Critically, no study cited an original source for this definition [1].
This observation prompted systematic citation tracking through the Cochrane Collaboration’s review of interventions for promoting fracture healing (CD008579). This Cochrane review, published in multiple versions from 2012 through 2023 [13-15], explicitly adopted the “3 of 4 cortices” criterion, stating: “For the purpose of this review we adopted the widely accepted definitions in the literature.” However, no specific primary reference was cited for this definition. The phrase “widely accepted definitions in the literature” appeared identically in all versions from pub2 (2012) to pub4 (2023), and the reference list, while including the 2010 study by Kooistra et al. [6] for the RUST scoring system, did not contain a primary source establishing the “3 of 4 cortices” rule itself. Even the Cochrane Collaboration, regarded as the gold standard of evidence synthesis, adopted this criterion without being able to cite its origin.
This finding constitutes a central argument of the present review. The most widely recognized criterion for radiographic bone union, adopted by Cochrane and taught worldwide, has no clearly identifiable primary source in the indexed literature. It appears to be an educational convention: widely practiced, though no dedicated validation study has been identified. The most plausible genealogy is that clinicians translated Panjabi’s emphasis on cortical continuity (r=0.80) into a practical threshold based on anteroposterior (AP) and lateral radiographs. This translation occurred through clinical teaching (residency programs, conferences, and institutional protocols) rather than through a published validation study. By the time Corrales et al. [1] surveyed the literature in 2008, the “3 of 4 cortices” criterion was already established practice, yet none of those studies cited its origin because no definitive indexed primary source appears to exist.
The clinical reliability problem
Whelan et al. [7] had quantified the reliability problem in 2002. In a study of 30 tibial shaft fractures, four trauma surgeons assessed radiographs using multiple methods. The number of cortices bridged by callus achieved the highest agreement (κ=0.75), followed by visible fracture line (κ=0.70) and the surgeon’s general impression (κ=0.67). Even the best method produced only “substantial” agreement: because κ is corrected for chance, a value of 0.75 means that about one quarter of the attainable agreement beyond chance was not achieved, even among experienced trauma surgeons. This established the clinical rationale for formal scoring systems.
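To make the chance correction behind κ concrete, the following is a minimal sketch of Cohen's kappa for two raters. It is illustrative only: the ratings are invented, the function name `cohen_kappa` is hypothetical, and a study with four surgeons, such as the one described above, would need a multi-rater statistic rather than this two-rater form.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented data: two surgeons rating the same eight radiographs.
surgeon_1 = ["united", "united", "united", "united", "not", "not", "not", "not"]
surgeon_2 = ["united", "united", "united", "not", "not", "not", "not", "not"]
kappa = cohen_kappa(surgeon_1, surgeon_2)
```

In this toy example the two raters agree on 7 of 8 radiographs (87.5%), chance agreement is 50%, and κ is 0.75, which shows why a κ value should not be read directly as a raw agreement rate.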
The rise of radiographic scoring systems
RUST and mRUST
The RUST, introduced by Whelan et al. [8] in 2010, emerged directly from the group’s 2002 reliability findings [7]. Each of the four cortices visible on AP and lateral radiographs receives a score of 1 to 3 (1, fracture line, no callus; 2, visible fracture line with callus; 3, bridging callus, no fracture line), yielding a total of 4 (definitely not healed) to 12 (definitely healed). The system was designed specifically for tibial shaft fractures treated with intramedullary nailing, where the nail obscures medullary detail and only cortical assessment is possible.
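The scoring rule just described can be sketched in a few lines. This is an illustration under stated assumptions, not a clinical tool: the cortex labels (medial and lateral on the AP view, anterior and posterior on the lateral view) and the function name `rust_total` are choices made for this example.

```python
# Cortex labels are an assumption for illustration: the AP view is taken to show
# the medial and lateral cortices, the lateral view the anterior and posterior.
RUST_CORTICES = ("medial", "lateral", "anterior", "posterior")

# Per-cortex categories as described in the text.
RUST_CATEGORIES = {
    1: "fracture line, no callus",
    2: "visible fracture line with callus",
    3: "bridging callus, no fracture line",
}


def rust_total(cortex_scores):
    """Return the RUST total (4-12) from a mapping of cortex -> score (1-3)."""
    if set(cortex_scores) != set(RUST_CORTICES):
        raise ValueError(f"expected scores for exactly {RUST_CORTICES}")
    for cortex, score in cortex_scores.items():
        if score not in RUST_CATEGORIES:
            raise ValueError(f"{cortex}: score must be 1, 2, or 3, got {score!r}")
    return sum(cortex_scores.values())


# Hypothetical radiograph: two cortices bridged, two with callus and a visible line.
example = {"medial": 3, "lateral": 2, "anterior": 3, "posterior": 2}
example_total = rust_total(example)  # 3 + 2 + 3 + 2 = 10
```

Scoring each cortex separately and only then summing is what distinguishes this approach from a single overall impression of the radiograph.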
In the development study, seven orthopedic reviewers (three traumatologists, two community surgeons, and two residents) independently scored 45 sets of radiographs representing various stages of healing. Overall interobserver agreement was substantial (intraclass correlation coefficient [ICC], 0.86; 95% confidence interval [CI], 0.79–0.91), with traumatologists achieving the highest reliability (ICC, 0.86), followed by community surgeons (0.83) and residents (0.81). Intraobserver reliability was also substantial (ICC, 0.88). These values compared favorably with all previously reported methods. However, the original study was limited by a small radiograph set (n=45) and single-institution reviewers, and no union threshold was established. External validation by Kooistra et al. [6] confirmed reliability at an independent center (ICC, 0.84; 95% CI, 0.80–0.87).
The modified RUST (mRUST) expanded the scale to 4 points per cortex (adding a “remodeled” category), yielding a total of 4 to 16 [9]. Among 12 experienced reviewers scoring distal femur fractures, mRUST showed higher reliability than RUST overall (ICC, 0.68 vs. 0.63), with notably better performance for nail fixation (ICC, 0.74) than plate fixation (ICC, 0.59). Prospective data from two multicenter randomized controlled trials (RCTs) provided the first data-driven union thresholds: more than 90% of reviewers considered a RUST of ≥10 and an mRUST of ≥13 to indicate union. Nails demonstrated significantly higher scores at union than plates (RUST 9.0 vs. 8.2; mRUST 12.3 vs. 10.8; P<0.01), reflecting the different biology of secondary versus primary bone healing.
RUSH
The RUSH addressed hip fractures involving both cortical and cancellous bone [10]. It incorporated cortical bridging (4–12), cortical fracture line disappearance (4–12), trabecular consolidation (1–3), and trabecular fracture line disappearance (1–3), for a total of 10 to 30. Six reviewers assessing 200 hip fracture cases achieved an ICC of 0.85 for femoral neck fractures and 0.88 for intertrochanteric fractures, a dramatic improvement over the surgeon’s general impression (ICC, 0.60 and 0.50, respectively).
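The four RUSH components and their ranges can be checked with a small sketch. The class and field names below are hypothetical, and the assumption that each cortical component is scored 1 to 3 on four cortices is inferred from the 4–12 ranges given above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RushAssessment:
    """Hypothetical container for the four RUSH components described in the text."""
    cortical_bridging: Tuple[int, int, int, int]            # 4 cortices, 1-3 each
    cortical_line_disappearance: Tuple[int, int, int, int]  # 4 cortices, 1-3 each
    trabecular_consolidation: int                           # 1-3
    trabecular_line_disappearance: int                      # 1-3

    def total(self) -> int:
        """Sum all ten item scores into the composite (10-30)."""
        if len(self.cortical_bridging) != 4 or len(self.cortical_line_disappearance) != 4:
            raise ValueError("each cortical component needs exactly four scores")
        items = (
            *self.cortical_bridging,
            *self.cortical_line_disappearance,
            self.trabecular_consolidation,
            self.trabecular_line_disappearance,
        )
        if any(score not in (1, 2, 3) for score in items):
            raise ValueError("every item score must be 1, 2, or 3")
        return sum(items)


# Invented example: partly bridged cortices, early trabecular consolidation.
partly_healed = RushAssessment((3, 2, 2, 3), (2, 2, 2, 3), 2, 2)
partly_healed_total = partly_healed.total()  # 10 + 9 + 2 + 2 = 23
```

The bounds follow directly from the item count: ten items scored 1 to 3 give a minimum of 10 and a maximum of 30.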
RUSHU and RHUM
The RUSHU extended structured assessment to humeral shaft fractures [16]. In 60 patients (40 union, 20 nonunion), interobserver ICC was 0.79. A RUSHU cutoff of <8 at 6 weeks was predictive of eventual nonunion (area under the curve [AUC], 0.84; odds ratio [OR], 12.0; 95% CI, 3.4–42.9), making it unique in its explicit focus on prediction rather than description. Subsequent prospective validation confirmed the reliability and clinical utility of the RUSHU in an independent cohort [17]. The RHUM was developed independently for nonoperatively managed humeral fractures [18], identifying cutoffs of ≤6 for nonunion risk and ≥9 for union, but was limited by a small sample (n=36) and did not report ICC values.
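The two humeral cutoffs behave differently: the RUSHU uses a single threshold, whereas the RHUM cutoffs leave a gap between 6 and 9. A minimal sketch follows, with hypothetical function names, score ranges taken from Table 2, and an "indeterminate" label that is an assumption of this example rather than a category defined by the RHUM authors.

```python
def rushu_flags_nonunion_risk(score_at_6_weeks):
    """True if a 6-week RUSHU score (4-12) falls below the reported cutoff of 8."""
    if not 4 <= score_at_6_weeks <= 12:
        raise ValueError("RUSHU total must be between 4 and 12")
    return score_at_6_weeks < 8


def rhum_category(score):
    """Map a RHUM score (0-12) onto the reported cutoffs."""
    if not 0 <= score <= 12:
        raise ValueError("RHUM total must be between 0 and 12")
    if score <= 6:
        return "nonunion risk"
    if score >= 9:
        return "union"
    # Scores of 7 and 8 fall between the two cutoffs; this label is an assumption.
    return "indeterminate"


flagged = rushu_flags_nonunion_risk(7)   # below 8, so flagged
borderline = rhum_category(8)            # between the cutoffs
```

Making the middle band explicit avoids silently forcing borderline scores into one of the two reported categories.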
Comparative analysis
The proliferation of scoring systems represents genuine progress (Table 2). However, each was developed for a specific anatomic site, and none has been validated across sites. Union thresholds were established by different methodologies: reviewer agreement (RUST ≥10 and mRUST ≥13, each endorsed by more than 90% of reviewers) and receiver operating characteristic (ROC) analysis (RUSHU <8). The methodological limitations were systematically exposed by Ten Berg et al. [11]: 68% of recent orthopedic RCTs failed to report the specialty of the observer assessing union, and the median number of observers per trial was just one. The most recent confirmation came from Bax et al. [2], who, reviewing osteotomy studies, identified 13 criteria and nine classification systems, with 49.1% of studies failing to define union despite reporting it as an outcome, demonstrating that the problem persists across both fracture and osteotomy contexts.
Despite these differences, all five systems share a common conceptual architecture derived from Panjabi’s finding that cortical continuity best predicts mechanical strength. This shared design logic can be distilled into four principles: (1) independent evaluation of discrete cortical or trabecular zones, rather than a single gestalt impression; (2) ordinal staging of each zone along a healing continuum (from no callus through bridging to remodeling); (3) summation into a composite score that allows threshold-based binary classification (union vs. nonunion); and (4) anatomic adaptation of the zone map to the specific fracture site. The five systems are therefore not five independent inventions but five site-specific implementations of a single underlying paradigm. Recognizing this shared architecture clarifies the nature of the fragmentation problem: what the field lacks is not a new conceptual model, but a unified scoring framework that can be parametrically adjusted across anatomic sites while maintaining cross-study comparability. The lack of cross-site validation and the divergence in scoring structures (3-point vs. 4-point scales, cortical-only vs. cortical-plus-trabecular components) have prevented this unification. A researcher using RUST ≥10 for tibial union and another using mRUST ≥13 for femoral union may both report “90% union at 6 months,” yet the criteria are not interchangeable, and neither can be applied to a humeral fracture without adopting yet another system.
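The shared architecture described above can be expressed as a single parametrized structure. The sketch below is hypothetical: `ZoneScoringSystem`, its field names, and the cortex labels are illustrative choices, and only the per-zone ranges and thresholds for RUST and mRUST come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ZoneScoringSystem:
    """One site-specific instance of the zone / ordinal / sum / threshold pattern."""
    name: str
    zones: Tuple[str, ...]   # principle 4: zone map adapted to the anatomic site
    min_zone_score: int      # principle 2: ordinal staging of each zone
    max_zone_score: int
    union_threshold: int     # principle 3: threshold applied to the composite

    def total(self, zone_scores: Dict[str, int]) -> int:
        # Principle 1: every zone is scored independently before summation.
        if set(zone_scores) != set(self.zones):
            raise ValueError(f"{self.name}: expected scores for {self.zones}")
        for zone, score in zone_scores.items():
            if not self.min_zone_score <= score <= self.max_zone_score:
                raise ValueError(f"{self.name}: {zone} score {score} out of range")
        return sum(zone_scores.values())

    def is_union(self, zone_scores: Dict[str, int]) -> bool:
        return self.total(zone_scores) >= self.union_threshold


# Cortex labels are an assumption; ranges and thresholds are those reported above.
CORTICES = ("medial", "lateral", "anterior", "posterior")
RUST = ZoneScoringSystem("RUST", CORTICES, 1, 3, 10)
MRUST = ZoneScoringSystem("mRUST", CORTICES, 1, 4, 13)

same_scores = dict.fromkeys(CORTICES, 3)      # composite of 12 under either system
rust_says_union = RUST.is_union(same_scores)
mrust_says_union = MRUST.is_union(same_scores)
```

The same numeric composite of 12 clears the RUST threshold but not the mRUST threshold, which illustrates why union rates reported under the two systems are not directly comparable.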
Practical considerations
For diaphyseal fractures treated with intramedullary nailing, the RUST or mRUST should be the primary assessment tool, with an mRUST ≥13 or RUST ≥10 indicating union. DiSilvio et al. [12] demonstrated that any cortical bridging at 4 months postoperatively was the most reliable early predictor of eventual union (κ=0.91), suggesting that the simplest criterion may be the most useful for early decision-making. For hip fractures, the RUSH should be used given the dual cortical-trabecular healing pathway. For humeral shaft fractures, the RUSHU provides the only validated tool with demonstrated predictive capacity (RUSHU <8 at 6 weeks: 12-fold higher odds of nonunion).
For fractures treated with compression plating, all callus-based scoring systems must be interpreted with extreme caution. Consider a simple transverse forearm fracture treated with compression plating that shows progressive fracture line obliteration at 6 months but no periosteal callus: a RUST-based assessment would yield a score of 4—formally indicating “definitely not healed”—while the fracture may have achieved solid primary union. This limitation is fundamental, not technical: current scoring systems were designed for secondary bone healing and systematically fail when applied to primary bone healing, where the absence of callus reflects the biomechanical environment rather than healing failure.
Beyond plain radiographs
The fundamental limitation of plain radiography, the compression of three-dimensional anatomy into two-dimensional images, has long been recognized. Computed tomography (CT) provides cross-sectional assessment of cortical bridging around the entire circumference of the bone, and several studies have shown that CT detects bridging earlier and more accurately than plain radiographs [19]. However, its adoption as a routine union assessment tool has been limited by cost, radiation dose, metal artifact from implants, and, most critically, the absence of any standardized CT-based union scoring system. The RUST framework has not been formally adapted for CT, and no equivalent tool exists.
Ultrasound offers a non-ionizing, portable alternative capable of detecting callus earlier than radiographs [20], but remains operator-dependent and lacks standardized criteria comparable to the RUST framework. Other modalities including dual-energy X-ray absorptiometry [21], resonant frequency analysis [22], and bioimpedance monitoring [19] have shown promise in experimental settings but remain far from clinical implementation.
Perhaps most fundamentally, the conceptual framework underlying all cortex-based scoring systems—that union is defined by the presence of external callus—renders them blind to primary bone healing. When a fracture is treated with absolute stability, healing occurs through direct cortical reconstruction (Haversian remodeling) without visible callus. No modality-specific scoring system currently addresses this healing pattern, which represents not merely a technical gap but a paradigm limitation that must be addressed before any universal standard can be established.
Unresolved issues and future directions
An overarching issue underlying the definitional problem is that radiographic union is fundamentally a surrogate for clinical union—the restoration of pain-free, load-bearing function. The correlation between these two constructs has rarely been formally quantified, and clinical trials routinely treat radiographic endpoints as interchangeable with functional recovery despite incomplete evidence for this assumption. Any future consensus framework should therefore explicitly acknowledge this surrogate relationship and specify whether the intended endpoint is structural (radiographic evidence of cortical bridging), functional (return to weight-bearing or activity), or both.
Three specific areas require attention. First, a formal Delphi consensus process has never been attempted for the radiographic definition of bone union. Such a process, bringing together orthopedic traumatologists, radiologists, and clinical trialists, could establish minimum reporting standards, agree on a core set of radiographic parameters applicable across sites, and define thresholds for union and nonunion. The minimum reporting items proposed in Table 3 could serve as a starting point for such a consensus process. The Corrales-to-Bax trajectory (2008–2024) shows that the problem will not resolve spontaneously; it requires deliberate coordination.
Second, machine learning algorithms have already demonstrated high accuracy in fracture detection on plain radiographs. An artificial intelligence (AI) system trained on large datasets of serial radiographs with known clinical outcomes could potentially provide objective, reproducible union assessment, eliminating the interobserver variability that has plagued the field since the 1985 study by Hammer et al. [3]. However, AI-based assessment faces a circular problem: training an algorithm requires a ground truth label, and as this review demonstrates, there is no consensus on what that label should be. Training on RUST scores merely perpetuates the limitations of the score itself. The most promising approach may be to train AI on clinical outcomes (e.g., ability to bear weight without pain, absence of subsequent failure) rather than intermediate radiographic labels, thereby bypassing the definitional problem entirely.
Third, the problem extends beyond long bones. Pelvic ring fractures, periarticular fractures, and complex patterns such as segmental fractures or fractures with bone loss present assessment challenges that existing tools were not designed to handle. The next generation of union assessment tools will need to address this anatomic diversity.
Conclusions
Bone union is the most commonly reported outcome in fracture treatment research, yet the “3 of 4 cortices bridging” criterion, taught worldwide and adopted by the Cochrane Collaboration, has no clearly identifiable primary source in the indexed literature. It appears to be an educational convention born from the extrapolation of Panjabi’s 1985 laboratory finding, transmitted through clinical teaching, and codified into practice without formal validation. The development of structured scoring systems (RUST, mRUST, RUSH, RUSHU, RHUM) between 2010 and 2020 improved interobserver reliability substantially, yet this progress remains fragmented by anatomic site, limited to secondary bone healing, and insufficiently adopted in clinical trials. To address these persistent gaps, we propose a set of minimum reporting items (Table 3) as a practical first step toward standardization. Until the orthopedic community agrees on what “healed” looks like on a radiograph, we cannot truly compare the outcomes of fracture treatment. This educational convention deserves formal validation, and the field urgently needs consensus definitions applicable across anatomic sites and healing patterns.
Article Information
Author contributions
Conceptualization: JHK, SS. Methodology: JHK, SS. Investigation: JHK. Resources: JHK. Data curation: JHK. Supervision: SS. Project administration: SS. Visualization: JHK. Writing–original draft: JHK. Writing–review & editing: JHK, SS. All authors read and approved the final manuscript.
Conflicts of interest
No potential conflict of interest relevant to this article was reported.
Funding
None.
Acknowledgments
An AI-based tool (Claude, Anthropic; models used during manuscript preparation: Claude 3.5 Sonnet and Claude Opus 4) was used for language editing (grammar, clarity, and style). All content was reviewed and approved by the authors, who take full responsibility for the manuscript.
Data availability
Not applicable.
Supplementary materials
Fig. 1. Modified PRISMA flow diagram for the narrative review. Screening was performed by the principal investigator using title-based review. Article prioritization was guided by 12 anchor papers, including nine identified through hand-searching and citation tracking and three captured through database searches, together with a predefined relevance-scoring rubric (Supplementary S1). PRISMA, Preferred Reporting Items for Systematic reviews and Meta-Analyses.
Table 1. Anchor papers: foundational references for radiographic bone union assessment
| No. | Study | PMID | Key contribution |
|---|---|---|---|
| 1 | White et al. (1977) [5] | 845202 | Four-stage biomechanical healing model |
| 2 | Panjabi et al. (1985) [4] | 3998898 | Cortical continuity r=0.80, best single predictor |
| 3 | Hammer et al. (1985) [3] | 4042484 | Radiographic accuracy ~50%, unreliability demonstrated |
| 4 | Corrales et al. (2008) [1] | 18762645 | 11 radiographic criteria, “3 cortices” only 27% |
| 5 | Whelan et al. (2002) [7] | 11837825 | κ=0.75 for cortical bridging, RUST motivation |
| 6 | Whelan et al. (2010) [8] | 19996801 | RUST development (ICC, 0.86) |
| 7 | Kooistra et al. (2010) [6] | 20182243 | RUST external validation (ICC, 0.84) |
| 8 | Litrenta et al. (2015) [9] | 26165265 | mRUST development, first union thresholds |
| 9 | Bhandari et al. (2013) [10] | 23442540 | RUSH development (ICC, 0.85–0.88) |
| 10 | DiSilvio et al. (2018) [12] | 30882051 | Any cortical bridging at 4 months predicts union |
| 11 | Ten Berg et al. (2020) [11] | 31425411 | 68% of RCTs fail to report observer details |
| 12 | Bax et al. (2024) [2] | 39534655 | 13 criteria, 9 classifications, still no consensus |
Table 2. Comparison of radiographic scoring systems for bone union

| System | Target site | PMID | Scale/cortex | Total range | ICC (inter) | Union threshold a) |
|---|---|---|---|---|---|---|
| RUST | Tibia (diaphyseal) | 19996801 | 1–3 | 4–12 | 0.86 | ≥10 (90% agreement) |
| mRUST | Tibia/femur (metadiaphyseal) | 26165265 | 1–4 | 4–16 | 0.68 (overall); 0.74 (nail) | ≥13 (90% agreement) |
| RUSH | Hip (FN, IT) | 23442540 | Cortical 1–3×4 + trabecular 1–3×2 | 10–30 | 0.85 (FN); 0.88 (IT) | ≥18 (suggested) |
| RUSHU | Humerus (shaft) | 31564159 | 1–3 | 4–12 | 0.79 | <8 nonunion risk (AUC 0.84) |
| RHUM | Humerus (nonop) | 32034464 | 0–3 | 0–12 | NR | ≤6 nonunion/≥9 union |
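The four cortex-based systems in Table 2 (RUST, mRUST, RUSHU, RHUM) share the same arithmetic: each of four cortices receives a score within the system's per-cortex range, and the four scores are summed. The Python sketch below illustrates that sum and the cutoffs listed in Table 2. It is an illustration only, not an implementation from any of the cited papers: the names `CortexSystem`, `SYSTEMS`, `total_score`, and `classify`, and the labels `"union"`, `"nonunion_cutoff"`, and `"indeterminate"`, are assumptions introduced here. RUSH is left out because it combines cortical and trabecular components.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CortexSystem:
    """Per-cortex range and cutoffs as listed in Table 2 (illustrative container)."""
    low: int                     # minimum score per cortex
    high: int                    # maximum score per cortex
    union_min: Optional[int]     # total at or above which Table 2 lists union
    nonunion_max: Optional[int]  # total at or below which Table 2 lists nonunion (risk)


# Values transcribed from Table 2; the dictionary itself is a hypothetical helper.
SYSTEMS = {
    "RUST": CortexSystem(1, 3, union_min=10, nonunion_max=None),
    "mRUST": CortexSystem(1, 4, union_min=13, nonunion_max=None),
    "RUSHU": CortexSystem(1, 3, union_min=None, nonunion_max=7),  # "<8" in Table 2
    "RHUM": CortexSystem(0, 3, union_min=9, nonunion_max=6),
}


def total_score(system: str, cortices: Sequence[int]) -> int:
    """Sum four per-cortex scores after checking them against the system's range."""
    spec = SYSTEMS[system]
    if len(cortices) != 4:
        raise ValueError("expected exactly four cortex scores")
    for score in cortices:
        if not spec.low <= score <= spec.high:
            raise ValueError(f"{system} cortex score {score} outside {spec.low}-{spec.high}")
    return sum(cortices)


def classify(system: str, cortices: Sequence[int]) -> str:
    """Compare the total with the Table 2 cutoffs for the chosen system."""
    spec = SYSTEMS[system]
    total = total_score(system, cortices)
    if spec.union_min is not None and total >= spec.union_min:
        return "union"
    if spec.nonunion_max is not None and total <= spec.nonunion_max:
        return "nonunion_cutoff"
    return "indeterminate"


print(classify("RUST", [3, 3, 2, 2]))   # total 10 -> "union"
print(classify("mRUST", [3, 3, 3, 3]))  # total 12 -> "indeterminate"
print(classify("RUSHU", [2, 2, 2, 1]))  # total 7  -> "nonunion_cutoff"
```

Keeping the cutoffs in one lookup table makes it explicit that the same total can mean different things across systems: a total of 10 meets the RUST threshold but falls short of the mRUST threshold.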
Table 3. Proposed minimum reporting items for fracture union assessment in clinical studies

| No. | Reporting item | Description and rationale |
|---|---|---|
| 1 | Explicit definition of union | State the radiographic criterion used to define union (e.g., bridging callus on 3 of 4 cortices, RUST score threshold, fracture line obliteration). If no validated criterion exists for the fracture type studied, this should be acknowledged. |
| 2 | Imaging modality and protocol | Specify the imaging modality (plain radiograph, CT, ultrasound), views obtained (AP, lateral, oblique), and whether standardized positioning was used. |
| 3 | Number, specialty, and experience of observers | Report how many observers assessed union, their specialty (orthopedic surgeon, radiologist, trainee), and their experience level. Corrales et al. [1] in 2008 found 74% of studies omitted this information. |
| 4 | Independence of assessment and consensus method | State whether observers assessed radiographs independently or by consensus, and describe the method for resolving disagreements (e.g., majority rule, adjudication by senior author). |
| 5 | Scoring system or assessment rule used | Identify any validated scoring system (RUST, mRUST, RUSH, RUSHU, RHUM) employed, with citation. If a non-validated criterion was used, provide its explicit definition. Report interobserver reliability (κ or ICC) when more than one observer is involved. |
| 6 | Timing of union assessment | Specify when union was assessed (e.g., fixed time point, serial assessment until endpoint, or clinician-determined). If serial, state the interval and total follow-up duration. |
| 7 | Relationship to clinical outcome | Describe whether and how radiographic union was correlated with clinical endpoints (pain-free weight-bearing, return to activity, absence of hardware failure). Radiographic union is a surrogate; its relationship to functional recovery should be made explicit. |
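Item 5 asks for interobserver reliability (κ or ICC) when more than one observer is involved. As a worked illustration, the Python sketch below computes unweighted Cohen's κ for two observers who each classify the same radiographs as union or nonunion, using κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each observer's marginal frequencies. The function name `cohen_kappa` and the ten ratings are hypothetical and are not data from any cited study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(obs_a: Sequence[Hashable], obs_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    n = len(obs_a)
    if n == 0 or n != len(obs_b):
        raise ValueError("both observers must rate the same non-empty set of cases")

    # Observed agreement: proportion of cases given the same label by both observers.
    p_o = sum(a == b for a, b in zip(obs_a, obs_b)) / n

    # Chance agreement: product of the two observers' marginal proportions, summed over labels.
    count_a, count_b = Counter(obs_a), Counter(obs_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when both observers use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten radiographs (U = union, N = nonunion).
observer_1 = ["U", "U", "U", "U", "U", "U", "N", "N", "N", "N"]
observer_2 = ["U", "U", "U", "U", "U", "N", "U", "N", "N", "N"]

# p_o = 8/10 = 0.80; p_e = (6*6 + 4*4)/100 = 0.52; kappa = 0.28/0.48
print(round(cohen_kappa(observer_1, observer_2), 3))  # 0.583
```

In this example the observers agree on 80% of cases, yet κ is about 0.58 because more than half of that agreement would be expected by chance. This is why reporting raw percent agreement alone is not sufficient. For ordinal scores such as per-cortex RUST grades, a weighted κ or an ICC would usually be more appropriate than the unweighted form shown here.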
References
- 1. Corrales LA, Morshed S, Bhandari M, Miclau T. Variability in the assessment of fracture-healing in orthopaedic trauma studies. J Bone Joint Surg Am 2008;90:1862-8.
- 2. Bax EA, Harlianto NI, Custers RJ, van Egmond N, Foppen W, Kruyt MC. Radiographic assessment of bone union in proximal tibia and distal femur osteotomies: a systematic review. JBJS Open Access 2024;9:e24.00101.
- 3. Hammer RR, Hammerby S, Lindholm B. Accuracy of radiologic assessment of tibial shaft fracture union in humans. Clin Orthop Relat Res 1985;199:233-8.
- 4. Panjabi MM, Walter SD, Karuda M, White AA, Lawson JP. Correlations of radiographic analysis of healing fractures with strength: a statistical analysis of experimental osteotomies. J Orthop Res 1985;3:212-8.
- 5. White AA, Panjabi MM, Southwick WO. The four biomechanical stages of fracture repair. J Bone Joint Surg Am 1977;59:188-92.
- 6. Kooistra BW, Dijkman BG, Busse JW, Sprague S, Schemitsch EH, Bhandari M. The radiographic union scale in tibial fractures: reliability and validity. J Orthop Trauma 2010;24 Suppl 1:S81-6.
- 7. Whelan DB, Bhandari M, McKee MD, et al. Interobserver and intraobserver variation in the assessment of the healing of tibial fractures after intramedullary fixation. J Bone Joint Surg Br 2002;84:15-8.
- 8. Whelan DB, Bhandari M, Stephen D, et al. Development of the radiographic union score for tibial fractures for the assessment of tibial fracture healing after intramedullary fixation. J Trauma 2010;68:629-32.
- 9. Litrenta J, Tornetta P, Mehta S, et al. Determination of radiographic healing: an assessment of consistency using RUST and modified RUST in metadiaphyseal fractures. J Orthop Trauma 2015;29:516-20.
- 10. Bhandari M, Chiavaras MM, Parasu N, et al. Radiographic union score for hip substantially improves agreement between surgeons and radiologists. BMC Musculoskelet Disord 2013;14:70.
- 11. Ten Berg PW, Kraan RB, Jens S, Maas M. Interobserver reliability in imaging-based fracture union assessment: two systematic reviews. J Orthop Trauma 2020;34:e31-7.
- 12. DiSilvio F, Foyil S, Schiffman B, Bernstein M, Summers H, Lack WD. Long bone union accurately predicted by cortical bridging within 4 months. JBJS Open Access 2018;3:e0012.
- 13. Griffin XL, Parsons N, Costa ML, Metcalfe D. Ultrasound and shockwave therapy for acute fractures in adults. Cochrane Database Syst Rev 2014;2014:CD008579.
- 14. Griffin XL, Smith N, Parsons N, Costa ML. Ultrasound and shockwave therapy for acute fractures in adults. Cochrane Database Syst Rev 2012;(2):CD008579.
- 15. Searle HK, Lewis SR, Coyle C, Welch M, Griffin XL. Ultrasound and shockwave therapy for acute fractures in adults. Cochrane Database Syst Rev 2023;3:CD008579.
- 16. Oliver WM, Smith TJ, Nicholson JA, et al. The Radiographic Union Score for HUmeral fractures (RUSHU) predicts humeral shaft nonunion. Bone Joint J 2019;101-B:1300-6.
- 17. Fordyce W, Kennedy G, Allen JR, et al. Validation of the Radiographic Union Score for HUmeral fractures (RUSHU): a retrospective study in an independent centre. Shoulder Elbow 2023;15:390-7.
- 18. Christiano AV, Pean CA, Leucht P, Konda SR, Egol KA. Scoring of radiographic cortical healing with the radiographic humerus union measurement predicts union in humeral shaft fractures. Eur J Orthop Surg Traumatol 2020;30:835-8.
- 19. Atwan Y, Schemitsch EH. Radiographic evaluations: which are most effective to follow fracture healing. Injury 2020;51 Suppl 2:S18-22.
- 20. Moed BR, Watson JT, Goldschmidt P, van Holsbeeck M. Ultrasound for the early diagnosis of fracture healing after interlocking nailing of the tibia without reaming. Clin Orthop Relat Res 1995;310:137-44.
- 21. Eyres KS, Bell MJ, Kanis JA. Methods of assessing new bone formation during limb lengthening. Ultrasonography, dual energy X-ray absorptiometry and radiography compared. J Bone Joint Surg Br 1993;75:358-64.
- 22. Sekiguchi T, Hirayama T. Assessment of fracture healing by vibration. Acta Orthop Scand 1979;50:391-8.