
The reliability of a new scoring system for knee osteoarthritis MRI and the validity of bone marrow lesion assessment: BLOKS (Boston–Leeds Osteoarthritis Knee Score)
D J Hunter1, G H Lo2, D Gale3, A J Grainger4, A Guermazi1, P G Conaghan5

1 Boston University Clinical Epidemiology Research and Training Unit, Boston University School of Medicine, Boston, Massachusetts, USA
2 Tufts University, New England Medical Center, Boston, Massachusetts, USA
3 VirtualScopics, Rochester, New York, USA
4 Leeds Teaching Hospitals NHS Trust, Leeds, UK
5 Academic Unit of Musculoskeletal Disease, University of Leeds, Leeds, UK

Correspondence to: Dr D Hunter, A203, Boston University School of Medicine, 715 Albany St., Boston, MA 02118; djhunter{at}bu.edu

Abstract

Aim: MRI provides unparalleled visualisation of all the anatomical structures involved in the osteoarthritis (OA) process. There is a need for reliable methods of quantifying abnormalities of these structures. The aim of this work was to assess the reliability of a novel MRI scoring system for evaluating OA of the knee and explore the validity of the bone marrow lesion (BML) scoring component of this new tool.

Methods: After review of the relevant literature, a collaborative group of rheumatologists and radiologists from centres in the UK and USA established preliminary anatomical divisions, items (necessarily broadly inclusive) and scaling for a novel semi-quantitative knee score. A series of iterative reliability exercises were performed to reduce the initial items, and the reliability of the resultant Boston–Leeds Osteoarthritis Knee Score (BLOKS) was examined. A further sample had both the BLOKS and WORMS (Whole Organ MRI Score) bone marrow lesion (BML) score performed to assess the construct validity (relation to knee pain) and longitudinal validity (prediction of cartilage loss) of each scoring method.

Results: The BLOKS scoring method assesses nine intra-articular regions and contains eight items, including features of bone marrow lesions, cartilage, osteophytes, synovitis, effusions and ligaments. The scaling for each feature ranges from 0–3. The inter-reader reliability (weighted kappa) for the final BLOKS items ranged from 0.51 for meniscal extrusion up to 0.79 for meniscal tear. The reliability for other key features was 0.72 for BML grade, 0.72 for cartilage morphology, and 0.62 for synovitis. Maximal BML size on the BLOKS scale had a positive linear relation with visual analogue scale (VAS) pain; the WORMS scale did not. Baseline BML was associated with cartilage loss on both the BLOKS and WORMS scales. This association was stronger for BLOKS than for WORMS.

Conclusion: We have designed a novel scoring system for MRI OA knee, BLOKS, that demonstrates good reliability. Preliminary inspection of the validity of one of the components of this new tool supports the validity of the BLOKS BML scoring method over an existing instrument. Further iterative development will include validation for use in both clinical trials and epidemiological studies.


Osteoarthritis (OA) is the clinical and pathological outcome of a range of disorders that results in structural and functional failure of synovial joints.1 Whilst traditionally OA has been characterised by articular cartilage loss it is more accurately described as a multifactorial process characterised by changes in structure and function of the whole joint.2 Osteoarthritis of the knee is a major source of pain, disability, and health care utilisation in the elderly.3 Despite its growing prevalence, it is a condition with few effective therapies that modify the course of the disease. Therapeutic development has in part been constrained by the lack of valid and responsive structural endpoints for clinical trials.

The advent of MRI measurement in OA brought with it optimism that this limitation in endpoint measurement would be addressed. MRI can visualise all potentially relevant OA joint structures; it is therefore not surprising that it has already proven to be an important tool in improving our understanding of knee OA, both for studying healthy and diseased states and for assessing risk factors for pain in OA.4–6

However, the utility of knee MRI in the study of OA has been limited for two reasons: current technology does not provide rapid quantitative assessment of multiple tissues, and few semi-quantitative scores have been systematically developed with respect to item content, scaling, reliability and feasibility. Recent analyses of such scores applied to patient datasets have highlighted issues including the following: non-unidimensionality of items (where more than one construct may be included in a given item, for example measuring features such as cartilage morphology breadth, depth and signal intensity in one single score); problems with the scaling of items, especially in “early” OA cohorts where only the lower end of scales may be used; and consequent concerns about responsiveness.7 8

In light of these limitations, we undertook a program to iteratively develop a novel comprehensive semi-quantitative scoring method specific for knee OA and to assess the reliability of this scoring scheme, entitled the Boston–Leeds Osteoarthritis Knee Score (BLOKS). As a first step in validating this new instrument we also explored the validity of assessment of bone marrow lesions (BMLs) using the BLOKS instrument. BMLs are an established feature of osteoarthritis, and data have recently emerged suggesting that lesions in the bone marrow are associated with both the symptoms of knee osteoarthritis and its structural progression.5 9 If the BLOKS method of BML assessment is valid and BMLs are associated with pain and structural progression, then the strength of this association will reflect the validity of the measure.

MATERIALS AND METHODS

Development of BLOKS

A collaborative program between two international centres was established in 2004, incorporating rheumatologists and radiologists who were experienced in OA MRI research and outcome measurement. An initial meeting addressed the items and scoring to be included in BLOKS, based on the OA MRI literature. We selected many of the items based on likely relevance to pain and structural damage or progression of OA.10 At the conclusion of this review the co-authors delineated a novel scoring scheme that was likely to be descriptive for each morphological feature including cartilage integrity, attrition, bone marrow lesions and cysts, osteophytes, ligaments, meniscus and synovitis, in addition to other morphological features that may warrant attention in OA. Other items included in this preliminary BLOKS included meniscal displacement, collateral ligament contour, osteophyte signal, synovitis separate from effusion, subchondral plate signal and thickness, limb alignment and muscle quality. It was recognised that, in the absence of data concerning the importance of certain pathological features, the initial BLOKS should be broadly inclusive.

Assessment of reliability

After an initial training and calibration session, a series of three reliability exercises ensued to improve reader calibration and to assess the reliability of identifying and scoring individual features of the instrument (listed in Table 1). In each exercise, two expert readers (AJG, DG) read the MRIs of 10 subjects, followed by an adjudication session. The particular focus of the scoring development exercise was to refine the features so they were more OA-centric, to remove redundant items (prior analyses of some of the more complex scales indicated that many scores were used infrequently, if at all), and to develop a more reader-friendly measurement tool.

Table 1 Features that are scored* on the Boston–Leeds Osteoarthritis Knee Score (BLOKS)

The analyses presented here are for inter-rater reliability (calculated using the weighted kappa (95% CI)) of the third scoring exercise after removing some of the items that were less reliable (kappa<0.2) in the two prior exercises (e.g. subchondral plate thickness and shape, central osteophytes, limb alignment, muscle quality, osteophyte signal). The readers took an average of 42 min to score each knee using BLOKS.
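As an illustration of the statistic named above, the following is a minimal sketch of Cohen's weighted kappa for two readers scoring the same knees on an ordinal 0–3 scale. The text does not state which weighting scheme was used, so the linear weights default here is an assumption, and the function name and example scores are hypothetical rather than taken from the study.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weight="linear"):
    """Cohen's weighted kappa for two raters using grades 0..n_categories-1.

    weight="linear" (an assumed default) penalises a disagreement by
    |i - j| / (k - 1); any other value uses the squared (quadratic) penalty.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    k = n_categories

    # Observed joint proportions of (grade by rater A, grade by rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    # Weighted disagreement actually observed, and expected by chance.
    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - obs / exp


# Hypothetical grades for four knees from two readers on a 0-2 scale.
reader_1 = [0, 1, 2, 0]
reader_2 = [0, 1, 2, 2]
kappa = weighted_kappa(reader_1, reader_2, n_categories=3)
```

Weighted kappa gives partial credit for near-misses, which suits ordinal grades where a disagreement of one grade is less serious than a disagreement of three.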

Study images for reliability exercise

The images for the reliability exercise were chosen at random from MRI scans undertaken within the Framingham Osteoarthritis Study.11 This cohort draws on two sources: the Framingham Offspring Study, comprising the sons and daughters of the original Framingham Study cohort and their spouses, and a new group recruited as a random sample of the community using random digit dialling. In 2002–2005, the Framingham Osteoarthritis Study recruited subjects from the community without respect to a diagnosis of OA. We obtained an MRI of one knee. All studies were performed with a 1.5T MRI system (Siemens, Mountain View, California, USA) using a phased array knee coil. A positioning device was used to ensure uniform placement of the knee among patients. T2-weighted fat-suppressed images in the sagittal and coronal planes were acquired, using the following pulse sequence parameters: repetition time (TR) 3610 ms, time to echo (TE) 40 ms, slice thickness 3.5 mm, and field of view 14 cm. T1-weighted spin echo images in the sagittal plane were acquired, using the following pulse sequence parameters: TR 475 ms, TE 24 ms, slice thickness 3.5 mm, and field of view 14 cm. Three-dimensional fast low-angle shot (FLASH) water-excitation sequences (resolution 0.3×0.3×1.5 mm) were acquired in the coronal and axial planes with TR 16.8 ms, TE 7.6 ms, and field of view 16.4 cm.

Validity of BML assessment

This question was assessed using data from The Boston Osteoarthritis of the Knee Study (BOKS). All subjects in this study had primary clinical knee osteoarthritis and met ACR criteria for this disorder.12 The source of recruited subjects and study design has been described in detail elsewhere.9 Of 324 subjects who entered the study, 193 men and 19 women received care from the Veterans Administration Health Care System and were recruited from the outpatient clinics there. A total of 8 men and 104 women were recruited from the community. In all, 86% completed a full comprehensive follow-up at a later time-point. The study included a baseline examination and follow-up examinations at 15 and 30 months.

At each visit, patients who did not have contraindications to MRI had an MRI of the knee that was more symptomatic at baseline. At all examinations, patients had knee radiography and answered questionnaires about the severity of knee symptoms, including the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) questionnaire. Patients were also weighed, with shoes off, on a balance-beam scale, and height was assessed. At the first follow-up visit, long-limb films were obtained with a 14×51 cassette, using methods described elsewhere.13 Mechanical alignment was measured as the angle formed by the intersection of the femoral and tibial mechanical axes. The femoral mechanical axis is the line from the femoral head through the centre of the knee, and the tibial mechanical axis is the line from the centre of the ankle to the centre of the knee.
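The alignment measure described above can be sketched geometrically. The example below is an illustration only: it assumes landmark positions are available as 2D film coordinates, and it returns the unsigned angle at the knee between the two mechanical axes (180° for a perfectly straight limb). The sign convention for varus or valgus deviation and the function name are assumptions, as the text does not specify them.

```python
import math


def mechanical_axis_angle(femoral_head, knee_centre, ankle_centre):
    """Unsigned angle in degrees at the knee between the femoral and tibial axes.

    Each landmark is an (x, y) pair in film coordinates (hypothetical inputs).
    """
    # Femoral axis: knee centre towards femoral head.
    fx = femoral_head[0] - knee_centre[0]
    fy = femoral_head[1] - knee_centre[1]
    # Tibial axis: knee centre towards ankle centre.
    tx = ankle_centre[0] - knee_centre[0]
    ty = ankle_centre[1] - knee_centre[1]
    if (fx == 0 and fy == 0) or (tx == 0 and ty == 0):
        raise ValueError("landmarks must not coincide with the knee centre")

    dot = fx * tx + fy * ty
    cross = fx * ty - fy * tx
    # atan2 of |cross| and dot gives the angle between the vectors in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


# A perfectly straight limb: the three landmarks are collinear.
straight = mechanical_axis_angle((0, 10), (0, 0), (0, -10))
```

Using `atan2` rather than `acos` avoids domain errors from rounding when the axes are almost collinear, which is the common case for knee alignment.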

The institutional review boards of Boston University Medical Center and the Veterans Administration Boston Health Care System approved the baseline and follow-up examinations.

Magnetic resonance imaging for validity exercise

All studies in BOKS were performed with a Signa 1.5T MRI system (General Electric Corp., Milwaukee, Wisconsin, USA) using a phased-array knee coil. A positioning device was used to ensure uniformity among patients with the patient reclining in the supine position with a fully extended knee immobilised in the knee coil and the foot perpendicular to the table. The imaging protocol included sagittal spin-echo proton density- and T2-weighted images (repetition time (TR), 2200 ms; time to echo (TE) 20/80 ms) with a slice thickness of 3 mm, a 1-mm interslice gap, 1 excitation, a field of view (FOV) of 11–12 cm, and a matrix of 256×192 pixels; and coronal and axial spin-echo fat-suppressed proton density- and T2-weighted images (TR 2200 ms; TE 20/80) with a slice thickness of 3 mm, a 1-mm interslice gap, 1 excitation, and with the same FOV and matrix.

Whole Organ MRI Scoring (WORMS)

This is a widely used semi-quantitative scoring method for OA features as seen on MRI.14 Tibiofemoral (TF) cartilage on MRI was scored paired and unblinded to sequence on five plates (central and posterior femur; anterior, central and posterior tibia), for the medial and lateral TF compartment, using the WORMS semiquantitative method on fat-suppressed T2-weighted FSE images. Both cartilage signal and morphology were scored using a 0–6 scale: 0 = normal thickness and signal; 1 = normal thickness but increased signal on T2-weighted images; 2 = solitary focal defect of less than 1 cm in greatest width; 3 = areas of partial-thickness defects (<75% of the plate) with areas of preserved thickness; 4 = diffuse partial-thickness loss of cartilage (⩾75% of the plate); 5 = areas of full-thickness loss (<75% of the plate) with areas of partial thickness loss; 6 = diffuse full-thickness loss (⩾75% of the plate). The intraclass correlation coefficient (ICC) on agreement for cartilage readings ranged from 0.75–0.97 for intra- and interobserver reliability.

In WORMS, grade 1 does not represent a change in shape but rather a change in signal in cartilage of otherwise normal shape. Grades 2 and 3 represent similar types of abnormality of the cartilage, focal defects without overall thinning. Therefore, to create a consistent and logical scale for evaluation of cartilage morphologic change and a fair comparison with radiographic changes in joint space narrowing, we collapsed the WORMS cartilage score to a 0–4 scale, where the original WORMS score of 0 and 1 were collapsed to 0, the original scores of 2 and 3 were collapsed to 1, and the original scores of 4, 5 and 6 were considered 2, 3 and 4, respectively, in the new scale. The score at all five plates in both the medial and lateral TF joint was summed to give a score with a possible range from 0–20. Cartilage loss was defined as a change in the summary score at subsequent follow-up. For measurement of cartilage loss films were read paired and unblinded to sequence using MRI sequence data from the sagittal and coronal planes.
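The collapsing rule and compartment sum described above can be written out as a short sketch. The mapping follows the text directly; the function names are hypothetical, and computing cartilage loss as follow-up minus baseline is an assumed reading of "a change in the summary score".

```python
# Original WORMS cartilage grade (0-6) -> collapsed morphology grade (0-4).
WORMS_TO_COLLAPSED = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}


def compartment_score(plate_scores):
    """Sum the collapsed grades over the five plates of one TF compartment."""
    if len(plate_scores) != 5:
        raise ValueError("expected one WORMS grade for each of the five plates")
    return sum(WORMS_TO_COLLAPSED[s] for s in plate_scores)


def cartilage_loss(baseline_plates, follow_up_plates):
    """Change in the compartment summary score (assumed: follow-up minus baseline)."""
    return compartment_score(follow_up_plates) - compartment_score(baseline_plates)


# Hypothetical readings for one compartment at baseline and follow-up.
baseline = [0, 1, 2, 3, 4]
follow_up = [1, 2, 3, 5, 6]
loss = cartilage_loss(baseline, follow_up)  # 9 - 4 = 5
```

With five plates each graded 0–4 after collapsing, the compartment score ranges from 0 to 20, matching the range given in the text.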

BMLs in the subarticular marrow are defined as poorly marginated areas of increased signal intensity in the normally hypointense fatty marrow on the fat-suppressed spin-echo T2-weighted images, and graded in each region from 0 to 3 based on the extent of regional involvement: 0 = none; 1 = less than 25% of the region; 2 = 25% to 50% of the region; 3 = more than 50% of the region. The intra- and inter-observer agreement (ICC) for reading BMLs ranged from 0.76–0.82, read by the same musculoskeletal radiologists. In this scoring system, BMLs were graded in the anterior, central, and posterior regions of the medial and lateral femur and tibia, and the subspinous region on the tibia, blinded to sequence.14
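The WORMS BML grade thresholds above amount to a simple step function. The sketch below is illustrative: in practice readers estimate the extent visually rather than from a measured percentage, so the numeric input and the function name are assumptions.

```python
def worms_bml_grade(percent_involved):
    """Map the percentage of a region occupied by BML to the 0-3 grade.

    Boundaries follow the text: exactly 25% and exactly 50% both fall in grade 2.
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_involved == 0:
        return 0  # none
    if percent_involved < 25:
        return 1  # less than 25% of the region
    if percent_involved <= 50:
        return 2  # 25% to 50% of the region
    return 3  # more than 50% of the region


# Hypothetical involvement estimates for three regions.
grades = [worms_bml_grade(p) for p in (0, 10, 60)]
```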

BLOKS scoring

BMLs were also assessed on 74 subjects selected randomly from the larger BOKS sample using the BLOKS semi-quantitative scoring system. Each BML generates a grade for (i) size, (ii) the percentage of the surface area of the lesion that is adjacent to the subchondral plate, and (iii) the percentage of the lesion that is BML as distinct from cyst. The inter-rater reliability for reading BMLs is 0.72 (0.58–0.87) (weighted kappa).

Analysis: assessment of validity

Construct validity

Construct validity is present to the extent that the measurement is consistent with other measurements of the same phenomenon. We explored the construct validity of different measures of BMLs (from WORMS and BLOKS) and their relation to pain severity. We explored the relation of the predictor variable (baseline BMLs) to the outcome variable (visual analogue scale (VAS) pain). The BMLs were defined in three ways: (1) maximal BML in a knee (range 0–3): the maximum BML grade across all regions (nine regions for the BLOKS scale, 15 regions for WORMS); (2) any BML in a knee (dichotomous): BML ≥ 1 in any region; and (3) large BML in a knee (dichotomous): BML ≥ 2 in any region. Definitions (2) and (3) were used in the prior publication investigating BML and pain.9

Analyses were adjusted for age, sex and body mass index (BMI).
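The three knee-level BML definitions can be derived from the per-region grades as in the sketch below. The function name, dictionary keys and example grades are hypothetical; only the thresholds come from the text.

```python
def summarise_bml(region_grades):
    """Derive the three knee-level BML predictors from per-region grades (0-3)."""
    if not region_grades:
        raise ValueError("at least one region grade is required")
    maximal = max(region_grades)
    return {
        "maximal": maximal,     # definition (1): highest grade in any region
        "any": maximal >= 1,    # definition (2): grade >= 1 in any region
        "large": maximal >= 2,  # definition (3): grade >= 2 in any region
    }


# Hypothetical BLOKS readings for the nine regions of one knee.
knee = summarise_bml([0, 0, 1, 0, 2, 0, 0, 0, 0])
```

Because "any" and "large" ask whether some region reaches a threshold, both follow directly from the maximal grade.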

Longitudinal validity

We compared the longitudinal validity of different ways of measuring BMLs (BLOKS vs WORMS) by comparing their respective associations with cartilage loss. If BML score has longitudinal validity15 we would expect it to predict cartilage loss on MRI.16

Among the 74 subjects who had BML readings on the BLOKS scale, we used the 53 subjects who had longitudinal BML readings on both the BLOKS and WORMS scales, and who had alignment measures. One knee was used for each subject.

Baseline BML was defined as the summary of BML on both the BLOKS and WORMS scales in the medial TF compartment and lateral TF compartment, respectively.

Using a similar definition as previously,16 BML change was defined as maximal BML change on both the BLOKS and WORMS scales. Cartilage loss was defined as the summary of cartilage loss on the WORMS scale in the medial TF compartment and lateral TF compartment, respectively.

The model was used to assess the relation between cartilage loss and both baseline BML and change in BML, on the BLOKS and WORMS scales, in the medial and lateral compartments separately. The relation between cartilage loss and baseline BML and change in BML on the BLOKS scale was then assessed, stratified by: (a) maximal BLOKS BML percentage surface area adjacent to the subchondral plate in the same compartment at baseline, 0–2 vs 3; (b) maximal BLOKS BML percentage of lesion as distinct from cyst in the same compartment at baseline, 0–2 vs 3.

These analyses were adjusted for confounding of age, gender, BMI and in another multivariate step for malalignment defined on a long limb film.16

RESULTS

Reliability exercise

Upon completion of the development of the third iteration of BLOKS we conducted an exercise to ascertain the interobserver reliability (DG and AJG) of the instrument on 10 subjects randomly chosen from the Framingham OA Cohort. Their mean age was 67 years (SD 9) with a mean BMI of 26.5 kg/m2 (SD 4.7). Of the 10 knees that were assessed, the Kellgren and Lawrence (K&L) grades were K&L = 0 in one knee, K&L = 1 in three knees, K&L = 2 in two knees, K&L = 3 in three knees, and K&L = 4 in one knee. The reliability for the features described above is reported in Table 2.

Table 2 Interobserver reliability for reading of Boston–Leeds Osteoarthritis Knee Score (BLOKS) features (weighted kappa)

Validity of BML Assessment

Among the 74 subjects who had BML reading on the BLOKS scale, we used the 71 subjects who also had WORMS readings (see descriptive characteristics in Table 3). These 71 subjects were comparable to the larger study sample.

Table 3 Descriptive characteristics of 71 subjects

We first examined the correlation between BML on the BLOKS and WORMS scales. The range of baseline BML summary in medial TF compartment was 0–4 on the BLOKS scale, 0–8 on the WORMS scale, with Spearman correlation coefficient 0.63 (p<0.001).

The range of baseline BML summary in lateral TF compartment was 0–4 on the BLOKS scale, 0–8 on the WORMS scale, with Spearman correlation coefficient 0.79 (p<0.001).

The range of change of BML in medial TF compartment was from −2 to 2 on the BLOKS scale, from −5 to 4 on the WORMS scale, with Spearman correlation coefficient 0.11 (p = 0.28). The range of change of BML in lateral TF compartment was from −3 to 2 on the BLOKS scale, from −4 to 3 on the WORMS scale, with Spearman correlation coefficient 0.47 (p<0.001).
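For readers unfamiliar with the statistic used to compare the two scales, the sketch below computes a Spearman rank correlation, using average ranks for ties (common with ordinal scores such as these). The function names and the example scores are hypothetical, and the code does not reproduce the study's p values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / math.sqrt(vx * vy)


# Hypothetical compartment summary scores on two scales for five knees.
scale_a = [0, 1, 2, 2, 4]
scale_b = [0, 3, 5, 4, 8]
rho = spearman(scale_a, scale_b)
```

A rank-based coefficient is a natural choice here because the two scales have different ranges (0–4 and 0–8 at baseline) and only their ordering is being compared.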

The relation of baseline maximal BML and VAS pain is presented in Table 4. This demonstrates increasing pain severity with increasing BML grade using the BLOKS grades (p for linear trend = 0.04). In contrast, there was no significant association between VAS pain and the WORMS scale.

Table 4 Maximal bone marrow lesion (BML) and visual analogue scale (VAS) knee pain

Using the methodology from the prior publication9 there was a trend to increasing pain with large BMLs using the BLOKS scoring method and no significant relation was found between pain and the WORMS scale (Table 5).

Table 5 Any bone marrow lesion (BML), large BML and visual analogue scale (VAS) knee pain

Baseline BML and change of BML and their relation to cartilage loss on the BLOKS and WORMS scales

In the medial TF compartment, a higher baseline BML summary score was related to more severe cartilage loss on both the BLOKS and WORMS scales (Table 6). This association was stronger for BLOKS than for WORMS and, consistent with our prior work, was diminished after adjusting for alignment.16 Change of BML summary score was not related to change of cartilage loss. When stratified by baseline BML area and by baseline BML percentage of lesion, there was a strong association between baseline BML summary score and cartilage loss in the stratum with stratified variable less than 3 (for percentage of surface area adjacent to the plate), and a weaker association in the stratum with stratified variable equal to 3. This suggests that lesions in the medial compartment with less contact with the subchondral plate have a greater effect on the rate of cartilage loss.

Table 6 Baseline BML and change of BML and their relation to cartilage loss on the BLOKS and WORMS scales in the medial TF compartment

In the lateral TF compartment (Table 7), the unstratified analyses showed a similar association to that seen in the medial TF compartment. Unlike the results in the medial TF compartment, in the stratified analyses a strong association between baseline BML summary score and cartilage loss was observed in the stratum with stratified variable equal to 3. No association between baseline BML summary score and cartilage loss was observed in the stratum with stratified variable less than 3.

Table 7 Baseline BML and change of BML and their relation to cartilage loss on the BLOKS and WORMS scales in the lateral TF compartment

DISCUSSION

This manuscript describes the development and reliability of a novel scoring scheme for OA studies utilising MRI of the knee. It is acknowledged that, at this stage of development, this system may serve multiple purposes (e.g. for outcome and risk factor assessments in clinical trials and for epidemiologic studies) and will ultimately have to be separately assessed and validated for each purpose. Just as magnetic resonance technology and our knowledge of the OA process are in a rapid state of development, we intend for this instrument to continually evolve. The BLOKS scoring instrument may well contain both core and exploratory features. Their continued inclusion will be critically evaluated after further exercises.

The BLOKS method of assessing BMLs has validity through stronger association with pain severity than that found with WORMS. In addition, the ability of the BLOKS method to predict cartilage loss appeared stronger and provided additional information suggesting that the proximity of the lesion to the subchondral plate, and the amount of the lesion occupied by cyst influenced the rate of cartilage loss. Thus, in addition to demonstrating the reliability of this instrument we have also demonstrated improved validity of the BML score within BLOKS over an existing instrument. Further investigative work will be needed to establish the validity and responsiveness of the BLOKS instrument and to explore the reasons for proximity of the lesion to the subchondral plate and the amount of lesion that is BML vs cyst influencing the rate of cartilage loss.

The structural determinants of mechanical dysfunction and pain in arthritis are presently not well understood, but probably involve a multitude of interactive pathways characterised by changes in structure and function of the whole joint.2 The current practice of monitoring only a few of these features (usually radiographically-assessed joint-space narrowing and osteophytes) provides only a restricted view of the disease process and lessens the utility of such assessments.

Magnetic resonance can demonstrate soft-tissue structures and provide some insight into the tissue characteristics. For example, MRI studies of knee OA have been illuminating, revealing wide-ranging soft-tissue damage, hyaline cartilage defects, meniscal disruption, subchondral marrow changes and variability in appearance of cysts and osteophytes. They have shown that meniscal extrusion contributes to joint space loss in mild to moderate knee OA,17 and have uncovered pathologies such as bone marrow lesions and synovitis.18 19 This unparalleled imaging capability aligns well with a disease that affects the whole synovial joint organ and a need to capture detail on multiple different tissues in order to comprehensively evaluate the structural integrity of a joint. Because OA is a disease of all the tissues in the joints, measurements of structure need to be seen broadly and capture a broad number of important anatomic features, such as osteophytes, effusions, meniscal tears, subchondral bone architectural changes, in addition to cartilage loss. Most of these structures cannot be seen on plain radiography, whereas they can be clearly visualised on MRI.

Before large investments are made in post-processing analysis of epidemiological studies (such as the National Institutes of Health Osteoarthritis Initiative (NIH-OAI), a study in which 5000 subjects are having repeated longitudinal knee MRI assessments) and disease modifying OA drug clinical trials, researchers need to have available well-designed and validated tools. There are currently a number of semi-quantitative methods used to measure changes in joint morphology,14 20–22 though (as this field is just developing) there is little research published regarding the traditional metric features of these tools. Recent data analysis exercises have raised concerns about the scaling and sensitivity to change of one semi-quantitative score that prompted the development of BLOKS.7 8

We currently have additional studies underway on large datasets to ascertain the internal construct and content validity and the responsiveness of BLOKS. We are developing training tools, an instructional manual and an atlas to enhance further widespread use. A comprehensive atlas is needed so that the method can be applied by others. The increasing complexity of this instrument in an effort to separate constructs is a potential limitation of this development.

In conclusion, we have designed a novel expert-based scoring system for MRI OA knee scoring, BLOKS, that demonstrates reasonable reliability and validity for BML assessment. Further iterative development will include validation for use in both clinical trials and epidemiological studies.

Acknowledgments

We would like to acknowledge the support of Astra Zeneca who sponsored the travel and meetings, and in particular Rose Maciewicz, John Waterton, Meilien Ho and Tony Nash for their ongoing support of this process. We would like to thank the participants and staff of the Framingham OA Study and the BOKS Study.
