Document information:
How to Display Data - P10: The best method to convey a message from a piece of research in health is via a figure. The best advice that a statistician can give a researcher is to first plot the data. Despite this, conventional statistics textbooks give only brief details on how to draw figures and display data.
Content extracted from the document:
Displaying quantitative data

[Figure 4.6 (Continued.) Panel (c): histogram; horizontal axis: Height in metres; vertical axis: Frequency.]

obvious: there are more women than men; and the peak for men occurs at a greater height than for women (about 1.80 m compared to 1.62 m).

The bins or intervals on the horizontal X-axis of the histogram can be labelled in a variety of ways. The bars may be labelled by using the mid-point of the corresponding interval, or by having a label at the start (or end) of the interval as in Figure 4.6. For histograms, we recommend that you label the horizontal axis at the start (or end) of each interval, since with this method it is easier to work out the width of the interval (as in Figure 4.6). Some intermediate interval labels can be omitted, to avoid cluttering up the scale, without any noticeable loss of clarity, as in Figure 4.6b.

A useful feature of a histogram is that it is possible to assess the distributional form of the data; in particular whether the data are approximately Normally distributed, or are skewed. The Normal distribution (sometimes known as the Gaussian distribution) is one of the fundamental distributions of statistics, and the histogram of Normally distributed data will have a classic 'bell' shape, with a peak in the middle and symmetrical tails, such as that for height for women in Figure 4.7b. Skewed data are data which are not symmetrical; positively skewed data have a peak at lower values and a long tail of higher values (Figure 4.8), while conversely negatively skewed data have a long left-hand tail at lower values, with a peak at higher values (see Figure 4.9).

[Figure 4.7 Separate histograms for the heights of men and women [3]: (a) for men (n = 77) and (b) for women (n = 145). Horizontal axis: Height in metres; vertical axis: Frequency.]

[Figure 4.8 Positively skewed data – histogram of baseline ulcer area (cm²) from leg ulcer trial (n = 217) [3]. Horizontal axis: Baseline ulcer area (cm²); vertical axis: Frequency.]

Histograms are similar to bar charts in that the variable of interest is displayed on the horizontal axis (X-axis) and the frequencies are displayed on the vertical axis (Y-axis). However, bar charts are used for discontinuous data, where the categories are entirely separate, while histograms are used for continuous data. Thus bar charts have gaps between the categories on the horizontal axis in order to emphasise that the categories are completely separate, whereas there are no spaces in between the bins for a histogram, as the width of these bins can be set by the investigator.

The count data, for the number of deaths from SIDS per day, in Table 4.1 could also be displayed as a histogram. This is because there are a large number of categories (14) of deaths per day and it is reasonable to treat such discrete count data as if they were continuous, at least as far as the statistical analysis goes. However, we would recommend that count data be displayed using bar charts as opposed to histograms, as the gaps between the bars will emphasise that the categories represent discrete whole numbers and cannot take intermediate values (e.g. it is not possible to have 1.3 SIDS deaths per day).

[Figure 4.9 Negatively skewed data – histogram of baseline social functioning from leg ulcer trial (n = 233) [3]. Horizontal axis: SF-36 Social functioning: baseline; vertical axis: Frequency.]

4.6 Box–whisker plots

Another extremely useful method of plotting continuous data is a box-and-whisker or box plot. This is described in detail in Figure 4.10.
As with dot plots, box plots can be particularly useful for comparing the distribution of the data across several groups.

The box contains the middle 50% of the data, with the lowest 25% of the data lying below it and the highest 25% of the data lying above it. In fact the upper and lower edges of the box are the upper and lower quartiles, and the distance between them is a particular quantity called the interquartile range. The horizontal line in the middle of the box represents the median value as described in Section 4.4. The whiskers extend to the largest and smallest values excluding the outlying values. The outlying values are defined as those values more than 1.5 box lengths from the upper or lower edges, and are represented as the dots outside the whiskers.

Figure 4.10 shows box plots of the heights of the men and women in the leg ulcer trial. Similar to dot plots, the gender differences in height are immediately obvious from this plot and this illustrates the main advantage of the box plot over histograms when looking at multiple groups. Differences in the ...
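The box plot construction described above (box edges at the quartiles, a line at the median, whiskers to the most extreme values that are not outlying, and outlying values more than 1.5 box lengths beyond the edges) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the function name `box_plot_summary`, the parameter `whisker_factor` and the sample heights are all hypothetical, and the quartiles use the 'inclusive' interpolation rule of the standard library's `statistics.quantiles`, which is only one of several quartile conventions in common use.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quantities a box plot displays for one group of values."""
    data = sorted(values)
    # Quartiles: 'inclusive' interpolates between observed values.
    # Other quartile conventions give slightly different box edges.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range
    # Values beyond these limits are treated as outlying.
    lower_limit = q1 - whisker_factor * box_length
    upper_limit = q3 + whisker_factor * box_length
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "box_length": box_length,
        # Whiskers stop at the most extreme values that are not outlying.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical heights in metres, invented for illustration
# (these are not the leg ulcer trial data).
heights = [1.52, 1.58, 1.60, 1.62, 1.63, 1.65, 1.68, 1.70, 1.95]
summary = box_plot_summary(heights)
for name, value in summary.items():
    print(name, value)
```

With these invented values the single height of 1.95 m lies more than 1.5 box lengths above the upper edge, so it would be drawn as a dot and the upper whisker would stop at 1.70 m. Plotting libraries may use a different quartile rule by default, so the box edges they draw can differ slightly from this sketch.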