| History |
1971

Working with collated data is not entirely new. The first effort to derive prediction equations for pulmonary function indices in this way was probably that of Polgar [1]. He and Promadhat compiled reference values for the paediatric age range from the literature published since the 1920s, applying the following set of criteria (quotation):
The authors relied on published (a) pulmonary function means, standard deviations and numbers of cases as a function of sex, height, age, weight or surface area; (b) equations relating pulmonary function to height and other variables; (c) combinations of these forms of presentation.

Basically, therefore, they faced the task of deriving a new set of equations for pulmonary function indices that was based not on raw data sets but on summary statistics in the form of tables and equations. They did this by constructing a new data set by substituting values for discrete heights, and using these to derive a new prediction equation, which was shown graphically to represent the mean of the original equations. The average percentage coefficient of variation was similarly estimated from published sets of measurements. This approach has proved useful: as of 2008, predicted values for spirometric indices based on Polgar's publication are still widely used worldwide.

This procedure was also adopted by the European Community for Coal and Steel (ECCS) in the 1980s [2]. It is of historic interest to disclose how this decision came about.

1983

A working party had been given the task of revising a previous recommendation of the ECCS on measurements of lung volumes and ventilatory flows, lung elasticity, airway resistance and gas transfer. Whereas the 1971 ECCS recommendation [3] provided predicted values for some of these indices, there were various reasons why the working party wanted to provide a new set of equations. One was that the equipment and the methodology had evolved. Another was that the previous set of equations was based on a highly selected population: male workers in the coal and steel industry. Not only were the predicted values generally regarded as being quite high, but the recommended prediction equations for females were not based on a single measurement in females; they were derived as an educated guess from data on males.
Permission to carry out a field study to derive new equations was not obtained, as it would be too costly. The working party was also instructed not to publish on women: the ECCS was financed by levies on coal and steel for the benefit of workers in that industry, and there were no female workers. The working party decided to ignore the instruction on reference values for females and tried to locate files with raw data so as to collate these, but to no avail. It therefore adopted the approach that Polgar had pioneered. An account of the procedures is given in [2].

Deriving predicted values from published prediction equations has several drawbacks, such as:
1995

The above procedure was also used to derive predicted values for lung volumes for both adults and the paediatric age range [4]. It is obviously much more appropriate to derive prediction equations using 'raw data', obtained with acceptable measurement techniques and satisfactory quality control from a representative healthy reference population. The first such effort was published in 1995 [4]. It used 6 data sets from 5 different countries in Europe to derive a new set of regression equations, ascertaining that the new prediction equations were valid in all data sets and assessing the difference between centres. This was published in Pediatric Pulmonology 1995; 19: 135-142.

This gave rise to the following recommendation:

"The approach adopted in this study is equally applicable to other ethnic groups, to other indices of ventilatory function, such as residual volume, functional residual capacity, total lung capacity, and transfer factor for the lung for CO. Similarly, problems need to be resolved about normal ventilatory function in other age ranges, in particular in elderly persons. There is ever increasing awareness that in the monitoring of patients or groups of special interest the assessment of longitudinal data needs to be further developed. There is potentially much to be gained by starting an international data base to this end, to which researchers who have performed studies which comply with international standards could submit their cross-sectional and longitudinal data. It will often obviate the need for costly and time-consuming new studies as so much information is already available but not exploited. We suggest that such a data base should be available for research purposes. The data base will also be of historic interest as it will allow future generations to study cohort effects. Proper selection of submitted material and management of the data base requires that an international body take responsibility. This would be a worthy project under the auspices of the European Respiratory Society and the American Thoracic Society. It would also be a worthy tribute to the approach pioneered by George Polgar."

In reference 4 one of the conclusions was:
2008

The first publication to put the above into practice was by Stanojevic et al. [5]. It was based on 4 data sets covering the following age ranges: (1) 8-80 yr, N = 2273; (2) 4-19 yr, N = 761; (3) 5-18 yr, N = 316; and (4) 5-19 yr, N = 248.

Pooling different data sets should not be done uncritically. The requirements that need to be met are discussed elsewhere. It is possible, for example, that even though the same model applies to different data sets, they differ systematically in the level of the predicted value. In that case the newly derived model will still be valid, but the coefficient of variation will be inflated.

Whereas cubic splines are a very powerful tool for modelling the relationship between dependent and predictor variables, there is a danger that they will also model the idiosyncrasies of the combined data sets. Take, for example, FEV1/FVC, which is very dependent on age. If a number of data sets differ systematically in FEV1/FVC, do not fully overlap in age and have widely different numbers of observations, a cubic spline with a sufficient number of degrees of freedom will follow a curved pattern to accommodate the different levels, where a straight line might be required.
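The remark above about pooled data sets that share a model but differ in level can be illustrated with a small simulation. This is only a sketch and is not taken from any of the cited studies: the two centres, the linear relationship with height, the offset of 0.6 between centres and all other numbers are assumptions chosen for illustration. Both hypothetical centres follow the same slope; one is shifted upwards. A single line fitted to the pooled data should recover the slope, while the residual scatter, and hence the coefficient of variation, should be larger than within either centre.

```python
import random
import statistics

# All numbers are illustrative assumptions, not values from the cited studies.
TRUE_SLOPE = 0.04       # change in the index per cm of height (hypothetical)
TRUE_INTERCEPT = -2.5   # hypothetical
WITHIN_SD = 0.30        # scatter around the line within one centre (hypothetical)


def simulate_centre(n, offset, rng):
    """Simulate n (height, value) pairs for one centre with a level offset."""
    data = []
    for _ in range(n):
        height = rng.uniform(110.0, 180.0)  # cm
        noise = rng.gauss(0.0, WITHIN_SD)
        value = TRUE_INTERCEPT + offset + TRUE_SLOPE * height + noise
        data.append((height, value))
    return data


def fit_line(data):
    """Ordinary least squares fit of value = a + b * height; returns (a, b)."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residual_sd(data, a, b):
    """Standard deviation of the residuals around the fitted line."""
    return statistics.pstdev([y - (a + b * x) for x, y in data])


def percent_cv(data, sd):
    """Residual SD expressed as a percentage of the mean observed value."""
    return 100.0 * sd / statistics.fmean([y for _, y in data])


rng = random.Random(1)
centre_a = simulate_centre(500, 0.0, rng)   # reference level
centre_b = simulate_centre(500, 0.6, rng)   # same slope, systematically higher
pooled = centre_a + centre_b

sd_a = residual_sd(centre_a, *fit_line(centre_a))
sd_b = residual_sd(centre_b, *fit_line(centre_b))
a_pooled, b_pooled = fit_line(pooled)
sd_pooled = residual_sd(pooled, a_pooled, b_pooled)

cv_a = percent_cv(centre_a, sd_a)
cv_pooled = percent_cv(pooled, sd_pooled)

print(f"slope fitted to pooled data: {b_pooled:.4f}")
print(f"residual SD: centre A {sd_a:.3f}, centre B {sd_b:.3f}, pooled {sd_pooled:.3f}")
print(f"CV: centre A {cv_a:.1f}%, pooled {cv_pooled:.1f}%")
```

In this sketch both centres cover the same height range, which is why the pooled slope stays close to the assumed one. If the centres also differed in the range of the predictor, as in the FEV1/FVC example above, a flexible curve could bend to follow the difference in level rather than a real change in the index.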
| Last Updated on Monday, 26 July 2010 11:51 |


