Detection of dechallenge in spontaneous reporting systems: A comparison of Bayes methods

Aim: Dechallenge is the reduction or disappearance of an adverse drug reaction (ADR) observed when a drug is withdrawn from a patient. Currently available algorithms for detecting dechallenge have limitations, so newer methods need to be compared. To detect dechallenge in Spontaneous Reporting Systems (SRS), the data mining algorithms Naive Bayes and Improved Naive Bayes were applied and their performance compared in terms of accuracy and error. Analyzing factors of dechallenge such as outcome and disease category will help medical practitioners and the pharmaceutical industry determine the reasons for dechallenge and take essential steps toward drug safety.
Materials and Methods: Adverse drug reaction reports for the years 2011 and 2012 were downloaded from the United States Food and Drug Administration's database.
Results: The Improved Naive Bayes algorithm outperformed Naive Bayes, detecting dechallenge with an accuracy of 90.11% and an error of 9.8%.
Conclusion: Detecting dechallenge for unknown samples is essential for proper prescription. To overcome the issues exposed by the Naive Bayes algorithm, the Improved Naive Bayes algorithm can be used to detect dechallenge with higher accuracy and minimal error.
Introduction

Causality assessment (CA) is a method of evaluation used in pharmacovigilance to determine the relationship between the drugs a patient was exposed to and the reported adverse drug reactions (ADRs). It includes establishing the temporal relationship between drugs and reported ADRs, dechallenge, rechallenge, and the clinical and pathological characteristics of the events. [1] Even with careful monitoring, it is difficult for a practitioner to identify which drug is causing an ADR. In such cases, withdrawing drugs one at a time and evaluating the dechallenge response becomes essential. Hence a simple method of analysis for detecting dechallenge was considered in this study.

Dechallenge is a response observed in a patient, such as the reduction or disappearance of an ADR, after withdrawal of a drug. There are two types of dechallenge: positive dechallenge, in which the reaction resolves on withdrawal of the drug, and negative dechallenge, in which the reaction follows a course of its own. [2] The decision to withdraw a drug is considered from the standpoint of the ADR underlying the disease. Rechallenge is essential to confirm the causal relationship with the ADR. [3] A defined procedure has to be followed before attempting rechallenge, with an understanding of the risk involved for the patient. Prescribers and patients therefore come forward for the procedure only on a few occasions, and prescribers prefer dechallenge to rechallenge. Detecting adverse drug reactions and dechallenge has attained significance in personalized medicine. [4]

Data mining is a statistical approach for discovering useful patterns in enormous amounts of data. [5] It comprises algorithms that find patterns through approaches such as classification, prediction, clustering and association. A data mining algorithm called the Multi-item Gamma Poisson Shrinker is now used by the US Food and Drug Administration (FDA) for signaling potential ADRs.
[6] Naive Bayes (NB) is a statistical classifier that predicts class membership probabilities and is widely used by data mining researchers for classification. It assumes that all the variables contributing to the classification are mutually independent. This leads to a simple prediction framework that gives good results in many cases. However, an estimated probability can be zero. A standard technique, the Laplacian correction, is used to overcome this. [5] Various other attempts have been made to resolve this issue. [7]

Materials and Methods

The Medical Dictionary for Regulatory Activities (MedDRA), an international medical terminology developed under the auspices of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), is a controlled medical vocabulary for describing adverse events at five levels: the coarsest is System Organ Class (SOC), followed by High Level Group Term (HLGT), High Level Term (HLT), Preferred Term (PT), and Lowest Level Term (LLT), the finest-grained description. [8] ADR reports from the FDA for 2011 and 2012 were considered in the present study. Data were extracted from the SRS provided by the FDA. Duplicate reports were deleted in accordance with the FDA's recommendation of adopting the most recent case number, as described in the file 'Ascnts.doc' on the FDA website. [9] The FDA stores the disease category at the Preferred Term (PT) level. Among the five levels of the MedDRA adverse event hierarchy, the SOC level was used to classify the disease category, by referring to the Cancer Therapy Evaluation Program simplified disease classification v4.0 (MedDRA v12.0). [10] Researchers have suggested that it might be more advantageous to perform data mining on the coarser-grained SOC representation of adverse events than on the PT level. The data were loaded from the FDA's text files into an Oracle database using Extract, Transform and Load (ETL) tools. Indices were constructed on the patient identifier.
Records with the SOC gastrointestinal, renal and urinary, or metabolism and nutrition disorders were considered for dechallenge classification. The attributes used to evaluate dechallenge were: the disease category, denoted as System Organ Class (SOC) in MedDRA; the drug name type, coded by the FDA as 1 for a valid trade name and 2 for a verbatim name; and the outcome, one of Life-Threatening (LT), Death (DE), Congenital Anomaly (CA), Hospitalization, Initial or Prolonged (HO), Disability (DS), or Required Intervention to prevent permanent impairment (RI). The suitable algorithm for detecting dechallenge was determined by comparing NB and NB+. The performance of the data mining algorithms was estimated by the percentage Accuracy, Error and Precision and by the Receiver Operating Characteristic (ROC) curve. All of these parameters were derived from 2 × 2 confusion matrices containing the total number of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN), where Positive refers to the identified set and Negative to the rejected set. Accuracy, Precision and Error were calculated by Formulas 1, 2 and 3 as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Precision = TP / (TP + FP)    (2)
Error = (FP + FN) / (TP + TN + FP + FN)    (3)

The ROC curve was used to convey graphically whether the performance of an algorithm was Perfect, Liberal, Random or Conservative.

Related Works

Statistical data mining techniques have been implemented in the field of postmarketing surveillance. Yanqing and Hao et al. proposed an algorithm for causal association between two events. [11] Safety signal detection problems in pharmacovigilance were examined by Roy and Jeffrey et al. [12] Statistical data sources and data mining methods used in safety signal detection were studied by Atsuko and Manfred et al. [13] Liang and Rongzhan et al. applied the decision tree algorithm to classification in drug safety.
[14] Corani and Zaffalon proposed an extension of NB, the Naive Credal Classifier, to issue reliable classifications in domains with highly uncertain information. [15] Denis et al. proposed a classifier for eliminating noise in the dataset. [16] Chen and Shengrui proposed a classification method for high-dimensional data. [17] Kotsiantis and Pintelas modified the NB classifier using bagging and boosting procedures. [18] Zhang et al. proposed a novel model called Hidden Naive Bayes to avoid computational complexity. [19]

Data Mining Model

In the data mining model, each patient was identified by a unique number. The other data were the outcome, the drug code and the disease category. Drugs were coded by the FDA as 1 for a valid trade name and 2 for a verbatim name. The disease category was expressed at the SOC level by mapping the FDA PT to the MedDRA PT using an Extensible Markup Language (XML) mapping. [20],[21]

The Algorithms Used

Naive Bayes

The fundamental assumption of attribute independence was adopted in this study. The dechallenge attribute present in the FDA records was taken as the class label for detection. Bayes' theorem, given in Formula 4, was used to calculate the probability of an outcome. The class label attribute dechallenge had two distinct values (Yes, No), represented by the hypothesis (H).

P(H|X) = P(X|H) P(H) / P(X)    (4)

P(H|X) is the posterior probability, where the hypothesis H represents the presence of dechallenge and X is the known disease category, drug code and outcome. P(X|H) is the posterior probability of X conditioned on H. P(H) is the prior probability of H regardless of disease category, drug code and outcome. P(X) is the prior probability of X. For calculating the prior probability P(X), dechallenge records with the 'unknown' category were filtered out. The posterior probability was then calculated from outcome, disease category and drug code. The 2011 and 2012 records exhibited the conditions, noted earlier, under which the Naive Bayes classifier fails.
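The zero-probability failure of Naive Bayes, and the Laplacian correction mentioned earlier, can be illustrated with a minimal sketch. The records below are invented toy data, and the function names `train` and `posterior` and the parameter `alpha` are assumptions made for this illustration; none of them come from the study.

```python
from collections import Counter

def train(records, labels, alpha=0.0):
    """Estimate P(H) and P(x_j | H) from categorical records.

    alpha=0 uses raw frequencies; alpha=1 applies the Laplacian correction.
    """
    class_counts = Counter(labels)
    # Number of distinct values per attribute, used in the smoothing denominator.
    n_values = [len({r[j] for r in records}) for j in range(len(records[0]))]
    counts = Counter()
    for rec, h in zip(records, labels):
        for j, v in enumerate(rec):
            counts[(j, v, h)] += 1
    prior = {h: c / len(labels) for h, c in class_counts.items()}

    def likelihood(j, v, h):
        return (counts[(j, v, h)] + alpha) / (class_counts[h] + alpha * n_values[j])

    return prior, likelihood

def posterior(x, prior, likelihood):
    """Return P(H | X) per class, or None when the estimate of P(X) is zero."""
    joint = {}
    for h, p in prior.items():
        for j, v in enumerate(x):
            p *= likelihood(j, v, h)  # attribute-independence assumption
        joint[h] = p
    evidence = sum(joint.values())
    if evidence == 0:
        return None  # every class scored zero, so the record stays unclassified
    return {h: p / evidence for h, p in joint.items()}

# Toy records (SOC, drug code, outcome); illustrative only, not FDA data.
records = [("GI", 1, "HO"), ("GI", 1, "LT"), ("METAB", 1, "HO"),
           ("RENAL", 2, "CA"), ("RENAL", 2, "DS"), ("METAB", 2, "HO")]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]
query = ("GI", 1, "CA")

raw = posterior(query, *train(records, labels))
smoothed = posterior(query, *train(records, labels, alpha=1.0))
```

In this toy data the outcome CA never occurs with dechallenge 'Yes' and the SOC GI never occurs with 'No', so without smoothing both classes receive probability zero and the query cannot be classified. With `alpha=1.0` every count is incremented by one and a posterior is obtained.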
To overcome these failure conditions, the NB+ algorithm proposed by Balamurugan et al. [7] was applied in the present study, as detailed below.

Improved Naive Bayes

This algorithm starts by computing the Influence Factor, which measures how strongly an attribute value depends on the class attribute. The Influence Factor was calculated for the attributes drug code, disease category and outcome with respect to the class label dechallenge, using Formula 5.

I(X|Ci) = N(X, Ci) / N(Ci)    (5)

where I(X|Ci) is the Influence Factor, N(X, Ci) is the number of records in which attribute value X has the class label Ci, and N(Ci) is the total number of records with the class label Ci. The dataset was divided by the class label dechallenge into 'Yes' and 'No'. Attribute values with a high Influence Factor were retained and the others ignored.

Results

As [Table 1] shows, the Influence Factor is high for the outcomes HO and LT, for drug code 1, and for the disease category gastrointestinal disorder. The value of dechallenge is 'Yes' for combinations of gastrointestinal disorder with outcome HO or LT and drug code 1. Hence unknown records with the same combination of attribute values can be predicted as dechallenge 'Yes'. Experiments with 10-fold cross-validation were carried out to evaluate accuracy. The performance of NB and NB+ is given in [Table 2]: the average accuracy is 90.11% for NB+ and 70.25% for NB, the average error of NB is 19.8 percentage points higher than that of NB+, and the precision of NB+ is 7.4 percent higher than that of NB. The experimental results show that NB+ performs well on attributes with categorical values and in the presence of the zero probability issue.
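The Influence Factor step can be sketched as follows, reading Formula 5 as the share of records in class Ci that carry attribute value X. This is a minimal illustration, not the NB+ implementation: the toy records, the attribute names, the helper `high_influence` and the cut-off of 0.6 are assumptions, since the source states only that attributes with high values were kept.

```python
from collections import Counter

def influence_factors(records, labels, attr_names):
    """Compute I(X|Ci) = N(X, Ci) / N(Ci) for each attribute value and class."""
    class_counts = Counter(labels)
    joint = Counter()
    for rec, c in zip(records, labels):
        for name, value in zip(attr_names, rec):
            joint[(name, value, c)] += 1
    return {key: n / class_counts[key[2]] for key, n in joint.items()}

def high_influence(factors, label, threshold):
    """Attribute values whose Influence Factor for `label` meets the threshold.

    The threshold is a hypothetical parameter chosen for this sketch.
    """
    return {(name, value) for (name, value, c), f in factors.items()
            if c == label and f >= threshold}

# Toy records (SOC, drug code, outcome); illustrative only, not FDA data.
attr_names = ("soc", "drug_code", "outcome")
records = [("GI", 1, "HO"), ("GI", 1, "LT"), ("METAB", 1, "HO"),
           ("RENAL", 2, "CA"), ("RENAL", 2, "DS"), ("METAB", 2, "HO")]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]

factors = influence_factors(records, labels, attr_names)
strong_yes = high_influence(factors, "Yes", threshold=0.6)
```

On this toy data the values retained for dechallenge 'Yes' are the SOC GI, drug code 1 and outcome HO. An unknown record matching such a combination would then be assigned 'Yes', in the spirit of the prediction rule described in the Results.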
In the ROC graph shown in [Figure 1], NB falls in the category of Liberal performance: its True Positive Rates are (.7, .8, .9, .9), but it also has substantial False Positive Rates (.6, .7, .6, .5). NB+ falls in the category of Perfect performance, with True Positive Rates of (.9, .9, .9, .9) and minimal False Positive Rates of (.2, .8, .0, 0). Records from 2011 to 2012 classified as "unknown" by NB are predicted by NB+ using Influence Factor analysis.

{Figure 1}{Table 1}{Table 2}

Discussion

The FDA uses data mining to screen the AERS database, applying a Bayesian protocol to detect disproportionality among large numbers of adverse event-drug product pairs, [22],[23] but the data must still be evaluated through causality reviews such as dechallenge. The performance of any data mining algorithm depends on the type of attributes and on its application. A common means of identifying an association between drug and disease in pharmacovigilance is disproportionality analysis. This produces results based on 2 × 2 tables, one for each relevant drug and ADR combination. For large amounts of data the method therefore produces a large number of tables, which reduces its effectiveness. It is essential to apply additional mathematical formulas in the data forecasting methods of pharmacovigilance. [12] Several studies have reported the need for data mining algorithms to review the data and reach authoritative conclusions. Many studies have reported the use of mining algorithms such as the Proportional Reporting Ratio and the Multi-item Gamma Poisson Shrinker. Further investigation of statistical methods for analyzing large amounts of data is essential to improve the effectiveness of pharmacovigilance activities. [13] Hence this study investigated the performance of the data mining algorithms NB and NB+ in detecting dechallenge in very large data sets.
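The performance measures used to compare NB and NB+ all derive from the 2 × 2 confusion matrix. The sketch below uses the standard definitions of accuracy, error, precision and the two rates plotted on an ROC graph; these are assumed to match the formulas used in the study, and the counts are invented for illustration.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard measures from a 2 x 2 confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "tpr": tp / (tp + fn),  # True Positive Rate, the ROC y-axis
        "fpr": fp / (fp + tn),  # False Positive Rate, the ROC x-axis
    }

# Hypothetical counts, not taken from the study's tables.
m = confusion_metrics(tp=45, tn=45, fp=5, fn=5)
```

Under these definitions error is the complement of accuracy, which agrees with the reported figures: 100 − 90.11 = 9.89 for NB+ and 100 − 70.25 = 29.75 for NB, a gap of about 19.8 percentage points.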
Among the 26 SOC disease categories, [21] the results presented here are based on three SOCs: gastrointestinal, metabolism and nutrition, and renal and urinary disorders. When NB is used to detect dechallenge, the posterior probability is zero for records with outcome 'CA' and disease category gastrointestinal disorder. Hence 72 records from the fourth quarter of 2011 are classified as 'unknown' by the NB algorithm. The algorithm fails when the probability of a particular outcome or disease is uniformly distributed. When NB is used, such unknown samples become unusable for future analysis, and this gap may cause faults in detection. All the records stored in the SRS need to be analysed to carry out pharmacovigilance activities. NB+ resolved the zero probability issue by determining the attributes with a high Influence Factor, thus reducing the noise present in the data for effective detection of dechallenge. NB+ can be applied to any dataset that suffers from the zero probability problem. From the experimental analysis, it is clear that NB+ can be used for causality reviews on large data sets. For the FDA record sets, NB+ produced higher accuracy than NB and detected the dechallenge value as 'Yes' for drugs with code 1 and gastrointestinal disorder with outcome HO or LT. Influence Factor analysis in NB+ demonstrated the usefulness of this algorithm in pharmacovigilance for predicting unknown samples.

Conclusions

Most of the data available from the FDA and the World Health Organisation (WHO) have neither been brought into health science education nor been used in training for patient care. Postmarketing surveillance techniques such as detecting dechallenge will help practitioners and prescribers gain knowledge about drugs with various reactions. Hence unknown samples should be classified properly for data analysis. The FDA regards evidence of dechallenge as the most important criterion for causality reviews.
The outcome of the classification algorithms shows that NB+ outperformed NB on traditional measures such as accuracy and error in classifying dechallenge.

References


