A. Nandy, C. Raychaudhury, S. C. Basak
Some toxic substances are known to bind preferentially to specific segments of a DNA sequence, while others, such as copper, preferentially affect DNA constituents such as guanine. Given the complexity of large molecules such as DNA and the possibility that these toxic chemicals also affect homologous segments, techniques for identifying the DNA sites that may be affected assume significance. While chemical and laboratory tests remain the basic tools of such investigations, rapid computer-based searches that help to narrow down the possible locations of toxic damage would be a useful supplement. Using a two-dimensional Cartesian representation of DNA sequences, we have observed that individual genes have unique graph signatures. These arise from the arrangement and distribution of bases in the coding sequences, which are specific to each gene type and therefore yield characteristic graphical representations. For conserved genes, the entire gene, including coding and non-coding regions, is seen to have a unique shape. The characteristic shapes remain similar for homologous genes, making visual identification possible from a library of gene graph signatures. Automatic pattern recognition programs can make such identifications simple and fast and lead to more efficient scanning of long DNA sequences. A quantitative indexing scheme that we recently proposed estimates the dispersion of the graphical representation and provides quantitative estimates of graph similarity. This will help to further narrow down the possible matches against the library catalogue. The basic benefit of a two-dimensional graphical technique is that minor deviations in base sequence do not alter the characteristic shape, so homologous sequences generate similar shapes, a property we may term shape homology; identifying homologous sequences in the usual character-based representation presents a more formidable problem.
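As a rough illustration of the two-dimensional representation and of a dispersion-type index, the following minimal Python sketch plots a sequence as a walk on the Cartesian plane and summarises it by the distance of the centroid of the plotted points from the origin. The axis assignment (G and A along the horizontal axis, C and T along the vertical axis) and the function names `walk` and `graph_radius` are assumptions made for illustration only; the exact convention and index used by the authors are not specified in this text.

```python
from math import sqrt

# Assumed axis assignment (not stated in the text):
# G moves right, A moves left, C moves up, T moves down.
STEPS = {"G": (1, 0), "A": (-1, 0), "C": (0, 1), "T": (0, -1)}


def walk(sequence):
    """Return the (x, y) point reached after each base of the sequence."""
    x = y = 0
    points = []
    for base in sequence.upper():
        if base not in STEPS:
            continue  # skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


def graph_radius(sequence):
    """Distance of the centroid of the plotted points from the origin.

    This is one simple dispersion-type summary of the graph, used here
    as a hypothetical stand-in for the authors' quantitative index.
    """
    points = walk(sequence)
    if not points:
        raise ValueError("sequence contains no A, C, G or T")
    n = len(points)
    mu_x = sum(p[0] for p in points) / n
    mu_y = sum(p[1] for p in points) / n
    return sqrt(mu_x ** 2 + mu_y ** 2)


if __name__ == "__main__":
    print(walk("GACT"))
    print(graph_radius("GGCGGTGGAG"))
```

Under these assumptions, two sequences that differ by a few bases trace nearly the same path and yield nearby index values, which is the sense in which a library of graph signatures could be searched numerically before visual comparison.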
Where a toxic substance affects a specific DNA sequence, we can trace the pattern of that sequence segment in the two-dimensional representation; homologous sequences that share shape homology with this pattern can be expected to be receptive to the same toxic chemical. A rapid visual or computer search of a complete DNA sequence for such patterns can identify possible sites, which can then be investigated further by more rigorous means. As another example of the use of these graphs, high levels of copper toxicity that deplete the guanine component of a DNA sequence will show up as a compression along the horizontal axis. Much other information can be read off from these graphs, making them a useful tool in the analysis of DNA sequences; several of these features could be profitably employed in the search for effects of toxic substances. Systematic differences are seen in the characteristic shapes for a set of homologous conserved genes, and these can be used as a guide for estimating significant changes. Closer inspection of the graphical shapes provides indications of local base dominance and evidence of repetitive segments, and therefore of the possible extent of damage that may accrue from high toxicity levels. Thus the graphical technique provides, to a first approximation, a new and useful predictive and diagnostic tool.
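The guanine-depletion example can be illustrated with a small Python sketch. It assumes, purely for illustration, that guanine is plotted as a step to the right and adenine as a step to the left on the horizontal axis; the helper names `rightward_reach` and `deplete_guanine`, the toy sequence, and the rule of dropping every second guanine are all hypothetical and are not taken from the authors' work.

```python
# Assumed horizontal steps (not stated in the text): G moves right, A moves left.
# C and T move only vertically and so do not change the x coordinate.
X_STEP = {"G": 1, "A": -1}


def rightward_reach(sequence):
    """Largest x coordinate reached by the walk, starting from x = 0."""
    x = 0
    reach = 0
    for base in sequence.upper():
        x += X_STEP.get(base, 0)
        reach = max(reach, x)
    return reach


def deplete_guanine(sequence, drop_every=2):
    """Return the sequence with every `drop_every`-th guanine removed.

    A toy model of guanine depletion, chosen only to keep the example
    deterministic.
    """
    kept = []
    g_count = 0
    for base in sequence.upper():
        if base == "G":
            g_count += 1
            if g_count % drop_every == 0:
                continue  # this guanine is lost
        kept.append(base)
    return "".join(kept)


if __name__ == "__main__":
    original = "GGCGGTGGAG"  # hypothetical guanine-rich segment
    depleted = deplete_guanine(original)
    print(original, rightward_reach(original))
    print(depleted, rightward_reach(depleted))
```

For this guanine-rich toy segment, the depleted sequence reaches less far along the horizontal axis than the original, which is the kind of compression described above. The size and direction of the effect in a real sequence would depend on its base composition.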