gene fusion variants mapped by shared PubMed IDs


A gene fusion occurs when parts of two genes are joined, typically through a chromosomal rearrangement, producing a single hybrid mRNA that is translated into a chimeric protein. A common family of fusions found in lung cancer combines the genes EML4 and ALK [1], hereafter denoted EML4-ALK. The COSMIC database [2] lists 29 distinct EML4-ALK fusions, which differ in where the breakpoint between the EML4 RNA sequence and the ALK RNA sequence occurs, as shown in the following screenshot:


Each fusion in the above screenshot is associated with one or more PubMed IDs, which identify papers that provide evidence for that fusion. Two fusions are related when they cite at least one paper in common, and we can draw these shared PubMed IDs as a chord chart (below).
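A minimal sketch of how the shared PubMed IDs could be counted for every pair of fusions, assuming the COSMIC listing has already been parsed into a mapping from fusion ID to a set of PubMed IDs. The fusion names and PubMed IDs below are placeholders made up for illustration, not values from COSMIC, and `shared_pmid_counts` is a hypothetical helper rather than the code used for this post.

```python
from itertools import combinations

# Hypothetical input: fusion ID -> set of PubMed IDs supporting it.
# These are placeholder values, not real COSMIC records.
fusion_pmids = {
    "FUSION_A": {101, 102, 103, 104},
    "FUSION_B": {102, 103, 104, 105},
    "FUSION_C": {104, 106},
}


def shared_pmid_counts(fusion_pmids):
    """Return {(fusion_1, fusion_2): number of PubMed IDs cited by both}."""
    counts = {}
    # Visit each unordered pair of fusions exactly once.
    for a, b in combinations(sorted(fusion_pmids), 2):
        counts[(a, b)] = len(fusion_pmids[a] & fusion_pmids[b])
    return counts


shared = shared_pmid_counts(fusion_pmids)
for (a, b), n in shared.items():
    print(f"{a} and {b} share {n} PubMed ID(s)")
```

With this toy input, FUSION_A and FUSION_B share three IDs while each shares only one with FUSION_C, which is the kind of contrast the chord chart makes visible.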


Mapping COSMIC’s EML4-ALK gene fusions by shared PubMed IDs yields:


Here we see that fusion COSF474 shares many PubMed IDs with fusion COSF412, but few with fusion COSF493.


The raw data fed to the chord chart looks like:


I used D3 (Data-Driven Documents) [3], a JavaScript library for producing data-heavy graphics, to create the image. Specifically, I adapted the chord diagram example shown at [4] to accommodate this data.
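D3's chord layout takes a square matrix of numbers with one row and one column per group. The following is a rough sketch of how such a matrix could be prepared and serialized to JSON for the browser. It assumes a fusion-to-PubMed-ID mapping with placeholder values, a hypothetical `chord_matrix` helper, and a zeroed diagonal so that no fusion draws a chord to itself. The JSON layout with `labels` and `matrix` keys is also an assumption, not the format used for the figure above.

```python
import json

# Hypothetical input: fusion ID -> set of PubMed IDs supporting it.
# These are placeholder values, not real COSMIC records.
fusion_pmids = {
    "FUSION_A": {101, 102, 103, 104},
    "FUSION_B": {102, 103, 104, 105},
    "FUSION_C": {104, 106},
}


def chord_matrix(fusion_pmids):
    """Build a symmetric matrix of shared PubMed ID counts.

    Row i and column j both follow the order of the returned labels.
    The diagonal is set to 0 (an assumption) to suppress self-chords.
    """
    labels = sorted(fusion_pmids)
    matrix = [
        [0 if a == b else len(fusion_pmids[a] & fusion_pmids[b]) for b in labels]
        for a in labels
    ]
    return labels, matrix


labels, matrix = chord_matrix(fusion_pmids)

# Serialize labels and matrix together so the page can load both at once.
print(json.dumps({"labels": labels, "matrix": matrix}, indent=2))
```

Keeping the labels next to the matrix means the row order does not have to be reconstructed on the JavaScript side.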


  1. Soda M, Choi YL, Enomoto M, et al. (August 2007). “Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer”. Nature 448 (7153): 561–6. doi:10.1038/nature05945. PMID 17625570
  2. Forbes et al. (2011). “COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer”. Nucleic Acids Research 39 (suppl 1): D945–D950. doi:10.1093/nar/gkq929

Post Author: badassdatascience
