NuPose: Nucleosome positioning based on DNA sequence
This page provides two types of data: (i) nucleosome positioning data and (ii) features extracted from them. The first dataset was derived from MNase-seq data for 7 lymphoblastoid cell lines (GSE36979). To generate the nucleosome positioning data, 3.6 MNase-seq fragments were mapped to the human genome (hg19) following a protocol introduced by Oriol et al. Compared with other nucleosome positioning data, the resulting map has a higher resolution, which makes it practical to build a machine learning-based model for predicting nucleosome positions.
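This page does not describe the Oriol et al. protocol in detail. As a rough illustration only, one common generic way to turn paired-end MNase-seq fragments into nucleosome dyad estimates is to take the midpoint of each fragment and count how many fragments share that midpoint. The sketch below assumes that midpoint approach (not necessarily the protocol used here), a hypothetical `(chrom, start, end)` fragment tuple with 0-based, end-exclusive coordinates, and an assumed 120 to 180 bp mononucleosome size filter.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical fragment record: (chromosome, start, end), 0-based, end-exclusive.
Fragment = Tuple[str, int, int]


def dyad_counts(fragments: Iterable[Fragment],
                min_len: int = 120, max_len: int = 180) -> Counter:
    """Count fragment midpoints (estimated dyads) per (chrom, position).

    Assumption: the midpoint of a mononucleosome-sized fragment approximates
    the dyad; min_len/max_len are illustrative, not taken from the dataset.
    """
    counts: Counter = Counter()
    for chrom, start, end in fragments:
        length = end - start
        # Skip fragments outside the assumed mononucleosome size range.
        if min_len <= length <= max_len:
            counts[(chrom, start + length // 2)] += 1
    return counts


frags = [("chr1", 1000, 1147), ("chr1", 1002, 1145), ("chr1", 5000, 5300)]
counts = dyad_counts(frags)
```

Here the first two fragments share the midpoint `("chr1", 1073)`, while the 300 bp fragment is filtered out; positions with high midpoint counts would be candidate dyads.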

Download the nucleosome positioning data as well as hg19 (zip-format file 1 GB)


The second dataset was extracted from the nucleosome positioning data described above. First, 201-bp sequences were selected such that position 101 corresponds to the nucleosome dyad. Second, 2,360 features, described in the downloadable files, were extracted from these sequences (the true positive dataset). In addition, 324,277 DNA sequences that do not overlap the true positive dataset were selected as the true negative dataset. To ensure that the prediction model is not biased toward the true positive dataset, 340,104 samples were drawn from it. Both the true positive and true negative data cover all regions of the human genome.
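The windowing and negative-selection steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: dyads are 0-based indices into a chromosome string, negatives are drawn by simple rejection sampling, and the helper names (`dyad_window`, `overlaps_positive`, `sample_negative_starts`, `subsample`) are hypothetical.

```python
import bisect
import random
from typing import List, Sequence

WINDOW = 201
HALF = WINDOW // 2  # 100 bp on each side of the dyad


def dyad_window(chrom_seq: str, dyad: int) -> str:
    """Return the 201-bp window centred on a 0-based dyad index.

    In 1-based terms the dyad is position 101 of the window.
    Returns '' if the window would run off the chromosome.
    """
    start, end = dyad - HALF, dyad + HALF + 1
    if start < 0 or end > len(chrom_seq):
        return ""
    return chrom_seq[start:end]


def overlaps_positive(start: int, sorted_dyads: List[int]) -> bool:
    """True if the window starting at `start` overlaps any positive window.

    Two 201-bp windows overlap iff their centres are at most WINDOW - 1 bp apart.
    """
    centre = start + HALF
    i = bisect.bisect_left(sorted_dyads, centre - (WINDOW - 1))
    return i < len(sorted_dyads) and sorted_dyads[i] <= centre + (WINDOW - 1)


def sample_negative_starts(chrom_len: int, dyads: Sequence[int], n: int,
                           seed: int = 0, max_tries: int = 1_000_000) -> List[int]:
    """Draw up to n distinct window starts that overlap no positive window."""
    rng = random.Random(seed)
    sorted_dyads = sorted(dyads)
    chosen = set()
    tries = 0
    while len(chosen) < n and tries < max_tries:
        tries += 1
        start = rng.randrange(0, chrom_len - WINDOW + 1)
        if not overlaps_positive(start, sorted_dyads):
            chosen.add(start)
    return sorted(chosen)


def subsample(items: Sequence, k: int, seed: int = 0) -> list:
    """Randomly keep k items (e.g. to balance positives against negatives)."""
    return random.Random(seed).sample(list(items), min(k, len(items)))


seq = "ACGT" * 500  # toy 2,000 bp "chromosome"
win = dyad_window(seq, 500)
negs = sample_negative_starts(len(seq), [500, 1500], n=5)
```

The sorted-dyad `bisect` lookup keeps each overlap test logarithmic in the number of positives, which matters at genome scale; `subsample` shows one simple way a positive set could be reduced to a size comparable to the negatives.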

Download the extracted features (zip-format file 651 MB)




About Panchenko's lab

We study the associations between key components of the epigenome to understand how its perturbation can lead to cancer. Our team works to identify factors contributing to the occurrence of cancer mutations in DNA, and to uncover the molecular mechanisms by which mutations and covalent modifications affect nucleosomes and chromatin, including their interactions, stability and dynamics.

Go to Panchenko lab