This page provides two types of data: (i) nucleosome positioning data and (ii) features extracted from them.
The first group of data was obtained from MNase-seq data associated with 7 lymphoblastoid cell lines
(GSE36979).
To generate the nucleosome positioning data, 3.6 MNase-seq fragments were mapped
to the human genome (hg19) following a protocol introduced by Oriol et al.
Compared with other nucleosome positioning data, this dataset has a higher resolution, which makes it
possible to build a practical machine learning-based model for predicting nucleosome positions.
Download the nucleosome positioning data as well as hg19 (zip-format file, 1 GB)
The second group of data was extracted from the nucleosome positioning data of the first group. First,
201-bp sequences were chosen in which position 101 is the dyad position. Second, 2,360 features, described
in the downloadable files, were extracted from these sequences (the true positive dataset). In addition,
324,277 DNA sequences that do not overlap with the true positive dataset were selected as the true negative
dataset. To keep the prediction model from being biased toward the true positive dataset, 340,104 samples
were retained from it. Both the true negative and true positive data cover all regions of the human genome.
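The window selection described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the downloadable files. The function names `extract_window` and `overlaps_any` are hypothetical, and the sketch assumes 1-based dyad coordinates and half-open intervals for the overlap check. It shows how a 201-bp sequence with the dyad at position 101 could be cut from a chromosome sequence, and how a candidate negative region could be tested for overlap with positive windows.

```python
FLANK = 100  # assumed: 100 bp on each side of the dyad gives 201 bp in total


def extract_window(sequence, dyad, flank=FLANK):
    """Return the (2 * flank + 1)-bp window centred on a 1-based dyad coordinate.

    Returns None when the window would run past either end of the sequence.
    """
    start = dyad - 1 - flank  # 0-based index of the first base in the window
    end = dyad + flank        # 0-based exclusive end index
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def overlaps_any(start, end, intervals):
    """Return True if the half-open interval [start, end) overlaps any (a, b) in intervals."""
    return any(start < b and a < end for a, b in intervals)
```

With this convention, the dyad base sits at index 100 of the returned string, which is position 101 when counting from 1. A candidate negative window would be kept only if `overlaps_any` returns False against the positive windows on the same chromosome.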
Download the extracted features (zip-format file, 651 MB)