Deep Genomics
Deep genomics is the study of applying deep learning to genomics.
|
DeepCNV: a deep learning approach for authenticating copy number variations. We propose a deep learning approach to remove false positive CNV calls from SNP array and sequencing CNV detection programs. This repo contains the model code and an executable script with five sample inputs. Since the pre-trained model file exceeds GitHub's upload size limit, it can be accessed by this external link. The dataset for this project is not publicly available. blended_learning.py is the training script; you can use it to train the model on your own dataset. |
Github |
---|---|
Deep learning applications in single-cell genomics and transcriptomics data analysis |
|
keras_dna: a wrapper for fast implementation of deep learning models in genomics (Bioinformatics, 2021). Keras_dna is an API that helps with quick experimentation in applying deep learning to genomics. It quickly feeds a Keras model (TensorFlow) with genomic data without laborious file conversions or storing huge amounts of converted data. It reads the most common bioinformatics file formats and creates generators adapted to Keras models. |
Github |
Classifying human DNA sequence and random ATCG sequences, using keras CNN. This is a small tutorial for my lab members on how to apply deep learning to the analysis of DNA genome sequences. I have created an IPython notebook for the analysis pipeline. To my surprise, the simple network achieves 99% accuracy in distinguishing DNA sequences from randomly generated sequences. A deep neural network may well learn much more from our genome than we can. The core code file in that repository was written for an outdated version and can no longer be run; the revised code is available here.
|
|
This software can train sequence classification models and use them to make predictions. Before following these instructions, make sure you've installed the software. If you followed option 1 above and the command kameris doesn't work for you, try using python -m kameris instead. If you followed option 2 above and downloaded an executable, replace kameris in the instructions below with the name of the executable you downloaded. |
stephensolis/kameris |
Analysis of DNA Sequence Classification Using Neural Networks. This project implements the research article "Analysis of DNA Sequence Classification Using CNN and Hybrid Models". In the general computational context of biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identifying and classifying viruses is essential to avoiding an outbreak like COVID-19, and it also helps in detecting the effects of viruses and in drug design. Nevertheless, feature selection remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations worsen the problem of high dimensionality. Recent deep learning (DL) models can extract features from the input automatically. In this work, we employed an MLP using label and K-mer encoding for DNA sequence classification. In this project (a Bioinformatics course project), we classify 6 viruses with an MLP. The genome of each virus is represented by nucleotide sequences of different lengths. Adenine (A), cytosine (C), guanine (G), and thymine (T) are the four nucleotides that make up DNA. The DNA of each virus is unique, and the arrangement of the nucleotides determines the characteristics of a virus. First, the K-mer method was used to reduce the length of the DNA sequence, and then the Word to Vector method was used to convert it to a fixed length. |
arminZolfaghari/DNA-Sequence-Classification |
I decided to redo this project with a pre-trained model (DNA-BERT) found at this repository: https://github.com/jerryji1993/DNABERT. I used the Hugging Face library to load, train, and evaluate the model. The pre-trained model easily and quickly beats my implementation from last year, which used the TensorFlow MultiHeadAttention module. Last year's experiments are now in the old_code folder. I used the Weights and Biases library to log the training results; in only 2 epochs the model reaches an F1 score of 0.99. |
Moeinh77/Virus-DNA-classification-BERT |
This is a recreation of the paper: Nguyen, Ngoc Giang, et al. "DNA Sequence Classification by Convolutional Neural Network." Journal of Biomedical Science and Engineering 9.05 (2016): 280. |
tariqul-islam/DNA-Sequence-Classification-using-CNN |
This repository includes the implementation of 'DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome'. Please cite our paper if you use the models or code. The repo is still under active development, so please report any issues you encounter. |
|
In this notebook, we will classify human and viral DNA with Deep Learning using TensorFlow 2. At the end of this tutorial, our model will reach approximately 90% accuracy.
|
draaslan/viral-dna-classification |
ViraMiner: Deep Learning for identifying viral genomes in human samples. Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as “unknown” by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases. |
NeuroCSUT/ViraMiner |
DNA-seq analysis with deep learning using Keras (TensorFlow backend) in a high-performance computing (HPC) environment. DeepDNAseq performs binary classification of an input DNA sequence after being trained on 2047 training samples.
|
Akmazad/DeepDNAseq |
DNA sequence prediction using the DeepSea machine learning model (TensorFlow + Keras API) | hwilliam1/DeepSea_DNA |
A minimal end-to-end model for random embedding of DNA and CNN-based classification in Keras with a TensorFlow backend.
|
bharat3012/Its-DNA-Classification |
An image representation based convolutional network for DNA classification. This is the code for the ICLR paper "An image representation based convolutional network for DNA classification". It can run on a single Titan X GPU. |
Doulrs/Hilbert-CNN |
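
Several of the entries above (for example arminZolfaghari/DNA-Sequence-Classification) rely on K-mer encoding to turn variable-length DNA into model input. The following is only a minimal sketch of that idea in plain Python, not code from any of the listed repositories. The function names `kmer_tokens` and `kmer_count_vector` are hypothetical. Note that the project above maps K-mers to a fixed length with a Word to Vector method; this sketch substitutes a simpler K-mer count vector, which also yields a fixed length (4**k) regardless of sequence length.

```python
from itertools import product

ALPHABET = "ACGT"  # the four DNA nucleotides


def kmer_tokens(seq, k):
    """Slide a window of width k over seq and return the overlapping k-mers."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_count_vector(seq, k):
    """Return a fixed-length list (4**k entries) of k-mer counts for seq."""
    # Vocabulary of all possible k-mers, in lexicographic order: AA, AC, AG, ...
    vocab = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(vocab)
    for token in kmer_tokens(seq, k):
        idx = vocab.get(token)
        if idx is not None:  # skip k-mers containing ambiguous bases such as N
            vec[idx] += 1
    return vec


if __name__ == "__main__":
    print(kmer_tokens("ATGCA", 3))
    print(kmer_count_vector("ATAT", 2))
```

Sequences of any length map to a vector of the same size, so the result can be fed directly to an MLP. The trade-off is that the position of each K-mer is discarded.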
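
The CNN-based entries above, such as the tutorial that separates human DNA from random ATCG sequences, need sequences converted to numeric arrays before training. The sketch below shows one common way to do that, one-hot encoding, together with a generator for random negative examples. It is an illustration under assumptions rather than the code of any listed project: the helper names `random_dna` and `one_hot` are hypothetical, and the choice to encode unknown bases as all zeros is an assumption.

```python
import random

BASES = "ACGT"  # column order of the one-hot encoding


def random_dna(length, rng):
    """Generate a random DNA string with uniformly sampled bases."""
    return "".join(rng.choice(BASES) for _ in range(length))


def one_hot(seq):
    """Encode seq as a list of 4-element rows, one row per base."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # assumption: unknown bases (e.g. N) stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the negatives are reproducible
    negative = random_dna(20, rng)
    print(negative)
    print(one_hot("ACGT"))
```

The resulting length-by-4 matrix is the usual input shape for a 1D convolution over the sequence axis. Real sequences would be labeled 1 and the output of `random_dna` labeled 0.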