====
Paper
====

======
Python
======

==========
TensorFlow
==========

=======
PyTorch
=======

=====
Keras
=====

=======
Topics
=======

====
Link
====

=====
Video
=====

===========
Drug Design
===========

================
Material Science
================

=========
Economics
=========


============
Deepgenomics
============




DeepCNV: a deep learning approach for authenticating copy number variations
We propose a deep learning approach to remove false positive CNV calls from SNP array and sequencing CNV detection programs. This repo contains the model code and an executable script with five sample inputs. Since the pre-trained model file exceeds GitHub's upload size limit, it can be accessed by this external link. The dataset for this project is not publicly available. blended_learning.py is the training script; you can use it to train the model on your own dataset.



Github

Deep learning applications in single-cell genomics and transcriptomics data analysis


 
keras_dna: a wrapper for fast implementation of deep learning models in genomics, Bioinformatics, 2021
Keras_dna is an API for quick experimentation in applying deep learning to genomics. It feeds a Keras model (TensorFlow) with genomic data without laborious file conversions or storing a tremendous amount of converted data. It reads the most common bioinformatics file formats and creates generators adapted to Keras models.



Github

Classifying human DNA sequence and random ATCG sequences, using keras CNN

This is a small tutorial for my lab members on how to apply deep learning to the analysis of DNA genome sequences. I have created an IPython notebook for the analysis pipeline. To my surprise, the simple network achieves 99% accuracy in distinguishing DNA sequences from randomly generated sequences. A deep neural network may well learn much more from our genome than we can.

The core code file in that repository targets an outdated version and no longer runs; the modified code is here.
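The data preparation behind such a tutorial can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the one-hot encoding, the A/C/G/T column order, and the helper names `random_dna` and `one_hot` are all assumptions made for the example.

```python
import random

ALPHABET = "ACGT"  # assumed column order for the one-hot encoding


def random_dna(length, rng):
    """Return a uniformly random sequence over A, C, G, T (the negative class)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


rng = random.Random(0)          # fixed seed so the example is repeatable
negative = random_dna(8, rng)   # one random "non-genomic" sample
encoded = one_hot("ACGTN")      # 5 rows x 4 columns
```

A CNN would then take a batch of such matrices, with label 1 for real genomic windows and label 0 for random sequences.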


 

/deep_learning_DNA

This software is able to train sequence classification models and use them to make predictions.

Before following these instructions, make sure you've installed the software. If you followed option 1 above and the command kameris doesn't work for you, try using python -m kameris instead. If you followed option 2 above and downloaded an executable, replace kameris in the instructions below with the name of the executable you downloaded.



stephensolis/kameris

Analysis of DNA Sequence Classification Using Neural Networks. This project is an implementation of the research article "Analysis of DNA Sequence Classification Using CNN and Hybrid Models".

In a general computational context for biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19, and they also help in detecting the effects of viruses and in drug design. Regardless, feature selection remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations worsen the high dimensionality. Deep learning (DL) models, by contrast, can automatically extract features from the input. In this work, we employed an MLP using label and k-mer encoding for DNA sequence classification.

In this project (Bioinformatics Course Project), we classify 6 viruses with an MLP. The genome of each virus is represented by nucleotide sequences of different lengths. Adenine (A), cytosine (C), guanine (G), and thymine (T) are the four nucleotides that make up DNA. The DNA of each virus is unique, and the arrangement of the nucleotides determines the unique characteristics of a virus.

First, the K-mer method was used to reduce the length of the DNA sequence, and then the Word to Vector method was used to convert it to a fixed length.
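The k-mer step can be sketched as below. This is an illustration under stated assumptions, not the project's code: the function names, the default `k`, and the stride are hypothetical, and the fixed-length count vector is a simple stand-in for the Word to Vector step, which is not reproduced here.

```python
from itertools import product


def kmers(seq, k=3, stride=1):
    """Split a DNA string into overlapping k-mers ("words")."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_count_vector(seq, k=2):
    """Fixed-length vector of k-mer counts (length 4**k).

    Stand-in for a learned embedding: every sequence maps to the same
    number of features regardless of its length.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    position = {word: i for i, word in enumerate(vocab)}
    vector = [0] * len(vocab)
    for word in kmers(seq, k):
        if word in position:  # skip k-mers containing ambiguous bases
            vector[position[word]] += 1
    return vector


words = kmers("ATGCGA", k=3)            # ['ATG', 'TGC', 'GCG', 'CGA']
features = kmer_count_vector("ATGCGA")  # 16 counts, one per dinucleotide
```

The list returned by `kmers` is the kind of "sentence" a word-embedding model would be trained on; the count vector shows the simpler route to a fixed-length input for an MLP.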










arminZolfaghari/DNA-Sequence-Classification

I decided to redo this project with a pre-trained model (DNA-BERT) found at this repository: https://github.com/jerryji1993/DNABERT. I used the HuggingFace library for loading, training, and evaluating the model. The pre-trained model easily and quickly beats last year's implementation, which used the Tensorflow MultiHeadAttention module. Last year's experiments are now in the old_code folder.

I used the Weights and Biases library for logging the training results. In only 2 epochs the model reaches an F1 score of 0.99.



/Virus-DNA-classification-BERT

This is a recreation of the paper:

Nguyen, Ngoc Giang, et al. "DNA Sequence Classification by Convolutional Neural Network." Journal of Biomedical Science and Engineering 9.05 (2016): 280.


tariqul-islam/DNA-Sequence-Classification-using-CNN
   

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

1. This repository includes the implementation of 'DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome'. Please cite our paper if you use the models or code. The repo is still under active development, so please report any issue you encounter.
In this package, we provide resources including the source code of the DNABERT model, usage examples, pre-trained models, fine-tuned models, and a visualization tool. More features will be included gradually. Training of DNABERT consists of general-purpose pre-training and task-specific fine-tuning. As a contribution of our project, we released the pre-trained models in this repository. We extended code from huggingface and adapted it to the DNA scenario.

2. DNABERT-2 is a foundation model trained on large-scale multi-species genomes that achieves state-of-the-art performance on 28 tasks of the GUE benchmark. It replaces k-mer tokenization with BPE and positional embeddings with Attention with Linear Biases (ALiBi), and incorporates other techniques to improve the efficiency and effectiveness of DNABERT.
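The ALiBi idea mentioned in item 2 can be sketched in a few lines. This is a generic illustration, not DNABERT-2's code: it assumes the number of heads is a power of two and uses the symmetric distance |i - j|, which is a common choice for bidirectional encoders; whether DNABERT-2 matches these details exactly is an assumption.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2**(-8 / num_heads).

    Assumes num_heads is a power of two.
    """
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """bias[h][i][j] = -slope_h * |i - j|.

    The bias is added to the attention scores before the softmax, so
    distant tokens are penalised linearly and no positional embedding
    has to be learned.
    """
    return [
        [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for slope in alibi_slopes(num_heads)
    ]


slopes = alibi_slopes(8)  # 1/2, 1/4, ..., 1/256
bias = alibi_bias(8, 4)   # 8 heads, 4 x 4 bias matrix per head
```

Because the bias depends only on token distance, the same formula applies to sequences longer than those seen during training.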


 

 

/DNABERT

 

 

 

/DNABERT_2

In this notebook, we will classify human and viral DNA with Deep Learning using TensorFlow 2. At the end of this tutorial, our model will reach approximately 90% accuracy.

draaslan/viral-dna-classification

ViraMiner: Deep Learning for identifying viral genomes in human samples
Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as “unknown” by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
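One way to read the "patterns" versus "pattern-frequencies" distinction is as global max pooling versus global average pooling over convolution outputs. The toy sketch below assumes that reading; the exact-match filter and all function names are hypothetical simplifications, not ViraMiner's architecture.

```python
def motif_activations(seq, motif):
    """Toy convolution filter: 1.0 where the motif matches exactly, else 0.0."""
    k = len(motif)
    return [1.0 if seq[i:i + k] == motif else 0.0
            for i in range(len(seq) - k + 1)]


def pattern_feature(activations):
    """Global max pooling: is the pattern present anywhere in the contig?"""
    return max(activations) if activations else 0.0


def frequency_feature(activations):
    """Global average pooling: how often does the pattern occur?"""
    return sum(activations) / len(activations) if activations else 0.0


acts = motif_activations("ACGACGTT", "ACG")  # matches at positions 0 and 3
present = pattern_feature(acts)
rate = frequency_feature(acts)
```

In a real network the filters are learned and real-valued, and the two pooled feature sets would be concatenated before the final classification layer.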

/ViraMiner

DNA-seq analysis with deep learning using Keras (TensorFlow backend) in a high-performance computing (HPC) environment. DeepDNAseq performs binary classification of an input DNA sequence after being trained on 2047 training samples.

/DeepDNAseq

DNA sequence prediction using the DeepSea machine learning model (TensorFlow + Keras API).

/DeepSea_DNA

A minimal end-to-end model using random DNA embeddings and CNN-based classification in Keras with the TensorFlow backend.

/Its-DNA-Classification

An image representation based convolutional network for DNA classification

This is the code for the ICLR paper "An image representation based convolutional network for DNA classification". It can run on a single Titan X GPU.
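As the repository name suggests, the image representation is built with a Hilbert space-filling curve, which keeps neighbouring sequence positions close together in the 2D image. The sketch below uses the standard index-to-coordinate conversion; the helper names and the choice to place raw bases (rather than encoded k-mers) on the grid are assumptions for illustration, not the paper's code.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate by swapping the axes
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_grid(tokens, order, fill=None):
    """Lay tokens out along the curve; unused cells keep the fill value."""
    n = 1 << order
    grid = [[fill] * n for _ in range(n)]
    for d, token in enumerate(tokens[:n * n]):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = token
    return grid


grid = sequence_to_grid(list("ACGT"), order=1)
```

Each grid cell would normally hold an encoded vector, giving a multi-channel image that a 2D CNN can process.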



Doulrs/Hilbert-CNN
   
   
   
   

 

 

 

 

Deep Genomics

Deep genomics is the field that applies deep learning to genomics.

999 Hucheng Ring Road, Pudong New Area, Shanghai