Zika Virus Mutation and The Spreading to Indonesia

More than 13 countries in the Americas have reported sporadic Zika virus infections, indicating very rapid geographic expansion. In Indonesia, the virus has been discussed increasingly widely, especially after a patient in Jambi was found to be infected with the Zika virus on January 26, 2016. The virus is transmitted to humans by the bite of the same mosquito that transmits dengue fever, chikungunya and yellow fever, whose symptoms resemble those of Zika virus infection. Because the symptoms are similar to those of dengue virus infection, sequence alignment can be used to obtain the percentage of identity, the local alignment score and the genetic mutations, and to study the spread of the Zika virus to Indonesia. The Smith Waterman algorithm is used to align dengue virus type 1, type 2, type 3 and type 4 with the Zika virus (Jambi). On average, about 28% of the positions differ between the aligned dengue virus and Zika virus sequences. Specifically, the mutation from dengue virus type 1 to the Zika virus was 28.2723%, from dengue virus type 2 was 28.1984%, from dengue virus type 3 was 27.9373% and from dengue virus type 4 was 28.7206%. Considering all mutations, the simulation results indicate that the mutations between the two viruses are of type I. The phylogenetic tree showed the spread of the Zika virus to Indonesia, originating from South Africa and passing through the islands of Chile, Caledonia, the Philippines, Yap (Micronesia), Thailand and Cambodia before finally reaching Indonesia. The Zika virus from Jambi was suspected to be a mutation of the dengue virus, because a prolonged dengue outbreak had occurred in Jambi a few years earlier; however, the results show that the Jambi Zika virus did not arise from a dengue virus mutation but came from the Asian region.


I. INTRODUCTION
THE Zika virus was first identified in rhesus monkeys in 1947 and in humans in 1952, in Uganda and the Republic of Tanzania. The virus is transmitted to humans by mosquito bites (Aedes, especially Aedes aegypti). The same mosquito transmits dengue fever, chikungunya and yellow fever, whose symptoms resemble those of Zika virus infection, namely fever, skin rash, conjunctivitis, muscle and joint pain, malaise and headache. Zika virus outbreaks were first reported from the Pacific in 2007 and 2013 (Yap and Polynesia), and in 2015 from the Americas (Brazil and Colombia) and Africa (Cabo Verde), until the virus reached Indonesia (Jambi) on 26 January 2016.
Because the symptoms resemble those of dengue virus infection, the Zika virus sequence can be aligned with dengue virus sequences to obtain the percentage of identity, the local alignment score and the genetic mutations that occurred. Sequence alignment is a procedure for aligning two DNA or protein sequences in order to find the regions they have in common. There are two methods, global alignment and local alignment. Global alignment aligns the entire sequences, using as many characters as possible. Local alignment aligns only parts of the sequences, taking the regions that have a sufficiently high level of resemblance. One of the local alignment algorithms is Smith Waterman. Although it looks like a simple algorithm based on dynamic programming, it is very instrumental in bioinformatics. Next, phylogenetic trees of the Zika virus are constructed using MUSCLE to trace the spread of the virus to Indonesia.

A. DNA and Protein
DNA (deoxyribonucleic acid) is a macromolecule composed of nucleotides, the basic molecules that carry the properties of genes. DNA is formed by four types of nucleotides that are covalently bonded and represented by the letters A (adenine), C (cytosine), G (guanine) and T (thymine).
Proteins are formed from chains of simple molecules called amino acids (aa). The final shape of a protein is determined by the identity and order of the amino acids in the chain, and largely by the atomic interactions between the amino acids and the cell medium (mostly water). There are 20 amino acids used to form proteins: A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V [1].
Each amino acid symbol is determined by the encoding in the gene, in which sets of three nucleotides called codons specify the amino acids, for example AAA and AAG represent K, AGA and AGG represent R, GAA and GAG represent E, etc.
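The codon-to-amino-acid mapping described above can be illustrated with a minimal Python sketch. The table below is deliberately partial: it lists only the codons for K, R and E named as examples, taken from the standard genetic code. The function name `translate` and the placeholder symbol "X" for codons missing from the table are assumptions made for this illustration.

```python
# Partial codon table (DNA alphabet); only a few codons of the
# standard genetic code are listed, for illustration.
CODON_TABLE = {
    "AAA": "K", "AAG": "K",  # lysine
    "AGA": "R", "AGG": "R",  # arginine
    "GAA": "E", "GAG": "E",  # glutamic acid
}


def translate(dna):
    """Translate a DNA string codon by codon.

    Codons missing from the partial table become 'X' (an assumption of
    this sketch); trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    protein = []
    for i in range(0, usable, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("AAAAGAGAG"))  # codons AAA, AGA, GAG
```

A full implementation would carry all 64 codons, including the stop codons.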

B. Sequence Alignment
DNA, RNA and protein sequences are all treated as biological sequences. As stated in Shen [2], a biological sequence is described with the following notation: $X = (x_1, x_2, \ldots, x_{n_x})$, $Y = (y_1, y_2, \ldots, y_{n_y})$, $Z = (z_1, z_2, \ldots, z_{n_z})$, where $X, Y, Z$ denote sequences and $x_i, y_i, z_i$ are the basic units of the sequences at position $i$. These elements are taken from the set $V_q = \{0, 1, \ldots, q-1\}$. The lengths of $X, Y, Z$ are denoted by $n_x, n_y, n_z$ respectively. If $X, Y, Z$ are DNA or RNA sequences, then $q = 4$ and $V_4 = \{a, c, g, t\}$ or $\{a, c, g, u\}$, whereas if they are protein sequences, then $q = 20$ and $V_q = \{A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V\}$, which directly represents the 20 amino acids.
Sequence alignment, in Shen [2], is a method for analysing the positions and types of mutations in biological sequences so that the sequences can be compared properly. Alignment between two sequences is pairwise alignment, and alignment of more than two sequences is Multiple Sequence Alignment. Determining the movement of mutations is the core idea of sequence alignment.
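As a small illustration of this notation, the sketch below maps each symbol of a sequence to its index in the alphabet, so that a sequence becomes a list of integers from {0, 1, ..., q − 1}. The function name `encode` and the particular ordering of the symbols are assumptions of this sketch; any fixed ordering would serve.

```python
DNA_ALPHABET = "acgt"                      # q = 4
PROTEIN_ALPHABET = "ARNDCQEGHILKMFPSTWYV"  # q = 20, order as listed in the text


def encode(sequence, alphabet):
    """Map each symbol to its index in the alphabet, i.e. to {0, ..., q-1}."""
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    return [index[s] for s in sequence]


print(encode("gattaca", DNA_ALPHABET))
print(len(encode("MKNP", PROTEIN_ALPHABET)))  # the sequence length n_x
```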

C. Smith Waterman Algorithm
This research applies the Smith Waterman algorithm, which is a local alignment algorithm. Two important aspects of the Smith Waterman algorithm are:
• Calculation of values in a two-dimensional table. The algorithm includes 0 as a candidate value when computing s(i, j), so a negative score never occurs.
• Traceback. The starting and ending points of the traceback are chosen by score: the starting point is the element with the maximum score, and the end point is the first element with value 0 reached during the traceback. Starting from the maximum score guarantees the maximum score for the local alignment, and ending at the first element with value 0 ensures that the aligned section is not exceeded.
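The two aspects above (a score table clamped at 0, and a traceback from the maximum score to the first 0) can be sketched in Python as follows. This is a minimal illustration, not the MATLAB simulation used in this research. The scoring values (match = 2, mismatch = −1, linear gap = −2) are assumptions chosen for readability; protein alignments would normally use a substitution matrix instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill step: every cell is the best of diagonal, up, left, or 0,
    # so no cell is ever negative.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback: start at the maximum score, stop at the first 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

When several cells share the maximum score, this sketch keeps the first one found in row order; other tie-breaking rules are equally valid.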

D. Genetic Mutation
Mutations are changes in a genetic sequence and are a major cause of differences among organisms. These changes occur on many levels, with very different consequences [3]. DNA sequence mutations can be classified into 4 types [2]:
1) Type I: a nucleotide changes, for example from "a" to "g".
2) Type II: a nucleotide section changes the order of its positions, for example the "accgu" section changes to "guacc".
3) Type III: a new segment is inserted into the sequence, for example the insertion of "aa" in the middle of the "gguugg" segment changes the segment to "gguaaugg".
4) Type IV: a nucleotide segment is eliminated from the sequence, for example removing the "ag" nucleotides from the "acaguua" segment changes the segment to "acuua".
In type I and type II mutations, the positions of all nucleotides do not change; these mutations are called substitution mutations. Type III and type IV mutations can alter the nucleotide positions and are referred to as transfer mutations.
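The four mutation types can be reproduced on the example strings above with a short Python sketch. The function names and the index conventions (0-based positions, half-open ranges) are assumptions made for this illustration.

```python
def type1_substitute(seq, pos, new_base):
    """Type I: replace the nucleotide at position pos."""
    return seq[:pos] + new_base + seq[pos + 1:]


def type2_reorder(seq, start, split, end):
    """Type II: swap the parts seq[start:split] and seq[split:end]."""
    return seq[:start] + seq[split:end] + seq[start:split] + seq[end:]


def type3_insert(seq, pos, segment):
    """Type III: insert a new segment before position pos."""
    return seq[:pos] + segment + seq[pos:]


def type4_delete(seq, start, length):
    """Type IV: remove length nucleotides beginning at start."""
    return seq[:start] + seq[start + length:]


print(type1_substitute("acgu", 0, "g"))
print(type2_reorder("accgu", 0, 3, 5))
print(type3_insert("gguugg", 3, "aa"))
print(type4_delete("acaguua", 2, 2))
```

Types I and II leave the sequence length unchanged, whereas types III and IV lengthen or shorten it, which is why the latter shift the positions of the nucleotides that follow.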

A. Data Source
The virus sequence data were taken online from GenBank, the world's largest gene database, which belongs to the United States government, accessed through the National Center for Biotechnology Information [4]. Each virus sequence was stored in FASTA format (.txt) for the alignment process and the construction of the phylogenetic tree. In this research, we used 19 sample protein sequences of the Zika virus and 4 sample protein sequences of dengue virus type 1, type 2, type 3 and type 4 originating from Indonesia. One Zika virus sequence from Indonesia (Jambi) was aligned with the dengue virus sequence of each type, also taken from Indonesia. The similarity and dissimilarity of the sequences, their percentages, the processing time and the mutations between sequences can be seen from the data analysis. The process is static because only 4 dengue sequences are aligned with 1 Zika sequence. The whole process is simulated in MATLAB. The protein sequence data taken from GenBank are shown in Tables I and II respectively.
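For readers unfamiliar with the FASTA format, the following Python sketch reads FASTA text into a dictionary of sequences. It is an illustration only and not part of the MATLAB workflow. The record names and the short residue strings in `example` are made-up placeholders, not GenBank data.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}.

    A header line starts with '>'; the lines that follow, up to the
    next header, are joined into one sequence.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Placeholder records for illustration only (not real GenBank entries).
example = """>hypothetical_record_1
MKNPKK
KSGGFR
>hypothetical_record_2
MNNQRK
"""

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```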

B. Sequence Alignment Result and Analysis
The sequence data of the patient infected by the Zika virus (Jambi) and the dengue virus protein sequence data in Tables I and II are stored in (.txt) files and used as input for the simultaneous sequence alignment process in MATLAB using the Smith Waterman algorithm. The alignments between the Zika virus (Jambi) and each dengue virus type are shown in Table III.
The four alignments were compared with the output of the Basic Local Alignment Search Tool (BLAST), a program that compares nucleotide or protein sequences to a sequence database and calculates the statistical significance of the matches; the comparison is shown in Table IV. Table IV shows that the identity values from the MATLAB simulation are more precise than the BLAST output: the MATLAB simulation reports values to 4 decimal places, while BLAST shows only 2 significant figures. The computation time of the MATLAB simulation is also shorter than that of BLAST.
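A percentage of differing positions between two aligned sequences, reported to 4 decimal places, can be computed as in the Python sketch below. This is a hedged illustration rather than the MATLAB code used here. It assumes that the two strings are already aligned to equal length and that a gap ("-") opposite a residue counts as a difference; the function name is hypothetical.

```python
def dissimilarity_percent(aligned_a, aligned_b):
    """Percentage of alignment columns where the two symbols differ.

    Assumes both strings come from one pairwise alignment, so they
    have the same length; a gap opposite a residue counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    differences = sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
    return 100.0 * differences / len(aligned_a)


value = dissimilarity_percent("ACGTACG", "ACCTACC")  # 2 of 7 columns differ
print(round(value, 4))         # 4 decimal places
print(float(f"{value:.2g}"))   # 2 significant figures
```

Rounding to 2 significant figures collapses values that differ only after the second digit, which is why the 4-decimal output distinguishes alignments more finely.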
Before the phylogenetic tree is constructed, the sequences are aligned. The 19 sequences are not all of the same data type: some are protein data and others are DNA data. Most of the DNA data are therefore converted into protein data so that they match the protein sequences. The complete Zika virus protein sequence data are shown in Table V. Part of the identical percentage matrix of the sequences, obtained from the MUSCLE (Clustal W2) output, is shown in Fig. 1. The sequences have high identity values, which shows that the sequence similarity is also very high. The phylogenetic tree produced by MUSCLE is shown in Fig. 2.
Based on Fig. 2, the Zika virus sequences separate into two clusters. More details are given in Table VI.
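A matrix of pairwise identity percentages can be derived from a multiple sequence alignment as sketched below in Python. This is an assumption-laden illustration: it ignores columns in which both sequences have a gap, and the exact convention used by the MUSCLE output in this work may differ. The short input strings are placeholders, not the sequences analysed here.

```python
def identity_matrix(aligned):
    """Pairwise percent identity for equal-length aligned sequences.

    Columns where both sequences have a gap are skipped (an assumption
    of this sketch; tools differ in how they treat gaps).
    """
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pairs = [
                (x, y)
                for x, y in zip(aligned[i], aligned[j])
                if x != "-" or y != "-"
            ]
            same = sum(1 for x, y in pairs if x == y)
            matrix[i][j] = 100.0 * same / len(pairs) if pairs else 0.0
    return matrix


# Placeholder alignment for illustration only.
for row in identity_matrix(["MKNP", "MKNP", "MRNP"]):
    print(row)
```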

Fig. 1. Some elements of the identical percentage matrix.

TABLE I
PROTEIN SEQUENCE DATA OF INFECTED DENGUE VIRUS PATIENT.

TABLE II
PROTEIN SEQUENCE DATA OF INFECTED ZIKA VIRUS PATIENT.

TABLE III
MATLAB RESULT BASED ON SMITH WATERMAN ALGORITHM.