Entropy. 2021 Nov 27;23(12):1592. doi: 10.3390/e23121592

A Quaternary Code Correcting a Burst of at Most Two Deletion or Insertion Errors in DNA Storage

Thi-Huong Khuat 1, Sunghwan Kim 1,*
Editor: Matteo Convertino
PMCID: PMC8699998  PMID: 34945898

Abstract

Due to the properties of DNA data storage, the errors that occur in DNA strands make error correction an important and challenging task. In this paper, a new design of a quaternary code suitable for DNA storage is proposed to correct at most two consecutive deletion or insertion errors. Decoding algorithms for the proposed code are presented for the cases of one and two deletion or insertion errors, and it is proved that the proposed code can correct at most two consecutive errors. Moreover, lower and upper bounds on the cardinality of the proposed quaternary code are evaluated, and the redundancy of the proposed code is shown to be roughly $2\log_4 8n$.

Keywords: DNA storage, quaternary code, deletion error, insertion error, consecutive errors

1. Introduction

In recent years, because of its huge capacity and excellent durability, deoxyribonucleic acid (DNA) storage is becoming attractive for future long-term data storage [1,2,3]. However, during the processes of DNA storage, the molecule can be faced with errors that do not normally occur in traditional storage devices such as deletion and insertion errors [4]. Therefore, research to address deletion and insertion errors is extremely significant in DNA storage, and error-correcting codes for the errors have been studied. Our work focuses on the codes capable of correcting multiple deletion or insertion errors in DNA storage.

For correcting one deletion or insertion error in binary codes, Varshamov–Tenengolts (VT) codes were first proposed in [5], and in the same year a modified VT code construction was provided in [6] to correct a single deletion, insertion, or substitution error. Shortly thereafter, to deal with more than a single error, Levenshtein extended the VT code to a binary code that can correct at most two consecutive deletion or insertion errors [7]. In [8], a binary codeword was arranged as an array with $b$ rows, each row being a binary VT codeword, so that the construction could correct a burst of exactly $b$ deletion or insertion errors (for any fixed $b \ge 2$). Then, the authors of [9] proposed a binary shifted Varshamov–Tenengolts (SVT) code to obtain an improved construction that still corrects exactly $b$ errors but with lower redundancy than the one in [8]. Motivated by the efficient correction and low redundancy of the VT codes, the authors of [10,11] proposed linear-time encoders that implement a binary VT code satisfying the homopolymer run and Guanine-Cytosine (GC)-content constraints [12,13], which are among the important properties of a DNA strand. However, the binary VT codes used in these linear-time encoders correct only a single nucleotide of a DNA strand. With an approach similar to [10,11], but to correct a burst of exactly $b$ deletions or insertions of DNA symbols, the authors of [14] applied the encoder of the binary modified VT code in [6] and the binary SVT codes in [9]. Then, by interleaving bits of binary VT codewords and binary SVT codewords, the work [9] obtained a binary code construction that can correct a burst error of size exactly $2b$, and finally, the codeword of this construction was translated to DNA symbols.

A non-binary VT code was first proposed in [15], and a non-binary SVT code was proposed in [16]. The codes were defined over a $q$-ary alphabet for any $q > 2$. Like their binary counterparts, the $q$-ary VT and $q$-ary SVT codes can correct a single deleted or inserted symbol. To correct multiple errors, the construction in [8,9] can be applied to obtain a $q$-ary code that can correct a burst of exactly $b$ deletion or insertion errors. However, designing $q$-ary VT codes that can correct multiple deletion or insertion errors has been an interesting problem [17]. Recently, several works [18,19,20,21] focused on code designs that correct an exact number of multiple errors, but an efficient design for $q$-ary codes (or even quaternary codes) that can correct a burst of at most $b$ deletion or insertion errors is still an open problem. The authors of [22] proposed a non-binary code correcting at most two consecutive deletions with redundancy $\log n + \log q \log(\log n + 6) + \log 6 + 3$. In [22], the authors used the construction method in [9] with one binary code from [7] and a modified version of it in interval $P$. In contrast, we propose a quaternary code that is suitable for robust DNA storage and can correct at most two consecutive deletion or insertion errors with a direct construction. Moreover, the redundancy of the proposed code is improved over that of [22].

As to the cardinality of VT codes, a lower bound on the size of the best class of VT codes has been known for about 50 years, but an upper bound is rarely provided, even in the binary case. The author of [23] used a Mixed Integer Linear Programming (MILP) relaxation technique to obtain a tighter upper bound for the binary VT code; for example, for length $n = 11$, the maximum size of a single-deletion code was calculated as 173. Moreover, a conjecture about the maximum size of the VT code for all $n$ was also provided. In this work, however, we focus on the error correction capability of the proposed code design, and we use the previous methods in [7,15] to evaluate the lower bound and upper bound of the proposed code design.

In our work, we extend binary codes based on the results of [7] by adding two constraints that determine the exact values and positions of the errors in the quaternary sequence. By mathematically analyzing the possible cases of errors, we propose decoding algorithms to prove the error correction capability of this code design. We note that the main concern of this work is the error correction capability of the quaternary code design, not the constraints in DNA storage. It is assumed that the combined design of the error correction code and the constraints of DNA storage is handled by other algorithms [11,24]. The main contributions of this paper can be summarized as follows.

  • We propose a quaternary code design that is suitable for the deletion or insertion channel, especially for the mapping $0 \to A$, $1 \to C$, $2 \to T$, and $3 \to G$. This proposed design is directly applicable to sequencing in DNA storage. Furthermore, this proposed code can correct at most two consecutive deletion or insertion errors.

  • We propose two decoding algorithms for this proposed code to correct one deletion and two consecutive deletion errors. For the decoding of insertion errors, some differences between the deletion and insertion cases are shown and the important functions for correcting the insertion error are also presented in Appendix A.

  • We provide the lower bound and evaluate the upper bound of the proposed code design. The redundancy of the proposed code design is also calculated to be at most $2\log_4 8n$.

This paper is organized as follows. In Section 2, we list basic notations and definitions used in the rest of the paper and we briefly present previous binary and quaternary code constructions to correct one and two consecutive deletions. Then, Section 3 contains the proposed code construction, a proof of the correction capability, and the bounds of the cardinality for the proposed quaternary code. Section 4 provides a discussion and, finally, conclusion is presented in Section 5 of this paper.

2. Preliminaries and Previous Works

2.1. Notation and Definition

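Let $\mathbb{F}_2^n$ and $\mathbb{F}_4^n$ be the sets of binary and quaternary sequences of length $n$, respectively. Let a quaternary codeword of length $n$ be defined as $c = (c_1, c_2, \ldots, c_n) \in \mathbb{F}_4^n$. Then, a modified sequence $c_l^{n-b}$ of the sequence $c$ is defined as $c_l^{n-b} = (c_1, c_2, \ldots, c_{l-2}, c_{l-1}, c_{l+b}, c_{l+b+1}, \ldots, c_n) \in \mathbb{F}_4^{n-b}$, where $c_l, c_{l+1}, \ldots, c_{l+b-1}$ are deleted from $c$. Similarly, a sequence $c_l^{n+b}$ of the sequence $c$ is defined as $c_l^{n+b} = (c_1, c_2, \ldots, c_{l-1}, h_1, h_2, \ldots, h_b, c_l, c_{l+1}, \ldots, c_n) \in \mathbb{F}_4^{n+b}$, where $h_1, h_2, \ldots, h_b$ are inserted starting at the $l$-th position of $c$.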

For a binary sequence $x = (x_1, x_2, \ldots, x_n) \in \mathbb{F}_2^n$, we consider the sequence $0x$ of length $n+1$, where $0x = (0, x_1, x_2, \ldots, x_n) \in \mathbb{F}_2^{n+1}$. For simplicity, the sequence $0x$ is regarded as having a starting value $x_0 = 0$. For example, a binary sequence $x$ of length 10 is given as $x = (0,0,1,0,0,1,1,1,1,1)$. For convenience, the binary sequence can also be written as $x = 0010011111$. In the rest of this paper, these two notations are used interchangeably, so the binary sequence $0x$ of length 11 is $0x = 00010011111$. The run-length vector $\mathbf{r}$ lists the lengths of the runs of zeros and ones in $0x$. Here, the binary sequence $0x$ is composed of four runs $u_0 u_1 u_2 u_3$, which are $u_0 = 000$, $u_1 = 1$, $u_2 = 00$, and $u_3 = 11111$. For a non-negative integer $k$, the runs of zeros and ones are denoted $u_{2k}$ and $u_{2k+1}$, respectively. Then, the run-length vector of $0x$ is $\mathbf{r} = (r_0, r_1, r_2, r_3) = (3,1,2,5)$.

Let $\|\mathbf{r}\|$ be the total number of elements in the run-length vector $\mathbf{r}$ of $0x$, corresponding to the total number of runs in $0x$. Then, from the run-length vector, the run-syndrome of the binary sequence $0x$ is defined as

$$\mathrm{Rsyn}(0x) = \sum_{i=0}^{\|\mathbf{r}\|-1} i\, r_i. \tag{1}$$

In the previous example, for $0x = 00010011111$, since the run-length vector $\mathbf{r}$ is $(3,1,2,5)$, $\mathrm{Rsyn}(0x) = \sum_{i=0}^{4-1} i\, r_i = 0 \cdot 3 + 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 5 = 20$.

If the $j$-th bit of $0x$ belongs to the $m$-th run $u_m$, we define $k_{0x}(j)$ as the index of that run, so $k_{0x}(j) = m$, for $1 \le j \le n$. Since the total number of elements of the run-length vector cannot exceed the length of $0x$, $\|\mathbf{r}\|$ is bounded as

$$\|\mathbf{r}\| \le n+1, \tag{2}$$

where equality holds if the binary sequence is $0x = 0101010\cdots$.

In the previous example, the binary sequence $0x = 00010011111$ of length 11 has the run-length vector $\mathbf{r} = (r_0, r_1, r_2, r_3) = (3,1,2,5)$, and the total number of elements of the run-length vector of $0x$ is $\|\mathbf{r}\| = 4 < n+1 = 11$. For $1 \le j \le 10$, since the third bit in $0x$ belongs to the run $u_1 = 1$, the index of the run to which the third bit belongs is $k_{0x}(3) = 1$.
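The quantities defined above can be checked numerically. The following is a minimal Python sketch, not part of the original work; the helper names `run_lengths`, `run_syndrome`, and `run_index` are hypothetical, and the bit index $j$ counts from the prepended $x_0 = 0$ as in the text.

```python
from itertools import groupby

def run_lengths(bits):
    """Run-length vector r of 0x; `bits` is x without the leading 0."""
    padded = [0] + list(bits)
    return [len(list(group)) for _, group in groupby(padded)]

def run_syndrome(bits):
    """Rsyn(0x) = sum over i of i * r_i, following Equation (1)."""
    return sum(i * r for i, r in enumerate(run_lengths(bits)))

def run_index(bits, j):
    """k_{0x}(j): index of the run containing x_j, where x_0 = 0 sits at j = 0."""
    padded = [0] + list(bits)
    # 0x starts in run 0, and every change of value starts a new run.
    return sum(1 for i in range(1, j + 1) if padded[i] != padded[i - 1])

x = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]   # example sequence from the text
print(run_lengths(x))    # [3, 1, 2, 5]
print(run_syndrome(x))   # 20
print(run_index(x, 3))   # 1
```

Counting value changes is enough to obtain the run index because $0x$ always begins with a run of zeros.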

2.2. Previous Works

In the binary case, to the best of our knowledge, the VT code in [5] is the best code to correct a single deletion or insertion error, and the modified VT code in [6] is the best code to correct a single deletion, insertion, or substitution error. To correct more than a single error, we briefly recap the binary code correcting at most two consecutive deletions from [7]. Moreover, we briefly present the deletion correction capability of the binary code in [7] when a single deletion or two consecutive deletions occur.

Definition 1. 

For $0 \le d \le 2n-1$, the binary code $C(n,2)$ in [7] with length $n$ is given as

$$C(n,2) = \{ g \in \mathbb{F}_2^n : \mathrm{Rsyn}(0g) \equiv d \bmod 2n \}. \tag{3}$$

The correction capability of the code in Definition 1 was also proved in [7]. From the length of the received sequence $y$, we know whether one bit or two consecutive bits were removed from the codeword $g$. If one deletion occurs at the $j$-th bit, or two consecutive deletions occur at the $j$-th and $(j+1)$-th bits, then $y$ is $g_j^{n-1}$ or $g_j^{n-2}$, respectively.

To determine the position of the deleted bit, we first calculate the difference of the run-syndrome as $\Delta = d - \mathrm{Rsyn}(0y) \bmod 2n$. If one deletion error occurs, $\Delta = d - \mathrm{Rsyn}(0g_j^{n-1}) \bmod 2n$, and if two consecutive deletions occur, $\Delta = d - \mathrm{Rsyn}(0g_j^{n-2}) \bmod 2n$. These values are used to identify the value and position $j$ of the deleted bit if there is one deletion in the codeword $g$, or the values and positions $j$ and $j+1$ of the two deleted bits if two consecutive deletions occur in the codeword $g$.

In the quaternary case, codes that correct a single deletion or insertion error are known. An overview of $q$-ary insertion- and deletion-correcting codes with length $n$ is briefly presented in Definitions 2 and 3. The VT code family, known as the set of the most basic codes for correcting a single deletion or insertion, is defined as follows [15]:

Definition 2. 

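For $0 \le a < n$ and $0 \le e < q$, the $q$-ary VT code with length $n$, $VT_{a,e}(n,q)$, is defined as

$$VT_{a,e}(n,q) \triangleq \Big\{ c \in \mathbb{F}_q^n : \sum_{i=1}^{n} (i-1)\,\alpha_i \equiv a \bmod n, \tag{4}$$
$$\sum_{i=1}^{n} c_i \equiv e \bmod q \Big\}, \tag{5}$$

where $\alpha_1 = 1$ and, for $1 < i \le n$, $\alpha_i = 1$ if $c_i \ge c_{i-1}$ and $\alpha_i = 0$ if $c_i < c_{i-1}$.

From Definition 2, since the binary sequence $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is strongly related to the $q$-ary sequence $c$, a deletion of the $j$-th symbol in the codeword $c$ also leads to a deletion of the $j$-th bit in the binary sequence $\alpha$. Hence, with the help of the binary sequence $\alpha$, the $q$-ary sequence is finally corrected.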
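As an illustration of Definition 2, the sketch below computes the auxiliary sequence $\alpha$ and the pair $(a, e)$ that identifies which code $VT_{a,e}(n,q)$ a given sequence belongs to. It is a hedged example only: the function names `alpha` and `vt_parameters` are hypothetical, and the sums are assumed to run over $i = 1, \ldots, n$.

```python
def alpha(c):
    """alpha_1 = 1; alpha_i = 1 if c_i >= c_{i-1}, else 0 (Definition 2)."""
    return [1] + [1 if c[i] >= c[i - 1] else 0 for i in range(1, len(c))]

def vt_parameters(c, q):
    """Return (a, e) such that c belongs to VT_{a,e}(n, q)."""
    n = len(c)
    al = alpha(c)
    # With 0-based index i, the 1-based weight (i - 1) becomes simply i.
    a = sum(i * al[i] for i in range(n)) % n
    e = sum(c) % q
    return a, e

c = [0, 3, 0, 0, 0, 1, 1, 3, 2, 2]
print(alpha(c))             # [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(vt_parameters(c, 4))  # (5, 0)
```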

Similarly, the authors of [16] proposed a single deletion-correcting code that defined the q-ary SVT code.

Definition 3. 

For $0 \le a \le P$, $0 \le e < q$, and $f \in \{0,1\}$, the $q$-ary SVT code, $SVT_{a,e,f}(n,P,q)$, with length $n$ is defined as

$$SVT_{a,e,f}(n,P,q) \triangleq \Big\{ c \in \mathbb{F}_q^n : \sum_{i=1}^{n} i\,\alpha_i \equiv a \bmod (P+1), \tag{6}$$
$$\sum_{i=1}^{n} c_i \equiv e \bmod q, \tag{7}$$
$$\sum_{i=1}^{n} \alpha_i \equiv f \bmod 2 \Big\}, \tag{8}$$

where $\alpha_1 = 1$ and, for $1 < i \le n$, $\alpha_i = 1$ if $c_i \ge c_{i-1}$ and $\alpha_i = 0$ if $c_i < c_{i-1}$.

Compared to the construction of the $q$-ary VT code, since $\bmod\ (P+1)$ is used in constraint (6) of Definition 3 instead of $\bmod\ n$, the redundancy of the $q$-ary SVT code is reduced from $\log_q(n+1)$ to $\log_q(2P+2)+1$. Constraint (8) is added to ensure that the binary sequence $\alpha$ belongs to the binary SVT code. Hence, similar to the correcting method in Definition 2, the $q$-ary SVT code in Definition 3 can correct one deletion in any position.

However, the $q$-ary VT code in Definition 2 and the $q$-ary SVT code in Definition 3 correct only a single deletion or insertion error and cannot correct consecutive deletions or insertions in the sequence. To overcome this drawback, the idea in [8,9] of constructing binary codes that correct a burst of exactly $b$ deletion or insertion errors, for $b \ge 2$, can be carried over to the $q$-ary case. The $q$-ary codeword $c$ with length $n$ is treated as a codeword array $A_b(c)$ of size $b \times \frac{n}{b}$, and the codeword is arranged column-by-column. Then, to reduce the redundancy relative to [8], the first row and each of the other $(b-1)$ rows in the codeword array are encoded by a $q$-ary VT code and a $q$-ary SVT code, respectively. With this construction, one deletion or insertion error in each row can be corrected by the $q$-ary VT code or $q$-ary SVT code, such that a burst of $b$ consecutive deletion or insertion errors can be corrected.

For example, for correcting a burst of deletions of size two, the $q$-ary codeword $c$ with length $n$ is presented as a $2 \times \frac{n}{2}$ array $A_2(c)$, which is given by

$$A_2(c) = \begin{pmatrix} c_1 & c_3 & \cdots & c_{n-1} \\ c_2 & c_4 & \cdots & c_n \end{pmatrix}. \tag{9}$$

Since each row of $A_2(c)$ is protected by the $q$-ary VT code or SVT code with length $\frac{n}{2}$, the code from $A_2(c)$ can correct exactly two consecutive deletions.
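The reason the array construction handles a burst of exactly $b$ deletions is that such a burst removes exactly one symbol from each row. A small Python sketch of this column-by-column arrangement follows; the function name `to_array` is hypothetical, and the symbols are simply labelled by their positions so that the effect of the burst is visible.

```python
def to_array(c, b):
    """Arrange c column-by-column into b rows: row k holds c_{k+1}, c_{k+1+b}, ..."""
    if len(c) % b != 0:
        raise ValueError("b must divide the length of c")
    return [list(c[k::b]) for k in range(b)]

c = list(range(1, 11))           # stand-in symbols labelled by position 1..10
rows = to_array(c, 2)
print(rows)                      # [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]

burst = c[:3] + c[5:]            # delete the symbols at positions 4 and 5
rows_after = to_array(burst, 2)
print(rows_after)                # [[1, 3, 7, 9], [2, 6, 8, 10]]
```

Each row of `rows_after` is the corresponding row of `rows` with a single symbol removed, which is the situation a single-deletion code per row can handle.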

To sum up, the $q$-ary VT and SVT codes can be used to correct one or exactly two consecutive deletion or insertion errors. However, a quaternary code that corrects at most two consecutive deletion or insertion errors has not been developed. In the following section, we propose a new design of quaternary codes that are suitable for DNA storage and can correct at most two consecutive deletion or insertion errors.

3. Proposed Code Design

This section provides a new design for a quaternary code that corrects at most two consecutive deleted or inserted symbols. The construction of the proposed code is given in Section 3.1. Sections 3.2 and 3.3 prove the correction capability of the presented code when one deletion occurs and when two consecutive deletion errors occur, respectively. The decoding of insertion errors is presented in Section 3.4. A lower bound and an upper bound on the cardinality of the proposed code are derived in Section 3.5.

3.1. Code Construction

The proposed design of a quaternary code that corrects one or two consecutive symbols is given in the following definition. In the proposed design, a binary sequence that has the same length as the quaternary sequence and is derived from it is used to construct the constraints of the code.

Definition 4. 

For $0 \le a \le 8n$, $0 \le d \le 2n-1$, and $0 \le e < 4$, a quaternary code $C(n,4)$ has a codeword $c = (c_1, c_2, \ldots, c_n)$. First, we consider a mapping from the quaternary codeword $c$ to a binary sequence $x = (x_1, x_2, \ldots, x_n)$, for $1 \le i \le n$, as

$$x_i = \begin{cases} 0, & \text{if } c_i = 0 \text{ or } c_i = 1 \\ 1, & \text{if } c_i = 2 \text{ or } c_i = 3. \end{cases} \tag{10}$$

Then, the quaternary code $C(n,4)$ which satisfies the following three conditions can correct at most two consecutive deletion or insertion errors.

$$C(n,4) = \Big\{ c \in \mathbb{F}_4^n : \mathrm{Rsyn}(0x) \equiv d \bmod 2n, \tag{11}$$
$$\sum_{i=1}^{n} i\,c_i \equiv a \bmod (8n+1), \tag{12}$$
$$\sum_{i=1}^{n} c_i \equiv e \bmod 4 \Big\}. \tag{13}$$

The basic idea of the mapping (10) is that the quaternary codeword $c$ corresponds to the binary sequence $x$ with the same length $n$. Therefore, a deletion in the $j$-th position of the codeword $c$ also leads to a deletion in the $j$-th position of the binary sequence $x$. For example, if the received sequence is $y = c_j^{n-1} \in \mathbb{F}_4^{n-1}$, after using the mapping (10), we obtain the binary sequence $x_j^{n-1} \in \mathbb{F}_2^{n-1}$, which has one deletion error in the $j$-th position.

In Definition 4, the condition (11) is the same as the condition (3) in Definition 1 for C(n,2), which means that the sequence x is protected by a binary codeword of C(n,2). Therefore, decoding of the binary sequence x can be used for finding the positions of the deleted symbols and guessing the values of deleted symbols of codeword c.

The two constraints (12) and (13) in Definition 4, which are not in Definition 1, are used to obtain the correcting property in the quaternary regime. Constraint (11) yields the possible positions of the deletion errors; when more than one candidate satisfies constraint (11), constraints (12) and (13) are used to remove the invalid candidates. Constraint (13) is added to determine exactly the value of a single deleted symbol, or the sum of the values of two consecutive deleted symbols. Finally, the positions and values of the symbols that satisfy all three constraints (11), (12), and (13) are unique, and the resulting quaternary sequence is corrected. For example, for $n = 10$, $d = 0$, $a = 0$, and $e = 0$, suppose the binary sequence is corrected as $x = 0\underline{10}0000111$, where the underlined bits are the bits inserted to correct $x$. From the mapping (10), the possible quaternary sequences are $c = 0\underline{30}0011322$, $c = 0\underline{20}0011322$, $c = 0\underline{31}0011322$, or $c = 0\underline{21}0011322$. Without constraints (12) and (13), the decoder cannot decide which quaternary sequence to output. Constraints (12) and (13) exclude the invalid quaternary sequences as described in Table 1, so the output is the unique sequence $c = 0\underline{30}0011322$.

Table 1.

Correction capability of constraints (12), (13) when two consecutive deletions occur (O: satisfied, X: not satisfied).

| Possible Corrected Quaternary Sequence | Compared to $a$ in Constraint (12) | Compared to $e$ in Constraint (13) |
|---|---|---|
| 0300011322 | O | O |
| 0200011322 | X | X |
| 0310011322 | X | X |
| 0210011322 | X | O |
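The entries of Table 1 can be reproduced by evaluating the three constraints of Definition 4 directly. The sketch below is an illustrative check under the parameters $n = 10$, $d = a = e = 0$ used in the example; the function names `to_binary`, `run_syndrome`, and `constraints` are hypothetical.

```python
from itertools import groupby

def to_binary(c):
    """Mapping (10): symbols 0, 1 -> bit 0 and symbols 2, 3 -> bit 1."""
    return [0 if s in (0, 1) else 1 for s in c]

def run_syndrome(bits):
    """Rsyn(0x) computed from the run lengths of 0x."""
    padded = [0] + list(bits)
    runs = [len(list(g)) for _, g in groupby(padded)]
    return sum(i * r for i, r in enumerate(runs))

def constraints(c, d, a, e):
    """Truth values of constraints (11), (12), (13) for the sequence c."""
    n = len(c)
    return (
        run_syndrome(to_binary(c)) % (2 * n) == d,
        sum(i * s for i, s in enumerate(c, start=1)) % (8 * n + 1) == a,
        sum(c) % 4 == e,
    )

candidates = ["0300011322", "0200011322", "0310011322", "0210011322"]
results = {w: constraints([int(ch) for ch in w], d=0, a=0, e=0) for w in candidates}
for word, flags in results.items():
    print(word, flags)
```

All four candidates share the same binary image, so constraint (11) holds for each of them; only constraints (12) and (13) separate the codeword from the other three.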

3.2. Decoding Procedure for One Deletion Error

It is assumed that the transmitter and receiver share the parameters $n, d, a, e$ of the code $C(n,4)$ in Definition 4. We first consider the case in which one deletion error occurs in the codeword $c$. For $1 \le j \le n$, if the $j$-th symbol in $c$ is removed, we obtain a received sequence $y = c_j^{n-1} \in \mathbb{F}_4^{n-1}$ with length $n-1$.

If the symbol at the $j$-th position is deleted, constraint (13) can be rewritten as $\sum_{i=1}^{j-1} c_i + c_j + \sum_{i=j+1}^{n} c_i \equiv e \bmod 4$. For the received sequence $y = c_j^{n-1} \in \mathbb{F}_4^{n-1}$, we have $\sum_{i=1}^{n-1} y_i = \sum_{i=1}^{j-1} y_i + \sum_{i=j}^{n-1} y_i = \sum_{i=1}^{j-1} c_i + \sum_{i=j+1}^{n} c_i$. Thus, the value of the deleted symbol $c_j$ is calculated as $c_j = e - \sum_{i=1, i \ne j}^{n} c_i \bmod 4 = e - \sum_{i=1}^{n-1} y_i \bmod 4$.

Next, we need to find the deletion position $j$. Applying the mapping (10) to the received sequence $y$ gives the binary sequence $x_j^{n-1}$ with length $n-1$, and $0x_j^{n-1}$ is obtained as $0x_j^{n-1} = (0, x_1, x_2, \ldots, x_{j-2}, x_{j-1}, x_{j+1}, x_{j+2}, \ldots, x_n)$. Then, the run-length vector $\mathbf{r}$ is determined from $0x_j^{n-1}$, and $\mathrm{Rsyn}(0x_j^{n-1}) = \sum_{i=0}^{\|\mathbf{r}\|-1} i\,r_i \bmod 2n$ as in constraint (11). As mentioned in Definition 1, when one deletion error occurs, the run-syndrome decreases by $\Delta = d - \mathrm{Rsyn}(0x_j^{n-1}) \bmod 2n$.

To provide a proof for the correction capabilities of the proposed quaternary code in Definition 4, we develop Algorithm 1 as a correcting method in the case of one deletion symbol.

Algorithm 1 Correct one deletion symbol.
  Input: $n, d, a, e$, $y = c_j^{n-1} \in \mathbb{F}_4^{n-1}$.
  Output: $c = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
   1: $c_j = e - \sum_{i=1}^{n-1} y_i \bmod 4$.
   2: Get the binary sequence $0x_j^{n-1}$ and the run-length vector $\mathbf{r}$ of $0x_j^{n-1}$.
   3: Get the total number of elements of $\mathbf{r}$ as $\|\mathbf{r}\|$.
   4: $\Delta = d - \mathrm{Rsyn}(0x_j^{n-1}) \bmod 2n$.
   5: Set $j = 1$.
   6: while $j \le n$ do
   7:    if $\Delta < \|\mathbf{r}\|$ then
   8:       if $k_{0x_j^{n-1}}(j-1) = \Delta$ then
   9:          $c$ = del_correct1($n, a, y, j, c_j$)
  10:       else
  11:          $j \leftarrow j+1$
  12:       end if
  13:    else
  14:       if $k_{0x_j^{n-1}}(j-1) + 2(n-j) - 1 = \Delta$ then
  15:          $c$ = del_correct1($n, a, y, j, c_j$)
  16:       else
  17:          $j \leftarrow j+1$
  18:       end if
  19:    end if
  20: end while

Function 1 provides the function del_correct1 used by Algorithm 1 to confirm the deletion position; its output is the corrected quaternary sequence. In Function 1, Syn_new stands for the syndrome of the quaternary sequence after inserting the lost symbol $c_j$ in the $j$-th position of $c_j^{n-1}$, where $c_{j,i}^{n-1} = y_i$ denotes the $i$-th symbol of the received sequence.

Function 1: $c$ = del_correct1($n, a, y, j, c_j$)
  Input: $n, a, y, j, c_j$.
  Output: $c = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
   1: Syn_new $= \sum_{i=1}^{j-1} i\,c_{j,i}^{n-1} + j\,c_j + \sum_{i=j}^{n-1} (i+1)\,c_{j,i}^{n-1} \bmod (8n+1)$
   2: if Syn_new $= a$ then
   3:    $c = (c_1, c_2, \ldots, c_{j-1}, c_j, c_{j+1}, \ldots, c_n)$
   4: else
   5:    $j \leftarrow j+1$
   6: end if

Example 1: Let $n, d, a$, and $e$ be 10, 0, 0, and 0, respectively. Assume that one deletion occurs at the sixth position of the codeword $c = (0,3,0,0,0,1,1,3,2,2) \in \mathbb{F}_4^{10}$. The received sequence is $y = c_j^{9} = (0,3,0,0,0,1,3,2,2)$. Following Algorithm 1, the value of the lost symbol is $c_j = e - \sum_{i=1}^{n-1} y_i \bmod 4 = 1$. From the mapping (10), we obtain the binary sequence $x_j^{9} = 010000111$. Then, the run-length vector of $0x_j^{9}$ is $\mathbf{r} = (2,1,4,3)$, so $\|\mathbf{r}\| = 4$ and the run-syndrome of $0x_j^{9}$ is $\mathrm{Rsyn}(0x_j^{9}) = \sum_{i=0}^{4-1} i\,r_i \bmod 20 = 18$. The change of the run-syndrome is computed as $\Delta = 0 - \mathrm{Rsyn}(0x_j^{9}) \bmod 20 = 2$.

For $1 \le j \le n$, since $\Delta < \|\mathbf{r}\|$, following Algorithm 1, when $j = 6$ we have $\Delta = k_{0x_j^{9}}(j-1) = 2$. Inserting the lost symbol $c_j = 1$ in the sixth position of the received sequence gives $(0,3,0,0,0,\underline{1},1,3,2,2)$, and the syndrome of this quaternary sequence is Syn_new $= \sum_{i=1}^{6-1} i\,c_{6,i}^{9} + 6 \cdot 1 + \sum_{i=6}^{9} (i+1)\,c_{6,i}^{9} \bmod 81 = 0$, which equals $a$. Thus, the deletion error of the quaternary sequence is recovered correctly.
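Example 1 can be replayed with a short Python sketch. It is a simplified stand-in rather than a transcription of Algorithm 1: the deleted symbol is recovered from the checksum as in line 1, but the position is found by scanning every insertion point and testing the three constraints of Definition 4 directly, instead of using the run-index test. The function names are hypothetical.

```python
from itertools import groupby

def run_syndrome(c):
    """Rsyn(0x) for the binary image x of the quaternary sequence c."""
    bits = [0] + [0 if s < 2 else 1 for s in c]
    runs = [len(list(g)) for _, g in groupby(bits)]
    return sum(i * r for i, r in enumerate(runs))

def in_code(c, d, a, e):
    """Check constraints (11), (12), (13) of Definition 4."""
    n = len(c)
    return (run_syndrome(c) % (2 * n) == d
            and sum(i * s for i, s in enumerate(c, start=1)) % (8 * n + 1) == a
            and sum(c) % 4 == e)

def correct_one_deletion(y, n, d, a, e):
    """Reinsert the deleted symbol at the first position giving a codeword."""
    if len(y) != n - 1:
        raise ValueError("expected a received sequence of length n - 1")
    symbol = (e - sum(y)) % 4              # value of the deleted symbol
    for j in range(n):                     # 0-based insertion index
        candidate = list(y[:j]) + [symbol] + list(y[j:])
        if in_code(candidate, d, a, e):
            return candidate
    return None

c = [0, 3, 0, 0, 0, 1, 1, 3, 2, 2]
y = c[:5] + c[6:]                          # delete the sixth symbol
print(correct_one_deletion(y, n=10, d=0, a=0, e=0))
```

Returning the first valid insertion point mirrors the convention of inserting the symbol at the first index of the detected run; for this codeword the same result is obtained whichever of the ten positions is deleted.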

3.3. Decoding Procedure for Two Deletion Errors

Suppose that the received sequence is $y = c_j^{n-2} \in \mathbb{F}_4^{n-2}$ with length $n-2$, where two consecutive symbols in the $j$-th and $(j+1)$-th positions of the codeword $c \in C(n,4)$ are deleted.

Constraint (13) in Definition 4 can be rewritten as $\sum_{i=1}^{j-1} c_i + c_j + c_{j+1} + \sum_{i=j+2}^{n} c_i \equiv e \bmod 4$, from which $c_j + c_{j+1} = e - \sum_{i=1, i \ne j, i \ne j+1}^{n} c_i \bmod 4$, corresponding to $c_j + c_{j+1} = e - \sum_{i=1}^{n-2} c_{j,i}^{n-2} \bmod 4$. Since $\sum_{i=1}^{n-2} y_i = \sum_{i=1}^{n-2} c_{j,i}^{n-2}$, we can rewrite this as $c_j + c_{j+1} = e - \sum_{i=1}^{n-2} y_i \bmod 4$.

Applying the mapping (10) to the received sequence $y$, the binary sequence with length $n-1$ is obtained as $0x_j^{n-2} = (0, x_1, x_2, \ldots, x_{j-2}, x_{j-1}, x_{j+2}, x_{j+3}, \ldots, x_n)$. Then, the run-length vector $\mathbf{r}$ of $0x_j^{n-2}$ determines the run-syndrome of $0x_j^{n-2}$ as $\mathrm{Rsyn}(0x_j^{n-2}) = \sum_{i=0}^{\|\mathbf{r}\|-1} i\,r_i \bmod 2n$. Thus, similar to the approach mentioned in Definition 1, the difference of the run-syndrome is computed as $\Delta = d - \mathrm{Rsyn}(0x_j^{n-2}) \bmod 2n$.

To recover two deletion errors, we first recover the binary sequence $x$ with length $n$ from the binary sequence $x_j^{n-2}$. The authors of [7] suggested the eight possible instances when two consecutive bits are deleted, as summarized in Table 2. In this work, we consider 16 instances in total, and the remaining eight instances are listed in Table 3. Please note that in Algorithms 2 and A2, the notation $x_j = \bar{x}_{j-1}$ means that the complement of the bit in the $(j-1)$-th position is assigned to the bit in the $j$-th position. Thus, the two notations $x_j = \bar{x}_{j-1}$ and $x_j \ne x_{j-1}$ have the same meaning, namely that the two neighboring $(j-1)$-th and $j$-th bits have different values. For example, if $x_{j-1} = 1$, then $x_j = \bar{x}_{j-1} = 0$, or equivalently $x_j \ne x_{j-1}$ gives $x_j = 0$.

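Table 2.

The eight possible instances in [7] for two consecutive deletion errors.

| Conditions of $(x_{j-1}, x_j, x_{j+1}, x_{j+2})$ in [7] | $(x_{j-1}, x_j, x_{j+1}, x_{j+2})$ in [7] |
|---|---|
| $x_j = x_{j-1}$, $x_{j+1} = x_{j-1}$, $x_{j+2} \ne x_{j-1}$ if $j+2 \le n$ | (1,1,1,0), (0,0,0,1) |
| $x_j = x_{j-1}$, $x_{j+1} \ne x_{j-1}$, $x_{j+2} \ne x_{j-1}$ if $j+2 \le n$ | (1,1,0,0), (0,0,1,1) |
| $x_j \ne x_{j-1}$, $x_{j+1} \ne x_{j-1}$, $x_{j+2} = x_{j-1}$ if $j+2 \le n$ | (1,0,0,1), (0,1,1,0) |
| $x_j \ne x_{j-1}$, $x_{j+1} = x_{j-1}$, $x_{j+2} = x_{j-1}$ if $j+2 \le n$ | (0,1,0,0), (1,0,1,1) |

Table 3.

The eight possible instances added in this work for two consecutive deletion errors.

| Conditions of $(x_{j-1}, x_j, x_{j+1}, x_{j+2})$ | $(x_{j-1}, x_j, x_{j+1}, x_{j+2})$ |
|---|---|
| $x_j = x_{j-1}$, $x_{j+1} = x_{j-1}$, $x_{j+2} = x_{j-1}$ if $j+2 \le n$ | (1,1,1,1), (0,0,0,0) |
| $x_j = x_{j-1}$, $x_{j+1} \ne x_{j-1}$, $x_{j+2} = x_{j-1}$ if $j+2 \le n$ | (1,1,0,1), (0,0,1,0) |
| $x_j \ne x_{j-1}$, $x_{j+1} \ne x_{j-1}$, $x_{j+2} \ne x_{j-1}$ if $j+2 \le n$ | (1,0,0,0), (0,1,1,1) |
| $x_j \ne x_{j-1}$, $x_{j+1} = x_{j-1}$, $x_{j+2} \ne x_{j-1}$ if $j+2 \le n$ | (0,1,0,1), (1,0,1,0) |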

Following an analysis similar to [7], for $1 \le j \le n-1$, there are four possible deleted bit pairs $(x_j, x_{j+1}) = (0,0), (0,1), (1,0)$, and $(1,1)$. Then, if $j+2 \le n$, we combine the four possible cases of $(x_j, x_{j+1})$ with the neighboring bits $(x_{j-1}, x_{j+2}) = (0,0), (0,1), (1,0)$, and $(1,1)$, so we need to consider 16 instances of $(x_{j-1}, x_j, x_{j+1}, x_{j+2})$.

From the above analysis, we develop Algorithm 2 for the proposed code to correct two consecutive deletion errors. Although it is not stated explicitly in Algorithm 2, the bit $x_{j+2}$ is analyzed jointly with $x_{j-1}$, as described above, to obtain the conditions in lines 9, 16, 27, and 34 that determine the deleted positions.

Algorithm 2 Correct two consecutive deletion symbols
  Input: $n, d, a, e$, $y = c_j^{n-2} \in \mathbb{F}_4^{n-2}$.
  Output: $c = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
   1: $c_j + c_{j+1} = e - \sum_{i=1}^{n-2} y_i \bmod 4$.
   2: Get the binary sequence $0x_j^{n-2}$ and the run-length vector $\mathbf{r}$ of $0x_j^{n-2}$.
   3: Get the total number of elements of $\mathbf{r}$ as $\|\mathbf{r}\|$.
   4: $\Delta = d - \mathrm{Rsyn}(0x_j^{n-2}) \bmod 2n$.
   5: Set $j = 1$.
   6: if $\Delta \ge 2\|\mathbf{r}\|$ then
   7:    while $j \le n-1$ do
   8:       if $\mathrm{mod}(\Delta, 2) = 1$ then
   9:          if $2k_{0x_j^{n-2}}(j-1) + 2(n-j) + 1 = \Delta$ then
  10:             $x_j = \bar{x}_{j-1}$; $x_{j+1} = x_{j-1}$
  11:             $c$ = del_correct2($n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$)
  12:          else
  13:             $j \leftarrow j+1$
  14:          end if
  15:       else
  16:          if $2k_{0x_j^{n-2}}(j-1) + 2(n-j) = \Delta$ then
  17:             $x_j = \bar{x}_{j-1}$; $x_{j+1} = \bar{x}_{j-1}$
  18:             $c$ = del_correct2($n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$)
  19:          else
  20:             $j \leftarrow j+1$
  21:          end if
  22:       end if
  23:    end while
  24: else
  25:    while $j \le n-1$ do
  26:       if $\mathrm{mod}(\Delta, 2) = 1$ then
  27:          if $2k_{0x_j^{n-2}}(j-1) + 1 = \Delta$ then
  28:             $x_j = x_{j-1}$; $x_{j+1} = \bar{x}_{j-1}$
  29:             $c$ = del_correct2($n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$)
  30:          else
  31:             $j \leftarrow j+1$
  32:          end if
  33:       else
  34:          if $2k_{0x_j^{n-2}}(j-1) = \Delta$ then
  35:             $x_j = x_{j-1}$; $x_{j+1} = x_{j-1}$
  36:             $c$ = del_correct2($n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$)
  37:          else
  38:             $j \leftarrow j+1$
  39:          end if
  40:       end if
  41:    end while
  42: end if

To clarify the function del_correct2 used by Algorithm 2 in Section 3.3, we provide the details in Function 2. In Function 2, Syn_new denotes the syndrome of the quaternary sequence after inserting the lost symbols $c_j$ and $c_{j+1}$ in the $j$-th and $(j+1)$-th positions of $c_j^{n-2}$, where $c_{j,i}^{n-2} = y_i$ denotes the $i$-th symbol of the received sequence. If the value of Syn_new equals the syndrome parameter $a$, we infer that the quaternary sequence is retrieved successfully.

Function 2: $c$ = del_correct2($n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$)
  Input: $n, a, y, j, x_j, x_{j+1}, c_j + c_{j+1}$.
  Output: $c = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
   1: Use mapping (10) to obtain $c_j$, then $c_{j+1} = (c_j + c_{j+1}) - c_j \bmod 4$
   2: Syn_new $= \sum_{i=1}^{j-1} i\,c_{j,i}^{n-2} + j\,c_j + (j+1)\,c_{j+1} + \sum_{i=j}^{n-2} (i+2)\,c_{j,i}^{n-2} \bmod (8n+1)$
   3: if Syn_new $= a$ then
   4:    $c = (c_1, c_2, \ldots, c_{j-1}, c_j, c_{j+1}, \ldots, c_n)$
   5: else
   6:    $j \leftarrow j+1$
   7: end if

Example 2: Let $n, d, a$, and $e$ be 10, 0, 0, and 0, respectively. Assume that two consecutive deletions occur at the seventh and eighth positions of the codeword $c = (0,3,0,0,0,1,1,3,2,2) \in \mathbb{F}_4^{10}$, so that the received quaternary sequence is $y = c_j^{8} = (0,3,0,0,0,1,2,2)$. Following Algorithm 2, the sum of the values of the two deleted symbols is $c_j + c_{j+1} = e - \sum_{i=1}^{n-2} y_i \bmod 4 = e - \sum_{i=1}^{n-2} c_{j,i}^{8} \bmod 4 = 0$.

From the mapping (10) for $c_j^{8}$, the binary sequence $0x_j^{8}$ and $\mathbf{r}$ are $0x_j^{8} = 001000011$ and $\mathbf{r} = (2,1,4,2)$, respectively. Then, $\|\mathbf{r}\| = 4$ and the run-syndrome is $\mathrm{Rsyn}(0x_j^{8}) = 15$. The difference of the run-syndrome is calculated as $\Delta = 0 - \mathrm{Rsyn}(0x_j^{8}) \bmod 20 = 5$.

From Algorithm 2, since $\Delta < 2\|\mathbf{r}\|$ and $\Delta \bmod 2 = 1$, for $1 \le j \le n-1$, the value $j = 7$ satisfies the equation $\Delta = 2k_{0x_j^{8}}(j-1) + 1 = 2k_{0x_j^{8}}(6) + 1 = 5$. Thus, as stated in line 28 of Algorithm 2, we obtain $x_7 = 0$ and $x_8 = 1$, and the corrected binary sequence is 0100000111.

Applying mapping (10) to the binary sequence $010000\underline{01}11$ together with $c_j + c_{j+1} = 0$, the two deleted symbols $(c_7, c_8)$ are determined as $(1,3)$. The syndrome Syn_new of the quaternary sequence obtained by inserting $(c_7, c_8) = (1,3)$ into $c_j^{8}$ is 0, which equals the syndrome $a$ of the codeword $c$. Thus, the recovered quaternary sequence is $(0,3,0,0,0,1,\underline{1},\underline{3},2,2)$.
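The uniqueness claimed in Example 2 can be confirmed by exhaustive search. The sketch below is a brute-force check, not Algorithm 2 itself: it tries every insertion position and every symbol pair whose sum matches the checksum, and keeps the candidates that satisfy constraints (11), (12), and (13). The function names are hypothetical.

```python
from itertools import groupby

def run_syndrome(c):
    """Rsyn(0x) for the binary image x of the quaternary sequence c."""
    bits = [0] + [0 if s < 2 else 1 for s in c]
    runs = [len(list(g)) for _, g in groupby(bits)]
    return sum(i * r for i, r in enumerate(runs))

def in_code(c, d, a, e):
    """Check constraints (11), (12), (13) of Definition 4."""
    n = len(c)
    return (run_syndrome(c) % (2 * n) == d
            and sum(i * s for i, s in enumerate(c, start=1)) % (8 * n + 1) == a
            and sum(c) % 4 == e)

def two_deletion_candidates(y, n, d, a, e):
    """All codewords obtainable from y by inserting two adjacent symbols."""
    if len(y) != n - 2:
        raise ValueError("expected a received sequence of length n - 2")
    pair_sum = (e - sum(y)) % 4            # c_j + c_{j+1} mod 4
    found = set()
    for j in range(n - 1):                 # 0-based insertion index
        for first in range(4):
            second = (pair_sum - first) % 4
            candidate = list(y[:j]) + [first, second] + list(y[j:])
            if in_code(candidate, d, a, e):
                found.add(tuple(candidate))
    return found

y = [0, 3, 0, 0, 0, 1, 2, 2]
print(two_deletion_candidates(y, n=10, d=0, a=0, e=0))
# {(0, 3, 0, 0, 0, 1, 1, 3, 2, 2)}
```

In this instance, inserting $(1,3)$ at positions 8 or 9 would also satisfy constraints (12) and (13), and it is the run-syndrome constraint (11) that rejects those two candidates.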

Algorithms 1 and 2 are proved using an exhaustive search strategy to show that the proposed code can correct at most two consecutive deleted symbols. However, as mentioned in [9], deletion-correcting codes do not always succeed in identifying the exact location of the deleted symbols. For example, if an all-zero codeword is sent and one deletion error occurs, finding the value of the deleted symbol is easy, but finding its exact position is impossible. Even though the exact position cannot be detected, the codeword can be successfully recovered by inserting a zero symbol in any position. This means that when the exact index of the deleted symbol is not detected but the index of the run to which it belongs is determined, the codeword can be successfully recovered by inserting one symbol in any position in that run.

If a codeword with a large run was sent and one deletion occurs in the large run, the proposed algorithm can always determine the value and the run index of the deleted symbol but rarely find the exact position of the deleted symbol in the run. In this case, we prioritize the proposed algorithm to output the first index in the detected run. Therefore, when a deletion error occurs in a large run and it is not possible to find the exact position in a codeword, the codeword of the proposed code will be successfully decoded by inserting the deleted symbol in the first index of the run.

3.4. Decoding Procedure for Insertion Errors

Since there is a similarity to the case of deletion errors, in this subsection, the correction capability of this proposed code for insertion errors is briefly presented. The received quaternary sequence y has a length that is one or two symbols larger than n, if one or two consecutive insertion errors occur. Table 4 summarizes the different computations of decoding between insertion and deletion errors.

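Table 4.

The differences between insertion and deletion errors.

| Content | One Insertion Error | Two Consecutive Insertion Errors | One Deletion Error | Two Consecutive Deletion Errors |
|---|---|---|---|---|
| Length of the received sequence | $n+1$ | $n+2$ | $n-1$ | $n-2$ |
| Sum of error symbol(s) | $\sum_{i=1}^{n+1} y_i - e \bmod 4$ | $\sum_{i=1}^{n+2} y_i - e \bmod 4$ | $e - \sum_{i=1}^{n-1} y_i \bmod 4$ | $e - \sum_{i=1}^{n-2} y_i \bmod 4$ |
| Difference of run-syndrome (mod $2n$) | $\mathrm{Rsyn}(0x_j^{n+1}) - d$ | $\mathrm{Rsyn}(0x_j^{n+2}) - d$ | $d - \mathrm{Rsyn}(0x_j^{n-1})$ | $d - \mathrm{Rsyn}(0x_j^{n-2})$ |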

3.4.1. Correcting one Insertion Error

It is assumed that the received sequence of length $n+1$ is $\mathbf{y} = \mathbf{c}_j^{n+1} = (c_1, c_2, \ldots, c_{j-1}, h_1, c_j, c_{j+1}, \ldots, c_n) \in \mathbb{F}_4^{n+1}$, which means that one symbol $h_1$ is inserted at the $j$-th position of the codeword $\mathbf{c} \in C(n,4)$. The received sequence $\mathbf{y}$ is corrected in the following two steps.

The first step is calculating the value of the inserted symbol $h_1$ in $\mathbf{y}$. The sum of all symbols of the received sequence $\mathbf{y}$ is $\sum_{i=1}^{n+1} y_i = \sum_{i=1}^{n+1} c_{j,i}^{n+1} = \sum_{i=1}^{j-1} c_i + h_1 + \sum_{i=j}^{n} c_i$, and then $\sum_{i=1}^{n+1} c_{j,i}^{n+1} \bmod 4 = \sum_{i=1}^{n} c_i + h_1 \bmod 4 = e + h_1 \bmod 4$. The value of the inserted symbol is therefore $h_1 = \sum_{i=1}^{n+1} c_{j,i}^{n+1} - e \bmod 4$.

The second step is determining the insertion position $j$. From mapping (10), we obtain the binary sequence $0\mathbf{x}_j^{n+1}$. From this binary sequence, we obtain its run-length vector and then calculate the difference of the run-syndrome as $\Delta = \mathrm{Rsyn}(0\mathbf{x}_j^{n+1}) - d \bmod 2n$. To determine the position $j$ of the inserted symbol $h_1$, Algorithm A1 and Function 3 are provided in Appendix A; their output is the corrected quaternary sequence.

3.4.2. Correcting Two Consecutive Insertion Errors

If two consecutive insertion errors occur at the $j$-th and $(j+1)$-th positions of the codeword $\mathbf{c} \in C(n,4)$, the received sequence is $\mathbf{y} = \mathbf{c}_j^{n+2} = (c_1, c_2, \ldots, c_{j-1}, h_1, h_2, c_j, c_{j+1}, \ldots, c_n) \in \mathbb{F}_4^{n+2}$ with length $n+2$. From the received sequence $\mathbf{c}_j^{n+2}$ and an analysis similar to the one-insertion case, the sum of the two inserted symbols is obtained as $h_1 + h_2 = \sum_{i=1}^{n+2} c_{j,i}^{n+2} - e \bmod 4 = \sum_{i=1}^{n+2} y_i - e \bmod 4$.

By the mapping (10) in Definition 4, the two consecutive symbols $h_1, h_2$ inserted in $\mathbf{c}_j^{n+2}$ correspond to two consecutive bits inserted in the binary sequence $\mathbf{x}_j^{n+2}$, from which we obtain the binary sequence $0\mathbf{x}_j^{n+2}$. The difference of the run-syndrome is then calculated as $\Delta = \mathrm{Rsyn}(0\mathbf{x}_j^{n+2}) - d \bmod 2n$. Algorithm A2 and Function 4 in Appendix B are provided to determine the exact values of $h_1, h_2$ and the positions $j$ and $j+1$ of the two consecutive insertion errors. Finally, $h_1$ and $h_2$ are removed from the sequence $\mathbf{c}_j^{n+2}$ to retrieve the codeword $\mathbf{c}$.

3.5. Cardinality of the Proposed Code

Since our main contribution is the error-correction capability of the proposed code, the lower and upper bounds on the cardinality of this code design are evaluated using the existing methods in [7,15].

3.5.1. Lower Bound of the Code Cardinality

In Section IV of [15], the lower bound on the code cardinality was determined from the number of possible values of the syndrome and checksum in the code construction. With a similar approach, applying $d \in [0, 2n-1]$, $a \in [0, 8n]$, and $e \in [0, 3]$ gives $2n \cdot (8n+1) \cdot 4 = 8n(8n+1)$ possible parameter triples, so we obtain the lower bound for the cardinality $m(n,4)$ of the proposed code as

$m(n,4) \ge \frac{4^n}{8n(8n+1)}$. (14)

The redundancy of the proposed code is therefore at most

$n - \log_4 |C(n,4)| \le n - \log_4 \frac{4^n}{8n(8n+1)} \approx 2\log_4 8n$. (15)
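
As a numerical sanity check of the redundancy estimate, the following Python sketch compares the exact expression $\log_4\bigl(8n(8n+1)\bigr)$ with the approximation $2\log_4 8n$. It assumes the lower bound has the form $4^n$ divided by the number of $(d, a, e)$ triples, $2n \cdot (8n+1) \cdot 4$; the function names are illustrative and not part of the paper.

```python
import math

def lower_bound(n):
    """4^n divided by the number of (d, a, e) triples, 8n(8n+1)."""
    return 4**n / (8 * n * (8 * n + 1))

def redundancy_bound(n):
    """n - log4(lower_bound(n)), which simplifies to log4(8n(8n+1))."""
    return math.log(8 * n * (8 * n + 1), 4)

def approx_redundancy(n):
    """The approximation 2 * log4(8n)."""
    return 2 * math.log(8 * n, 4)

for n in (20, 100, 1000):
    exact = redundancy_bound(n)
    approx = approx_redundancy(n)
    # The gap equals log4((8n+1)/(8n)) and shrinks as n grows.
    print(n, round(exact, 4), round(approx, 4), round(exact - approx, 6))
```

The gap between the two expressions is $\log_4\frac{8n+1}{8n}$, which tends to zero as $n$ grows, so the approximation is tight for practical codeword lengths.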

3.5.2. Upper Bound of the Code Cardinality

Define $|M(n,4)|$ as the cardinality of a quaternary code of length $n$, with the maximum possible number of codewords, that can correct at most two consecutive deletion or insertion errors. Similar to the method in [7], an upper bound on $|M(n,4)|$ is evaluated as

$|M(n,4)| \le |M_1(n,4)| + |M_2(n,4)|$, (16)

where $|M_1(n,4)|$ is the number of codewords of length $n$ whose number of runs is larger than $(r+1)$, with $r$ an arbitrary number, and $|M_2(n,4)|$ is the number of codewords of length $n$ whose number of runs is not larger than $(r+1)$. Equation (16) can then be written as

|M(n,4)|4n22(r+1)+1+2.4nn22j=0rn22j. (17)

Let we set r=3n222n22lnn224, and let n tends to infinity, then 2(r+1)+132n. Therefore, with r3n224, the upper bound of the cardinality of the proposed code can be written as

|M(n,4)|2·4n23n. (18)

4. Discussion

In this section, we explain the results of our proposed code design and then discuss the applications of the proposed code.

We provide a new design of quaternary codes that corrects at most two consecutive deletion or insertion errors. The correction capability of this design for deletion and insertion errors is established by Algorithms 1 and 2 and by Appendix A and Appendix B. With this proposed code, the mapping 0 → A, 1 → C, 2 → T, and 3 → G can be used directly to construct or sequence DNA strands.

To deal with a burst of at most $b$ deletion or insertion errors (for any fixed $b \ge 3$), the proposed code can be intersected with a quaternary code that corrects exactly $b-2$ consecutive deletion or insertion errors.

For example, to correct a burst error of size at most $b = 3$, we first create the code $C(n,4)$ from Definition 4, which takes care of one or two consecutive deletion or insertion errors. Then, we create a $q$-ary VT code $VT_{a,e}(n,4)$ from Definition 2 with $q = 4$ to correct a single error. By intersecting $C(n,4)$ and $VT_{a,e}(n,4)$, we obtain the expected quaternary code of length $n$ that deals with a burst error of size at most three. For a given $b > 3$, we use the array code construction described in Section 2.2 to create a quaternary code that can correct exactly $b-2$ consecutive deletion or insertion errors. Intersecting this code with our proposed code yields a quaternary code that can correct at most $b > 3$ consecutive deletion or insertion errors.
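
The symbol-to-nucleotide mapping mentioned above can be sketched in a few lines of Python. The dictionary and function names are illustrative assumptions; only the assignment 0/A, 1/C, 2/T, 3/G comes from the text.

```python
# Mapping between quaternary symbols and nucleotides as stated in the text.
SYMBOL_TO_BASE = {0: "A", 1: "C", 2: "T", 3: "G"}
BASE_TO_SYMBOL = {base: sym for sym, base in SYMBOL_TO_BASE.items()}

def to_dna(codeword):
    """Translate a list of quaternary symbols into a DNA strand string."""
    return "".join(SYMBOL_TO_BASE[s] for s in codeword)

def from_dna(strand):
    """Translate a DNA strand string back into quaternary symbols."""
    return [BASE_TO_SYMBOL[b] for b in strand]

strand = to_dna([0, 1, 2, 3, 3, 0])
print(strand)            # ACTGGA
print(from_dna(strand))  # [0, 1, 2, 3, 3, 0]
```

Because the mapping is a bijection on single symbols, a burst of deleted or inserted nucleotides corresponds to a burst of the same length in the quaternary codeword.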

5. Conclusions

In this paper, we propose a new design of a quaternary code that corrects at most two consecutive deletion or insertion errors with a redundancy of at most $2\log_4 8n$ symbols. We also develop decoding algorithms for correcting one and two consecutive deletion or insertion errors in any quaternary sequence. Even though the results in this work are applicable to DNA storage and to the correction of multiple quaternary errors, several problems remain open, such as code constructions that can correct at most $b$ non-consecutive deletion or insertion errors and codes that can correct at most $b$ deletion, insertion, and substitution errors, for arbitrary $b$. Moreover, the optimal design for concatenating a constrained code with our proposed code for DNA-based data storage also needs to be considered.

Abbreviations

   The following abbreviations are used in this manuscript:

DNA Deoxyribonucleic acid
VT Varshamov–Tenengolts
SVT shifted-Varshamov–Tenengolts
GC-content Guanine-Cytosine content
MILP Mixed Integer Linear Programming

Appendix A. One Insertion Error Correction

As mentioned in Section 3.4, Algorithm A1 and the function ins1_correct1 which is used in Algorithm A1 are given as follows.

Algorithm A1 Correct one insertion symbol
Input: $n, a, \mathbf{y}, 0\mathbf{x}_j^{n+1}$.
Output: $\mathbf{c} = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
 1: Calculate $\Delta$ and $h_1$ as in Table 3.
 2: Set $j = 1$.
 3: while $j \le n+1$ do
 4:    if $\Delta < r$ then
 5:       if $\mathrm{mod}(\Delta, 2) = 1$ then
 6:          if $k_{0\mathbf{x}_j^{n+1}}(j-1) + 1 = \Delta$ then
 7:             $\mathbf{c}$ = ins1_correct1($n, a, \mathbf{y}, j, h_1$)
 8:          else
 9:             $j \leftarrow j+1$
10:          end if
11:       else
12:          if $k_{0\mathbf{x}_j^{n+1}}(j-1) = \Delta$ then
13:             $\mathbf{c}$ = ins1_correct1($n, a, \mathbf{y}, j, h_1$)
14:          else
15:             $j \leftarrow j+1$
16:          end if
17:       end if
18:    else
19:       if $k_{0\mathbf{x}_j^{n+1}}(j-1) + 2(n-j+1) + 1 = \Delta$ then
20:          $\mathbf{c}$ = ins1_correct1($n, a, \mathbf{y}, j, h_1$)
21:       else
22:          $j \leftarrow j+1$
23:       end if
24:    end if
25: end while

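Function 3: $\mathbf{c}$ = ins1_correct1($n, a, \mathbf{y}, j, h_1$)
Input: $n, a, \mathbf{y}, j, h_1$.
Output: $\mathbf{c} = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
 1: if $\mathbf{c}_j^{n+1}(j) = h_1$ then
 2:    Syn_new = $\sum_{i=1}^{j-1} i\, c_{j,i}^{n+1} + \sum_{i=j+1}^{n+1} (i-1)\, c_{j,i}^{n+1} \bmod (8n+1)$
 3: else
 4:    $j \leftarrow j+1$
 5: end if
 6: if Syn_new $= a$ then
 7:    $\mathbf{c} = (c_1, c_2, \ldots, c_{j-1}, c_j, c_{j+1}, \ldots, c_n)$
 8: else
 9:    $j \leftarrow j+1$
10: end if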

Algorithm A1 finds the possible position $j$ of the inserted symbol in steps 6, 12, and 19, and then uses Function 3 (the function ins1_correct1) to check whether this value of $j$ satisfies the constraint (12).

To determine the position of the inserted symbol, the value of Syn_new in line 2 of Function 3 is the syndrome of the received quaternary sequence $\mathbf{c}_j^{n+1}$ computed without the inserted symbol $\mathbf{c}_j^{n+1}(j)$. Thus, in the second term of the right-hand side, the coefficient needs to be $(i-1)$, because every symbol after position $j$ moves one position to the left once the inserted symbol is removed. This syndrome is compared to the constraint (12) to obtain the position of the inserted symbol, and finally the inserted symbol in the $j$-th position of $\mathbf{c}_j^{n+1}$ is removed. Therefore, a code whose codewords satisfy constraints (11), (12), and (13) can correct any single insertion error.
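
The syndrome check in Function 3 can be sketched in Python as follows. This is a simplified illustration, not the authors' decoder: it assumes that constraint (12) has the form $\sum_{i=1}^{n} i\, c_i \equiv a \pmod{8n+1}$, which is how Function 3 uses it, and it scans every position instead of using the run-syndrome of Algorithm A1 to locate $j$. The function names and the sample codeword are hypothetical.

```python
def syndrome(c, n):
    """Assumed form of constraint (12): sum of i*c_i mod (8n+1), 1-based i."""
    return sum(i * ci for i, ci in enumerate(c, start=1)) % (8 * n + 1)

def syn_new_one_insertion(y, j, n):
    """Syndrome of y (length n+1) when the symbol at 1-based position j is skipped."""
    left = sum(i * y[i - 1] for i in range(1, j))                 # i = 1..j-1, weight i
    right = sum((i - 1) * y[i - 1] for i in range(j + 1, n + 2))  # i = j+1..n+1, weight i-1
    return (left + right) % (8 * n + 1)

codeword = [3, 0, 2, 2, 1, 3]             # hypothetical codeword, n = 6
n = len(codeword)
a = syndrome(codeword, n)                 # 40
e = sum(codeword) % 4                     # 3

y = [3, 0, 1, 2, 2, 1, 3]                 # symbol 1 inserted at position j = 3
h1 = (sum(y) - e) % 4                     # value of the inserted symbol

# Positions whose symbol equals h1 and whose removal restores the syndrome a.
matches = [j for j in range(1, n + 2)
           if y[j - 1] == h1 and syn_new_one_insertion(y, j, n) == a]
print(h1, matches)
```

In this example the symbol 1 also appears at position 6, but removing it there gives a syndrome different from $a$, so the syndrome check singles out position 3.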

Appendix B. Two Consecutive Insertion Errors Correction

To correct the quaternary sequence when two consecutive insertion errors occur, we provide the details of correction procedure in Algorithm A2 and Function 4.

Algorithm A2 is constructed based on the analysis mentioned in Section 3.5. From steps 6, 13, 24, and 31, the possible positions $j$ and $j+1$ of two consecutive insertion errors in the related binary sequence $\mathbf{x}$ can be obtained. However, since mapping (10) maps quaternary symbols to binary bits, different quaternary symbols can be mapped to the same binary bit, so we need to verify the exact quaternary values corresponding to the $j$-th and $(j+1)$-th positions.

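Function 4: $\mathbf{c}$ = ins2_correct2($j, n, a, \mathbf{y}, h_1 + h_2$)
Input: $j, n, a, \mathbf{y}, h_1 + h_2$.
Output: $\mathbf{c} = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
 1: if $\mathbf{c}_j^{n+2}(j) = h_1$ and $\mathbf{c}_j^{n+2}(j+1) = (h_1 + h_2) - h_1$ then
 2:    Syn_new = $\sum_{i=1}^{j-1} i\, c_{j,i}^{n+2} + \sum_{i=j+2}^{n+2} (i-2)\, c_{j,i}^{n+2} \bmod (8n+1)$
 3: else
 4:    $j \leftarrow j+1$
 5: end if
 6: if Syn_new $= a$ then
 7:    $\mathbf{c} = (c_1, c_2, \ldots, c_{j-1}, c_j, c_{j+1}, \ldots, c_n)$
 8: else
 9:    $j \leftarrow j+1$
10: end if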

The function ins2_correct2 in Algorithm A2 is provided as Function 4 to output the unique sequence that satisfies the three constraints (11), (12), and (13). Steps 1 and 2 in Function 4 correspond to the comparison with the constraints (13) and (12), respectively. This comparison determines the exact values and positions of the inserted symbols, as mentioned in Section 3.1.

Similarly to Function 3 in Appendix A, Syn_new is the syndrome of $\mathbf{c}_j^{n+2}$ computed without the symbols in the $j$-th and $(j+1)$-th positions. Hence, the coefficient in the second term of Syn_new is $(i-2)$, meaning that the symbols after the $(j+1)$-th symbol are shifted to the left by two positions. The syndrome value Syn_new is compared to the value $a$ of the constraint (12) to determine the exact values and positions of the inserted symbols. The output sequence then satisfies all three constraints in Definition 4. Finally, the two consecutive inserted symbols at the $j$-th and $(j+1)$-th positions of $\mathbf{c}_j^{n+2}$ are removed. The output of Function 4 is the corrected quaternary sequence.
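
The two checks performed by Function 4 can be sketched in Python under simplifying assumptions. The sketch assumes that constraint (12) is $\sum i\, c_i \equiv a \pmod{8n+1}$ and constraint (13) is $\sum c_i \equiv e \pmod 4$, and it deliberately omits the run-syndrome constraint (11) and mapping (10) that Algorithm A2 uses to locate $j$; instead it scans every adjacent pair. The function names and sample data are hypothetical, and without constraint (11) the candidate list is not guaranteed to be unique in general.

```python
def syn_new_two_insertions(y, j, n):
    """Syndrome of y (length n+2) skipping 1-based positions j and j+1."""
    left = sum(i * y[i - 1] for i in range(1, j))                 # i = 1..j-1, weight i
    right = sum((i - 2) * y[i - 1] for i in range(j + 2, n + 3))  # i = j+2..n+2, weight i-2
    return (left + right) % (8 * n + 1)

def burst2_candidates(y, n, a, e):
    """Sequences obtained by removing an adjacent pair that passes both checks."""
    pair_sum = (sum(y) - e) % 4           # h1 + h2 mod 4 from the checksum
    found = []
    for j in range(1, n + 2):             # pair occupies positions j and j+1
        if (y[j - 1] + y[j]) % 4 != pair_sum:
            continue
        if syn_new_two_insertions(y, j, n) != a:
            continue
        candidate = y[:j - 1] + y[j + 1:]
        if candidate not in found:        # different j may give the same sequence
            found.append(candidate)
    return found

codeword = [3, 0, 2, 2, 1, 3]             # hypothetical codeword, n = 6
n = len(codeword)
a = sum(i * c for i, c in enumerate(codeword, start=1)) % (8 * n + 1)
e = sum(codeword) % 4

y = [3, 0, 2, 3, 2, 2, 1, 3]              # symbols 2, 3 inserted at positions 3 and 4
print(burst2_candidates(y, n, a, e))
```

In this example both $j = 3$ and $j = 4$ pass the two checks, but removing either pair gives the same sequence, so a single candidate equal to the original codeword remains.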

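Algorithm A2: Correct two consecutive insertion symbols
Input: $n, a, \mathbf{y}, 0\mathbf{x}_j^{n+2}$.
Output: $\mathbf{c} = (c_1, c_2, \ldots, c_n) \in C(n,4)$.
 1: Calculate $\Delta$ and $h_1 + h_2$ as in Table 3.
 2: Set $j = 1$.
 3: if $\Delta \ge 2r$ then
 4:    while $j \le n+1$ do
 5:       if $\mathrm{mod}(\Delta, 2) = 1$ then
 6:          if $2k_{0\mathbf{x}_j^{n+2}}(j-1) + 2(n-j) + 5 = \Delta$ then
 7:             $x_j = \bar{x}_{j-1}$; $x_{j+1} = x_{j-1}$
 8:             $\mathbf{c}$ = ins2_correct2($j, n, a, \mathbf{y}, h_1 + h_2$)
 9:          else
10:             $j \leftarrow j+1$
11:          end if
12:       else
13:          if $2k_{0\mathbf{x}_j^{n+2}}(j-1) + 2(n-j) + 4 = \Delta$ then
14:             $x_j = \bar{x}_{j-1}$; $x_{j+1} = \bar{x}_{j-1}$
15:             $\mathbf{c}$ = ins2_correct2($j, n, a, \mathbf{y}, h_1 + h_2$)
16:          else
17:             $j \leftarrow j+1$
18:          end if
19:       end if
20:    end while
21: else
22:    while $j \le n+1$ do
23:       if $\mathrm{mod}(\Delta, 2) = 1$ then
24:          if $2k_{0\mathbf{x}_j^{n+2}}(j-1) + 1 = \Delta$ then
25:             $x_j = x_{j-1}$; $x_{j+1} = \bar{x}_{j-1}$
26:             $\mathbf{c}$ = ins2_correct2($j, n, a, \mathbf{y}, h_1 + h_2$)
27:          else
28:             $j \leftarrow j+1$
29:          end if
30:       else
31:          if $2k_{0\mathbf{x}_j^{n+2}}(j-1) = \Delta$ then
32:             $x_j = x_{j-1}$; $x_{j+1} = x_{j-1}$
33:             $\mathbf{c}$ = ins2_correct2($j, n, a, \mathbf{y}, h_1 + h_2$)
34:          else
35:             $j \leftarrow j+1$
36:          end if
37:       end if
38:    end while
39: end if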

Author Contributions

All authors discussed the contents of the manuscript and contributed to its presentation. T.-H.K. designed and implemented the proposed code construction and algorithms, wrote the paper under the supervision of S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1802-09.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Goldman N., Bertone P., Chen S., Dessimoz C., LeProust E.M., Sipos B., Birney E. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature. 2013;494:77–80. doi: 10.1038/nature11875.
  • 2. Blawat M., Gaedke K., Hütter I., Chen X., Turczyk B., Inverso S., Pruitt B., Church G. Forward error correction for DNA data storage. Procedia Comput. Sci. 2016;80:1011–1022. doi: 10.1016/j.procs.2016.05.398.
  • 3. Erlich Y., Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science. 2016;355:950–954. doi: 10.1126/science.aaj2038.
  • 4. Heckel R., Mikutis G., Grass R. A characterization of the DNA data storage channel. Sci. Rep. 2019;9:1–12. doi: 10.1038/s41598-019-45832-6.
  • 5. Varshamov R., Tenengolts G. A code that corrects single asymmetric errors. Autom. Telemkhanika. 1965;26:288–292.
  • 6. Levenshtein V.I. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 1966;10:707–710.
  • 7. Levenshtein V.I. Asymptotically optimum binary codes with correction for losses of one or two adjacent bits. Syst. Theo. Res. 1970;19:298–304.
  • 8. Cheng L., Swart T., Ferreira H., Abdel-Ghaffar K. Codes for correcting three or more consecutive deletions or insertions; Proceedings of the 2014 IEEE International Symposium on Information Theory; Honolulu, HI, USA. 29 June–4 July 2014; pp. 1246–1250.
  • 9. Schoeny C., Wachter-Zeh A., Gabrys R., Yaakobi E. Codes correcting a burst of deletions or insertions. IEEE Trans. Inf. Theory. 2017;63:1971–1985. doi: 10.1109/TIT.2017.2661747.
  • 10. Chee Y., Kiah H., Nguyen T. Linear-time encoders for codes correcting a single edit for DNA-based data storage; Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT); Paris, France. 7–12 September 2019; pp. 773–776.
  • 11. Nguyen T., Cai K., Immink K., Kiah H. Capacity-approaching constrained codes with error correction for DNA-based data storage. IEEE Trans. Inf. Theory. 2021;67:5602–5613. doi: 10.1109/TIT.2021.3066430.
  • 12. Bornholt J., Lopez R., Carmean D., Ceze L., Seelig G. A DNA-based archival storage system; Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems; Atlanta, GA, USA. 2–6 April 2016; pp. 637–649.
  • 13. Ross M., Russ C., Costello M., Hollinger A., Lennon N., Hegarty R., Nusbaum C., Jaffe D. Characterizing and measuring bias in sequence data. Genome Bio. 2013;14:R51. doi: 10.1186/gb-2013-14-5-r51.
  • 14. Cai K., Chee Y., Gabrys R., Kiah H., Nguyen T. Correcting a single indel/edit for DNA-based data storage: Linear-time encoders and order-optimality. IEEE Trans. Inf. Theory. 2021;67:3438–3451. doi: 10.1109/TIT.2021.3049627.
  • 15. Tenengolts G. Nonbinary codes, correcting single deletion or insertion. IEEE Trans. Inf. Theory. 1984;30:766–769. doi: 10.1109/TIT.1984.1056962.
  • 16. Schoeny C., Sala F., Dolecek L. Novel combinatorial coding results for DNA sequencing and data storage; Proceedings of the 2017 51st Asilomar Conf. Signals, Systems, and Computers; Pacific Grove, CA, USA. 29 October–1 November 2017; pp. 511–515.
  • 17. Paluni F., Swart T., Weber J., Ferreira H., Clarke W. A note on non-binary multiple insertion/deletion correcting codes; Proceedings of the 2011 IEEE Information Theory Workshop; Paraty, Brazil. 16–20 October 2011; pp. 683–687.
  • 18. Sima J., Raviv N., Bruck J. Two deletion correcting codes from indicator vectors. IEEE Trans. Inf. Theory. 2020;66:2375–2391. doi: 10.1109/TIT.2019.2950290.
  • 19. Sima J., Gabrys R., Bruck J. Optimal codes for the q-ary deletion channel; Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT); Los Angeles, CA, USA. 21–26 June 2020; pp. 740–745.
  • 20. Sima J., Gabrys R., Bruck J. Optimal systematic t-deletion correcting codes; Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT); Los Angeles, CA, USA. 21–26 June 2020; pp. 769–774.
  • 21. Sima J., Bruck J. On optimal k-deletion correcting codes. IEEE Trans. Inf. Theory. 2020;67:3360–3375. doi: 10.1109/TIT.2020.3028702.
  • 22. Wang S., Sima J., Farnoud F. Non-binary codes for correcting a burst of at most 2 deletions; Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT); Melbourne, Australia. 12–20 July 2021; pp. 2804–2809.
  • 23. No A. Nonasymptotic upper bounds on binary single deletion codes via mixed integer linear programming. Entropy. 2019;21:1202. doi: 10.3390/e21121202.
  • 24. Immink K., Cai K. Properties and constructions of constrained codes for DNA-based data storage. IEEE Access. 2020;8:49523–49531. doi: 10.1109/ACCESS.2020.2980036.


Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
