Computational Science – ICCS 2020. 2020 May 23;12140:45–58. doi: 10.1007/978-3-030-50423-6_4

Grammatical Inference by Answer Set Programming

Wojciech Wieczorek, Łukasz Strąk, Arkadiusz Nowakowski, Olgierd Unold
Editors: Valeria V Krzhizhanovskaya, Gábor Závodszky, Michael H Lees, Jack J Dongarra, Peter M A Sloot, Sérgio Brissos, João Teixeira
PMCID: PMC7303697

Abstract

In this paper, the identification of context-free grammars based on the presentation of samples is investigated. The main idea for solving this problem proposed in the literature is reformulated in two different ways: in terms of general constraints and as an answer set program. In a series of experiments, we showed that our answer set programming approach is much faster than our alternative method and the original SAT encoding method. As in the pioneering work, some well-known context-free grammars have been induced correctly, and we also followed its test procedure with randomly generated grammars, making it clear that using our answer set programs increases computational efficiency. The research can be regarded as further evidence that solutions based on the stable model (answer set) semantics of logic programming may be the right choice for complex problems.

Keywords: Grammatical inference, Answer set programming, Constraint satisfaction problem

Introduction

In grammatical inference [9], a learning algorithm La takes a finite sequence of examples (usually strings) as input and outputs a language description (usually a grammar). There are two main types of presentations: (i) A text for a language L is an infinite sequence of strings $x_1, x_2, \ldots$ from L such that every string of L occurs at least once in the text; (ii) An informant for a language L is an infinite sequence of pairs $(x_1, d_1), (x_2, d_2), \ldots$ in $\Sigma^* \times \{0, 1\}$ such that every string of $\Sigma^*$ occurs at least once in the sequence and $d_i = 1 \iff x_i \in L$. The inference algorithms that use type (ii) of information are said to learn from positive and negative examples. From Gold's results [7], we know that the class of context-free languages (and even regular languages) cannot be identified from presentation (i), but can be identified using presentation (ii). However, de la Higuera [8] showed that it is computationally hard.

In this work, the following informant learning environment is exploited. Suppose that the inferring process is based on the existence of an Oracle, which can be seen as a device that:

  1. Knows the language and has to answer correctly.

  2. Can answer equivalence queries. They are made by proposing some hypothesis to the Oracle. The hypothesis is a grammar representing the unknown language. The Oracle answers Yes in the positive case. In the negative case, the Oracle has to return the shortest string in the symmetric difference between the target language and the submitted hypothesis.

Then the following procedure can be applied. Start from a small sample S (see footnote 1) and $k = 1$. The parameter k denotes the number of non-terminal symbols in the target grammar. Run an answer set program (or another exact method). Every time it turns out that there is no solution that satisfies all of the constraints, increase k by 1. As long as the Oracle returns a pair (x, d) in response to an equivalence query, add (x, d) to S and run the answer set program (or, respectively, another exact method) again. Stop after the answer is Yes. Unfortunately, there is no guarantee that the procedure will terminate in a polynomial number of steps, even when the target language is regular [1]. The equivalence checking may be done by random sampling. A positive answer could then be incorrect, but the probability of this decreases if the sampling is repeated.
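The sampling-based equivalence check mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function name `sampled_equivalence_query`, the two membership predicates, and the sampling scheme (a uniformly chosen length up to `max_len`, then uniformly chosen symbols) are all hypothetical. Unlike the ideal Oracle, the sketch returns the shortest disagreeing word among those it happened to sample, which need not be the shortest word in the symmetric difference.

```python
import random
from typing import Callable, Optional, Tuple


def sampled_equivalence_query(
    in_target: Callable[[str], bool],
    in_hypothesis: Callable[[str], bool],
    alphabet: str = "ab",
    max_len: int = 8,
    trials: int = 1000,
    seed: int = 0,
) -> Optional[Tuple[str, bool]]:
    """Approximate an equivalence query by random sampling.

    Returns None (the answer "Yes") if no sampled word separates the two
    languages; otherwise returns a pair (x, d), where x is the shortest
    disagreeing word found and d tells whether x belongs to the target.
    """
    rng = random.Random(seed)
    best: Optional[str] = None
    for _ in range(trials):
        length = rng.randint(0, max_len)
        word = "".join(rng.choice(alphabet) for _ in range(length))
        # A word in the symmetric difference is a counter-example.
        if in_target(word) != in_hypothesis(word):
            if best is None or len(word) < len(best):
                best = word
    if best is None:
        return None
    return best, in_target(best)
```

Raising `trials` lowers the chance of a wrong "Yes", which mirrors the remark that the error probability decreases when the sampling is repeated.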

A very similar procedure for the induction of context-free grammars was proposed by Imada and Nakamura [11]. However, for the exact search for a k-variable grammar, they used Boolean formulas and applied a SAT solver. We took over their main Boolean variables, treating them as predicates, and then constructed a new encoding founded on answer set programming. In an alternative approach, we used general constraints of Gurobi Optimizer instead of ASP.

Related Work

The work most closely related to CFG identification is by Imada and Nakamura [11]. They proposed a way to synthesize CFGs from positive and negative samples based on solving a Boolean satisfiability problem (SAT). They translated the learning problem for a CFG into a SAT instance, which is then solved by a SAT solver. The satisfying assignment returned by the SAT solver contains a minimal set of rules (it can be easily changed to a minimal set of variables) that derives all positive samples and no negative samples.

They used one derivation constraint and two main types of Boolean variables:

  • Derivation variables. A set of derivation variables represents a relation between nonterminal symbols and substrings (in other words, derivation or parse tree) of each (positive or negative) sample w as follows: for any substring x of w and Inline graphic, the derivation variable Inline graphic represents that the nonterminal p derives the string x.

  • Rule variables. A set of rule variables represents a rule set as follows: for any Inline graphic, Inline graphic, a variable Inline graphic (or Inline graphic) determines whether the production rule Inline graphic (or Inline graphic) is a member of the set of rules or not.

The derivation constraint is a set of following Boolean expressions for any string Inline graphic (Inline graphic) and nonterminal Inline graphic.

graphic file with name M18.gif

Nakamura et al. have been working on another approach for incremental learning of CFGs implemented in the Synapse system [15]. This approach is based on rule generation by analyzing the results of bottom-up parsing for positive samples and searching for rule sets. Their system can also learn similar CFGs, but it does so only from positive samples. Both methods synthesized similar rule sets for each language in their experiments. They reported that the computation time of the SAT-based approach is shorter than that of Synapse for most languages.

Our Contribution

The purpose of the present proposal is to investigate to what extent the power of an ASP solver makes it possible to tackle the context-free inference problem for large-size instances and to compare our approach with the original one. To allow future comparisons with other methods, the Python implementation (see footnote 3) of our winning method is available via GitLab.

The main original scientific contributions are as follows:

  • the formulation of the induction of a k-variable context-free grammar in terms of logical rules with answer set semantics;

  • the formulation of the induction of a k-variable context-free grammar in terms of general constraints;

  • the construction of an informant learning algorithm based on ASP, CSP, and SAT solvers;

  • the conduct of an appropriate statistical test in order to determine the fastest CFG inference method.

This paper is organized into five sections. In Sect. 2, we present necessary definitions and facts originating from formal languages and declarative problem-solving. Section 3 describes our inference algorithms: (a) based on solving an answer set program, and (b) based on solving a constraint satisfaction program, including general constraints such as AND/OR. Section 4 shows the experimental results of our approaches in comparison with the original one. Concluding comments are made in Sect. 5.

Preliminaries

We assume the reader to be familiar with basic context-free language theory, e.g., from [10], so we introduce only some notation and notions used later in the paper.

Words and Languages

An alphabet is a finite, non-empty set of symbols. We use the symbol $\Sigma$ for the alphabet. A word is a finite sequence of symbols chosen from the alphabet. We denote the length of the word w by |w|. The empty word $\varepsilon$ is the word with zero occurrences of symbols. Let x and y be words. Then xy denotes the catenation of x and y, that is, the word formed by making a copy of x and following it by a copy of y. As usual, $\Sigma^*$ denotes the set of words over $\Sigma$. The word w is called a prefix of the word u if there is a word x such that $u = wx$. We call it a proper prefix if $x \neq \varepsilon$. The word w is called a suffix of the word u if there is a word x such that $u = xw$. It is a proper suffix if $x \neq \varepsilon$. A factor (or subword) is a prefix of a suffix. A set of words, all of which are chosen from some $\Sigma^*$, where $\Sigma$ is a particular alphabet, is called a language.

Context-Free Grammars

A context-free grammar (CFG) is defined by a quadruple Inline graphic, where V is an alphabet of variables (sometimes called non-terminal symbols), Inline graphic is an alphabet of terminal symbols such that Inline graphic, P is a finite set of production rules in the form Inline graphic for Inline graphic and Inline graphic, and Inline graphic is a special non-terminal symbol called the start symbol. For simplicity's sake, we write Inline graphic instead of Inline graphic. We call the word Inline graphic a sentential form. Let u, v be two words in Inline graphic and Inline graphic. Then, we write Inline graphic, if Inline graphic is a rule in P. That is, we can substitute the word x for symbol A in a sentential form if Inline graphic is a rule in P. We call this rewriting a derivation. For any two sentential forms x and y, we write Inline graphic, if there exists a sequence Inline graphic of sentential forms such that Inline graphic for all Inline graphic. The language L(G) generated by G is the set of all words over Inline graphic that are generated by G; that is, Inline graphic. A language is called a context-free language if it is generated by a context-free grammar. Assume that G is the unknown (target) CFG to be identified. An example (a positive word) of G is a word in L(G), and a counter-example (a negative word) of G is a word not in L(G).

A normal form for context-free grammars is a restricted form into which any grammar can be converted. Among all normal forms for context-free grammars, the most useful and best known is the Chomsky normal form (CNF). A grammar is said to be in Chomsky normal form if each of its rules is in one of two possible forms:

  1. $A \to BC$, or

  2. $A \to a$.

Answer Set Programming

We will briefly introduce the idea of answer set programming (ASP). Those who are interested in a more detailed description of the topic, alternative definitions, and the formal specification of this kind of logic programming are referred to handbooks [3, 6], and [12].

A variable or constant is a term. An atom is $a(t_1, \ldots, t_n)$, where a is a predicate of arity n and $t_1, \ldots, t_n$ are terms. A literal is either a positive literal p or a negative literal Inline graphic, where p is an atom.

A rule r is a clause of the form

graphic file with name M55.gif 1

where Inline graphic are atoms. The atom Inline graphic is the head of r, while the conjunction Inline graphic is the body of r. By Inline graphic, we denote the head atom, and by Inline graphic the set Inline graphic of the body literals. Inline graphic (Inline graphic, resp.) denotes the set of atoms occurring positively (negatively, resp.) in Inline graphic. A program (also called an ASP program) is a finite set of rules. A Inline graphic-free program is called positive. A term, atom, literal, rule, or program is ground if no variables appear in it.

Let Inline graphic be a program, and let r be a rule in Inline graphic. A ground instance of r is a rule obtained from r by replacing (see footnote 4) every variable X in r by constants occurring in Inline graphic. We denote the set of all the ground instances of the rules occurring in Inline graphic by Inline graphic.

An interpretation I for Inline graphic is a set of ground atoms. A ground positive literal A is true (false, resp.) w.r.t. I if Inline graphic (Inline graphic, resp.). A ground negative literal Inline graphic is true (false, resp.) w.r.t. I if Inline graphic (Inline graphic, resp.).

Let r be a ground rule in Inline graphic. The head of r is true w.r.t. I if Inline graphic. The body of r is true w.r.t. I if all body literals of r are true w.r.t. I (i.e., Inline graphic) and is false w.r.t. I otherwise. The rule r is satisfied (or true) w.r.t. I if its head is true w.r.t. I or its body is false w.r.t. I.

A model for Inline graphic is an interpretation M for Inline graphic such that every rule Inline graphic is true w.r.t. M.

Given a program Inline graphic and an interpretation I, the reduct Inline graphic is the set of positive rules defined as follows:

graphic file with name M85.gif 2

I is an answer set of Inline graphic if I is the Inline graphic-smallest model for Inline graphic.
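The definitions of reduct and answer set can be made concrete for small ground programs. The sketch below assumes that Eq. (2) is the standard Gelfond-Lifschitz reduct (drop every rule with a negative body atom in I, then delete the negative literals from the remaining rules) and that the smallest model of the positive reduct is computed by forward chaining. The rule representation and all function names are hypothetical, and the subset enumeration is exponential, so this is for illustration only.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# A ground rule: (head atom, positive body atoms, negative body atoms).
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]
PositiveRule = Tuple[str, FrozenSet[str]]


def reduct(program: List[Rule], interp: Set[str]) -> List[PositiveRule]:
    """Keep rules whose negative body is disjoint from interp, without negation."""
    return [(head, pos) for head, pos, neg in program if not (neg & interp)]


def least_model(positive_program: List[PositiveRule]) -> Set[str]:
    """Smallest model of a positive ground program, by forward chaining."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_program:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def answer_sets(program: List[Rule]) -> List[Set[str]]:
    """Enumerate interpretations I that equal the least model of the reduct w.r.t. I."""
    atoms = sorted(
        {h for h, _, _ in program}
        | {a for _, pos, neg in program for a in pos | neg}
    )
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            interp = set(candidate)
            if least_model(reduct(program, interp)) == interp:
                found.append(interp)
    return found


# Two rules that exclude each other: "p if not q" and "q if not p".
example = [
    ("p", frozenset(), frozenset({"q"})),
    ("q", frozenset(), frozenset({"p"})),
]
print(answer_sets(example))
```

For the two-rule example, the candidates {p} and {q} each reproduce themselves as the least model of their own reduct, while the empty set and {p, q} do not.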

Over the last years, answer set programming has emerged as a declarative problem-solving paradigm. It is a programming methodology rooted in research on artificial intelligence and computational logic, and researchers use it in many areas of science and technology. For the experiments we took advantage of clingo, one of the most efficient and widely used answer set programming systems available today. In addition to standard definitions, clingo allows defining constraints, i.e., rules with an empty head, for instance

graphic file with name M89.gif 3

By adding this constraint to a program, we eliminate its answer sets that contain Inline graphic. Adding the ‘opposite’ constraint

graphic file with name M91.gif 4

eliminates those answers that do not contain Inline graphic. A constraint can be translated into a normal rule. To this end, the constraint

graphic file with name M93.gif 5

is mapped onto the rule

graphic file with name M94.gif 6

where x is a new atom.

Example. Suppose we have three numbered urns and two distinguishable balls. Each ball has been put into an urn, possibly the same one. An ASP program to encode this knowledge is as follows:

graphic file with name M95.gif 7
graphic file with name M96.gif 8
graphic file with name M97.gif 9
graphic file with name M98.gif 10
graphic file with name M99.gif 11
graphic file with name M100.gif 12
graphic file with name M101.gif 13
graphic file with name M102.gif 14
graphic file with name M103.gif 15

Please notice that, as usual in logic programming, identifiers with initial uppercase letters denote variables. Rules 7–11 are simple facts concerning urns and balls. Rules 12 and 13 define predicates that tell whether a ball is inside a particular urn. Inequality Inline graphic is only used during grounding to eliminate some ground instances of rule 13. It is worth mentioning that grounding systems do not make unnecessary replacements, for example, 1 for U. Rules 14 and 15 ensure that every ball is in exactly one urn.

Suppose now that we have discovered that urn 2 is empty and we want to know possible configurations. It is enough to add two facts:

graphic file with name M105.gif 16
graphic file with name M106.gif 17

and find all answer sets. A possible answer set is: ball(q), ball(r), urn(1), urn(2), in(r), not_in(2, q), not_in(2, r), not_in(3, q), not_in(3, r), contains(1, q), in(q), urn(3), contains(1, r), which describes the placement of both balls into the first urn.
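As a cross-check on the urn example, the admissible placements can be enumerated directly. The following sketch does not reproduce the ASP rules; it only encodes the informal reading of the puzzle (each ball lies in exactly one urn, and urn 2 is empty), and the names `configurations` and `empty_urns` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Set

urns: List[int] = [1, 2, 3]
balls: List[str] = ["q", "r"]
empty_urns: Set[int] = {2}  # the newly discovered knowledge


def configurations(
    urns: List[int], balls: List[str], empty_urns: Set[int]
) -> Iterator[Dict[str, int]]:
    """Yield every mapping ball -> urn that leaves the listed urns empty."""
    allowed = [u for u in urns if u not in empty_urns]
    for placement in product(allowed, repeat=len(balls)):
        yield dict(zip(balls, placement))


configs = list(configurations(urns, balls, empty_urns))
for config in configs:
    print(config)
```

Under this reading there are four placements, and the one that maps both q and r to urn 1 matches the answer set quoted above.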

clingo also allows using choice constructions, for instance:

graphic file with name M107.gif 18

describes all possible ways to choose which two of the atoms p(1, q), p(2, q), p(3, q) and which two of the atoms p(1, r), p(2, r), p(3, r) are included in the resultant model. Before and after an expression in braces, we can put integers, which express bounds on the cardinality of the stable models described by the rule. The number on the left is the lower bound (0 is default), and the number on the right is the upper bound (unbounded is default).

Proposed Encodings for the Induction of CFGs

Our translation converts CFG identification into an ASP program (the main approach) and into a constraint satisfaction problem (CSP) model (an alternative approach). Suppose we are given a sample composed of examples, Inline graphic, and counter-examples, Inline graphic, over an alphabet Inline graphic, and a positive integer k. We want to find a k-variable CFG Inline graphic such that Inline graphic and Inline graphic.

Using Logic Programming with Answer Set Semantics

Let F be the set of all factors (excluding the empty word) of Inline graphic. Let us now see how to describe the rules for the relationship between a grammar G and a sample Inline graphic in terms of ASP. There are three main predicates: y(I, J, L), which indicates the presence of the rule $I \to J\,L$ in P; w(I, Q), which indicates that $I \Rightarrow^{*} Q$, where Q represents a factor; and z(I, A), which indicates the presence of the rule $I \to A$ in P.

  1. We have the following domain specification, our facts.
    graphic file with name M119.gif 19
    graphic file with name M120.gif 20
    graphic file with name M121.gif 21
    graphic file with name M122.gif 22
    graphic file with name M123.gif 23
    graphic file with name M124.gif 24
  2. The next rules ensure that in a grammar G a factor can or cannot be derived from a specific variable and ensure that in the grammar there is a subset of all possible productions.
    graphic file with name M125.gif 25
    graphic file with name M126.gif 26
    graphic file with name M127.gif 27
    graphic file with name M128.gif 28
    graphic file with name M129.gif 29
  3. All examples should be accepted, and no counter-example can be accepted.
    graphic file with name M130.gif 30
    graphic file with name M131.gif 31
  4. For every Inline graphic for which Inline graphic and for every pair Inline graphic (Inline graphic) of such factors that Inline graphic, f can be derived from a non-terminal I if there are two non-terminals, J and L, such that b can be derived from J, c can be derived from L, and there is a production Inline graphic.
    graphic file with name M138.gif 32
  5. On the other hand, if Inline graphic, then at least one such pair Inline graphic should exist, that Inline graphic is in P and Inline graphic and Inline graphic.
    graphic file with name M144.gif 33
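The intended meaning of the three predicates can be illustrated with a small bottom-up computation. The sketch below is not the ASP encoding itself; it is a CYK-style procedure that, for a fixed choice of y and z atoms, computes which pairs (I, f) should hold for w, following items 4 and 5 above. Representing non-terminals by integers, rules by tuples, and the name `derives` are assumptions made only for this illustration.

```python
from typing import Set, Tuple

BinaryRule = Tuple[int, int, int]   # (i, j, l) stands for the rule i -> j l
TerminalRule = Tuple[int, str]      # (i, a) stands for the rule i -> a


def derives(
    y: Set[BinaryRule], z: Set[TerminalRule], word: str
) -> Set[Tuple[int, str]]:
    """Return all pairs (i, f) such that non-terminal i derives factor f of word."""
    n = len(word)
    w: Set[Tuple[int, str]] = set()
    # Factors are processed by increasing length, so both parts of every
    # split f = bc have already been handled when f is considered.
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            f = word[start:start + length]
            if length == 1:
                for i, a in z:
                    if a == f:
                        w.add((i, f))
                continue
            for cut in range(1, length):
                b, c = f[:cut], f[cut:]
                for i, j, l in y:
                    if (j, b) in w and (l, c) in w:
                        w.add((i, f))
    return w


# Hypothetical grammar for a^n b^n (n >= 1): 0 -> 1 2 | 1 3, 3 -> 0 2, 1 -> a, 2 -> b.
y_rules = {(0, 1, 2), (0, 1, 3), (3, 0, 2)}
z_rules = {(1, "a"), (2, "b")}
print((0, "aabb") in derives(y_rules, z_rules, "aabb"))
```

An example is accepted when the pair formed by the start symbol (assumed here to be non-terminal 0) and the whole word is in the returned set; a counter-example must not produce such a pair.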

Using General Constraints

This time, instead of predicates, w, y, and z are binary variables. We use the following constraints

graphic file with name M145.gif 34
graphic file with name M146.gif 35

and

graphic file with name M147.gif 36

for each Inline graphic, where Inline graphic means if Inline graphic then Inline graphic and if Inline graphic then Inline graphic, and Inline graphic.

Experimental Results

In this section, we describe some experiments comparing the performance of our approaches implemented in Python, using clingo (ASP) and using Gurobi Optimizer, with our implementation of the algorithm of Imada and Nakamura [11] using the PicoSAT solver (SAT), when positive and negative words are given. For these experiments, we use a set of 40 samples: partly based on randomly generated grammars (33 samples) and partly based on the set of fundamental CFGs appearing in grammatical inference research (the last 7 samples).

Benchmarks

To test the learning power for general CFGs, we randomly generated 33 CFGs and prepared for each of them exhaustively enumerated positive and negative samples consisting of words of length at most 14. The grammars are in Chomsky normal form with 6 to 12 rules on the alphabet Inline graphic. In every sample, positive words constitute at least 20% of the total.

The last seven samples are also with lengths no longer than 14 exhaustively enumerated, but they were generated based on the following descriptions:

  1. The set of palindromes over Inline graphic.

  2. The parentheses language: the set of strings consisting of equal numbers of a’s and b’s such that every prefix does not have more b’s than a’s.

  3. The set of strings containing twice as many b's as a's.

  4. The set of strings of a’s and b’s not of the form ww.

  5. The complement of the language (b).

  6. Inline graphic.

  7. The set of strings consisting of equal numbers of a’s and b’s.

Performance Comparison

In all experiments, we used an Intel Xeon CPU E5-2650 v2, 2.6 GHz (a single core out of eight), under the Ubuntu 18.04 operating system with 60 GB of available RAM. Algorithm 1 shows the process for synthesizing a grammar (the set of production rules with Inline graphic being always the start symbol) from positive and negative words.

[Algorithm 1: synthesis of a grammar from positive and negative words]

In the algorithm, Inline graphic and Inline graphic represent the sets of positive and negative words given as input. The variables Inline graphic and Inline graphic hold the sets of samples to be covered in the next loop iteration. The algorithm picks a word from Inline graphic or Inline graphic that is not covered by the inferred grammar G, and adds it to Inline graphic or Inline graphic. The function Convert translates the problem into a set of ASP rules R (or Gurobi general constraints or a Boolean expression). If the ASP solver (or Gurobi Optimizer or the SAT solver) finds a stable model M, the function Extract returns a set of production rules by analyzing the presence of particular y(i, j, l) and z(i, a) atoms. The algorithm repeats this process, increasing k to relax the limit on the number of non-terminals, until G covers all the given Inline graphic and Inline graphic.
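The incremental loop described above can be sketched in a few lines. This is a hedged reconstruction from the prose, not a transcription of Algorithm 1: the exact solver (clingo, Gurobi, or PicoSAT in the paper) is replaced by a brute-force search over all rule subsets, which is feasible only for very small k, and the names `accepts`, `solve`, and `synthesize`, the integer encoding of non-terminals, and the choice of 0 as the start symbol are all assumptions.

```python
from itertools import product


def accepts(y, z, word, start=0):
    """CYK membership test for a CNF grammar given as rule sets y and z."""
    n = len(word)
    if n == 0:
        return False
    table = {}  # table[(s, e)] = non-terminals deriving word[s:e]
    for length in range(1, n + 1):
        for s in range(n - length + 1):
            e = s + length
            if length == 1:
                cell = {i for i, a in z if a == word[s]}
            else:
                cell = {i for m in range(s + 1, e) for i, j, l in y
                        if j in table[(s, m)] and l in table[(m, e)]}
            table[(s, e)] = cell
    return start in table[(0, n)]


def solve(k, alphabet, pos, neg):
    """Brute-force stand-in for the exact solver; returns (y, z) or None."""
    y_all = list(product(range(k), repeat=3))
    z_all = list(product(range(k), alphabet))
    for bits in product([False, True], repeat=len(y_all) + len(z_all)):
        y = {r for r, b in zip(y_all, bits) if b}
        z = {r for r, b in zip(z_all, bits[len(y_all):]) if b}
        if all(accepts(y, z, w) for w in pos) and \
                not any(accepts(y, z, w) for w in neg):
            return y, z
    return None


def synthesize(all_pos, all_neg, alphabet, max_k=2):
    """Grow the working sample one uncovered word at a time, raising k when stuck."""
    k, pos, neg = 1, set(), set()
    y, z = set(), set()  # the empty grammar accepts nothing
    while True:
        missed_pos = [w for w in all_pos if not accepts(y, z, w)]
        missed_neg = [w for w in all_neg if accepts(y, z, w)]
        if not missed_pos and not missed_neg:
            return k, y, z
        if missed_pos:
            pos.add(missed_pos[0])
        else:
            neg.add(missed_neg[0])
        model = solve(k, alphabet, pos, neg)
        while model is None:  # no k-variable grammar fits: relax the limit
            k += 1
            if k > max_k:
                raise ValueError("no grammar found within max_k non-terminals")
            model = solve(k, alphabet, pos, neg)
        y, z = model


k, y, z = synthesize(["a", "aa", "aaa"], ["b", "ab", "ba", "bb"], "ab")
print(k, sorted(y), sorted(z))
```

The point of the working sets is that the solver only ever sees the words that earlier hypotheses got wrong, which keeps each exact-solving call small; the brute-force `solve` would be swapped for a real ASP, CSP, or SAT back end in practice.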

The results are listed in Table 1. To determine whether the observed CPU time differences between the ASP runs and the runs of the remaining methods did not occur by chance, we use the Wilcoxon signed-rank test [17, pp. 915–916] for ASP vs SAT and ASP vs Gurobi. The null hypothesis to be tested is that the median of the paired differences is negative (against the alternative that it is positive). As we can see from Table 2, the p-value is high in both cases, so the null hypothesis cannot be rejected, and we may conclude that using our ASP encoding is likely to improve CPU time performance for most benchmarks of this kind.
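To make the test statistic concrete, the signed-rank sums can be computed by hand. The sketch below uses only the first ten ASP and SAT rows of Table 1 and assumes that each paired difference is taken as ASP time minus SAT time; the source does not state the direction, and the function name `signed_rank_sums` is hypothetical. It computes the rank sums only, not a p-value.

```python
from typing import List, Tuple


def signed_rank_sums(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of positive and negative paired differences."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Tied absolute differences share the average of their ranks.
        while end + 1 < len(order) and \
                abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Languages 1-10 from Table 1 (CPU time in seconds).
asp = [51.70, 646.39, 74.31, 75.90, 27.91, 75.96, 68.35, 57.14, 45.17, 211.33]
sat = [48.65, 21049.22, 189.85, 347.84, 64.82, 335.98, 61.87, 118.25, 94.86, 568.12]
w_plus, w_minus = signed_rank_sums(asp, sat)
print(w_plus, w_minus)
```

On these ten rows, only languages 1 and 7 have a positive difference, and they carry the two smallest absolute differences, so W+ = 3 and W- = 52. A small W+ relative to W- is what one would expect when ASP is usually the faster method.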

Table 1.

Execution times of exact solving CFG identification in seconds

Language  |V|  ASP       SAT        Gurobi
1         3    51.70     48.65      56.42
2         6    646.39    21049.22   >21050
3         4    74.31     189.85     143.76
4         5    75.90     347.84     >2000
5         4    27.91     64.82      18.36
6         5    75.96     335.98     10.33
7         4    68.35     61.87      >2000
8         4    57.14     118.25     28.85
9         3    45.17     94.86      73.03
10        5    211.33    568.12     568.06
11        5    62.48     166.65     >2000
12        3    21.50     58.12      33.28
13        6    112.69    705.80     >2000
14        6    943.02    4807.32    >4808
15        7    19358.09  252290.70  >252291
16        4    49.01     111.22     103.05
17        7    2921.44   8035.44    >8036
18        5    361.52    1369.22    >2000
19        5    63.47     238.71     186.10
20        2    12.96     5.64       3.88
21        5    96.68     512.83     671.62
22        2    11.38     12.02      10.54
23        3    11.84     43.03      9.92
24        4    109.98    159.73     176.49
25        3    22.65     22.40      29.65
26        5    38.74     271.30     420.11
27        5    94.76     295.81     >2000
28        5    216.61    625.07     >2000
29        5    271.88    324.43     >2000
30        6    228.98    412.16     >2000
31        2    10.97     15.29      19.84
32        5    62.17     293.98     105.74
33        3    10.42     18.30      13.15
34        5    31.13     49.28      32.83
35        3    12.84     20.97      12.86
36        4    118.17    76.98      73.74
37        6    173.66    191.42     >2000
38        4    29.33     54.63      36.71
39        4    4.12      21.00      9.02
40        3    66.71     50.65      40.40

Table 2.

Obtained p-values from the Wilcoxon signed-rank test

ASP vs SAT ASP vs Gurobi
0.999999647 0.999987068

ASP-Based CFG Induction on Bioinformatics Datasets

Our induction method can also be applied to other data that are not taken from infinite context-free languages. We tested its classification quality on two bioinformatics datasets: the WALTZ-DB database [4], composed of 116 hexapeptides known to induce amyloidosis (Inline graphic) and 161 hexapeptides that do not induce amyloidosis (Inline graphic), and the Maurer-Stroh et al. database from the same domain [14], where the ratio of Inline graphic is 240/836.

We chose a few standard machine learning methods for comparison: BNB (Naive Bayes classifier for multivariate Bernoulli models [13, pp. 234–265]), DTC (Decision Trees Classifier, CART method [5]), MLP (Multi-layer Perceptron [16]), and SVM (Support Vector Machine classifier with the linear kernel [18]). In all methods except ASP and BNB, an unsupervised data-driven distributed representation, called ProtVec [2], was applied in order to convert words (protein representations) to numerical vectors. For BNB, we represented words as binary-valued feature vectors that indicated the presence or absence of every pair of protein letters. In the case of ASP, the training set was partitioned randomly into n parts, and the following process was performed m times: one part was chosen for synthesizing a CFG, and the remaining $n - 1$ parts were used for validating it. The best of all m grammars, in terms of F-measure, was then evaluated on the test set. For WALTZ-DB, n and m were set to 20; for Maurer-Stroh, n was set to 10 and m to 30. These values were selected experimentally based on the size of the databases and the running time of the ASP solver.

To estimate the ability of ASP and the compared approaches to classify unseen hexapeptides, a repeated 10-fold cross-validation (cv) strategy was used. It means splitting the data randomly into 10 mutually exclusive folds, building a model on all but one fold, and evaluating the model on the skipped fold. The procedure was repeated 10 times and the overall assessment of the model was based on the mean of those 10 individual evaluations. Table 3 summarizes the performances of the compared methods on the WALTZ-DB and Maurer-Stroh databases. It is noticeable that the ASP approach achieved the best F-score on the Maurer-Stroh dataset and an average F-score on WALTZ-DB, hence it can be used with high reliability to recognize amyloid proteins. BNB is outstanding for WALTZ-DB and almost as good as ASP for the Maurer-Stroh database.
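The evaluation measures and the fold construction can be sketched as follows. This is a generic illustration under stated assumptions, not the evaluation code used in the experiments: the function names `precision_recall_f1` and `k_folds`, the round-robin fold assignment after shuffling, and the convention of returning 0.0 when a denominator is zero are all choices made here.

```python
import random
from typing import List, Sequence, Tuple


def precision_recall_f1(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def k_folds(items: Sequence, k: int = 10, seed: int = 0) -> List[list]:
    """Shuffle the items and split them into k mutually exclusive folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# 277 is the WALTZ-DB size stated in the text (116 + 161 hexapeptides).
folds = k_folds(range(277), k=10)
print([len(fold) for fold in folds])
print(precision_recall_f1([True, True, False, False], [True, False, True, False]))
```

In each round, one fold serves as the held-out set and the other nine as training data; repeating the whole split with different seeds gives the repeated cross-validation described above.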

Table 3.

Performance of compared methods on WALTZ-DB and Maurer-Stroh databases in terms of Precision (P), Recall (R), and F-score (F1)

Method WALTZ-DB Maurer-Stroh
P R F1 P R F1
ASP Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
BNB Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
DTC Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
MLP Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
SVM Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

Conclusion

In this paper, we proposed an approach for learning context-free grammars from positive and negative samples by using logic programming. We encode the set of samples, together with limits on the number of non-terminals to be synthesized, as an answer set program. A stable model (an answer set) for the program contains a set of grammar rules that derives all positive samples and no negative samples. One feature of this approach is that we can synthesize a compact set of rules in Chomsky normal form. Another feature is that our learning method will benefit from future improvements in ASP solvers. We present experimental results on learning CFGs for fundamental context-free languages, including the set of strings composed of equal numbers of a's and b's and the set of strings over Inline graphic not of the form ww. Another series of experiments on random languages shows that our encoding can speed up computations in comparison with SAT and CSP encodings.

Footnotes

1

We are aware of this imprecision. The number of words and their lengths should allow a program to be executed in a reasonable amount of time. In experiments, we took two words: one example and one counter-example.

3

The Python scripting language is used only for generating appropriate AnsProlog facts.

4

This process can be done efficiently, because many ground instances can be discarded; see Chapter 4 of [6].

This research was supported by National Science Center (Poland), grant number 2016/21/B/ST6/02158.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl.

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com.

João Teixeira, Email: joao.teixeira@intellegibilis.com.

Wojciech Wieczorek, Email: wojciech.wieczorek@us.edu.pl.

Łukasz Strąk, Email: lukasz.strak@us.edu.pl.

Arkadiusz Nowakowski, Email: arkadiusz.nowakowski@us.edu.pl.

Olgierd Unold, Email: olgierd.unold@pwr.edu.pl.

References

  1. Angluin D. Negative results for equivalence queries. Mach. Learn. 1990;5(2):121–150. doi: 10.1007/BF00116034.
  2. Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE. 2015;10(11):1–15. doi: 10.1371/journal.pone.0141287.
  3. Baral C. Knowledge Representation, Reasoning, and Declarative Problem Solving. New York: Cambridge University Press; 2003.
  4. Beerten J, et al. WALTZ-DB: a benchmark database of amyloidogenic hexapeptides. Bioinformatics. 2015;31(10):1698–1700. doi: 10.1093/bioinformatics/btv027.
  5. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Monterey: Wadsworth and Brooks; 1984.
  6. Gebser M, Kaminski R, Kaufmann B, Schaub T. Answer Set Solving in Practice. San Rafael: Morgan & Claypool Publishers; 2012.
  7. Gold EM. Language identification in the limit. Inf. Control. 1967;10:447–474. doi: 10.1016/S0019-9958(67)91165-5.
  8. de la Higuera C. Characteristic sets for polynomial grammatical inference. Mach. Learn. 1997;27(2):125–138. doi: 10.1023/A:1007353007695.
  9. de la Higuera C. Grammatical Inference: Learning Automata and Grammars. New York: Cambridge University Press; 2010.
  10. Hopcroft JE, Motwani R, Ullman JD. Introduction to Automata Theory, Languages, and Computation. 2nd ed. Reading: Addison-Wesley; 2001.
  11. Imada K, Nakamura K. Learning context free grammars by using SAT solvers. In: Proceedings of the 2009 International Conference on Machine Learning and Applications. IEEE Computer Society; 2009. pp. 267–272.
  12. Lifschitz V. Answer Set Programming. Cham: Springer; 2019.
  13. Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press; 2008.
  14. Maurer-Stroh S, et al. Exploring the sequence determinants of amyloid structure using position-specific scoring matrices. Nat. Methods. 2010;7:237–242. doi: 10.1038/nmeth.1432.
  15. Nakamura K, Matsumoto M. Incremental learning of context free grammars based on bottom-up parsing and search. Pattern Recognit. 2005;38(9):1384–1392. doi: 10.1016/j.patcog.2005.01.004.
  16. Pedregosa F, et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
  17. Salkind NJ. Encyclopedia of Research Design. London: SAGE Publications Inc.; 2010.
  18. Wu TF, Lin CJ, Weng RC. Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 2004;5:975–1005.

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group
