Table 7.
AI-ACT name | Launch (year, by) | Technological infrastructure layer | Functional layer | Human layer (auditability and bias)
---|---|---|---|---
Projeto Cérebro (Brain Project) | 2015, Council for Economic Defense (CADE) | Using public databases related to ownership registers and public procurement, it applies data mining, descriptive statistics, and economic filtering with R and Python to generate a dashboard signalling potential risks and/or signs of violation of competition law | Improving horizontal accountability mechanisms and supporting human action in the Council for Economic Defense with analytics and visualization. No machine learning techniques are in place yet | Auditable (internally and by control agencies), but not open to the public. No information about bias, although there is a risk due to the data and processes used
Rosie and Jarbas | 2016, Operação Serenata de Amor (Love Serenade Operation)-Open Knowledge Brasil | A Python-programmed application that first applies hypothesis- and test-driven development processes and then unsupervised learning algorithms to estimate a “probability of corruption” based on standard deviations for each reimbursement receipt submitted by MPs. Rosie processes public and private open databases made available by the Lower Chamber, the Revenue Service, Google, Foursquare, and Yelp. Findings are published on Rosie's Twitter account and on a dashboard named Jarbas | Improving social accountability by inviting people to check suspicious cases, after auditing MP expenditures with machine learning techniques and communicating findings via Twitter and an online dashboard with filters | Auditable, open source, and hosted on GitHub. Not free from bias, as the data are clustered and standard deviations are used in the analysis
Rui Barbot | 2018, JOTA news website | A crawler created in Python to collect public and open data on hundreds of procedures on the Supreme Court website. It checks when each one was last updated and, if any procedure is reaching 180 days, 270 days, or whole years without movement, Rui tweets automatically about these anniversaries and reports its findings by emailing journalists from JOTA, the news outlet that created the bot | Improving social accountability by calling attention to bottlenecks in the Brazilian Supreme Court, using analytics and communication tools to signal which procedures are facing sluggish progress | Auditable. Risk of bias, as the list of monitored procedures and the classifiers used to measure delay were not randomly selected, although they are based on journalistic criteria
Esmeralda | 2019, Court of Accounts of the Municipalities of the Goiás State (TCM-GO) | Esmeralda processes data received through a system created to transfer data from and to the municipalities in the State of Goiás audited by the local Court of Accounts. Using both open and protected data, it identifies anomalies related to the purchase and payment of goods, services, workers, and benefits using data mining and statistical analyses. Auditors can access the data and analytics on their desktops. Findings are also transferred via Elasticsearch, a free and open search and analytics engine. Notifications of risky cases can be sent to auditors via internal chat | Improving horizontal accountability mechanisms and supporting auditors' action with visual analytics and communication tools | Auditable internally and by control agencies, but not open to the public. No info about bias, although there is a risk due to the data and processes used |
Iris (Indicador de Risco de Irregularidades em Contratações, or Irregularities Risk Indicator of Public Contracts) | 2015, Court of Accounts of the State of Rio de Janeiro (TCE-RJ) | Iris applies a multicriteria model based on an analytical hierarchy process to establish relationships and calculate risk factors using data mining, deep learning, and image processing. As inputs it uses sensitive, protected, and open data provided by municipalities and the State of Rio de Janeiro related to public procurement, along with data on business ownership and aerial and street-level images. Among the processes in place, Iris uses Google services to convert addresses into geographic codes and to generate aerial and street-level images of the companies participating in the bidding. Based on these images, a model applying a convolutional neural network (CNN, or ConvNet) evaluates the probability of each participant being a ghost company. Ready-to-use libraries and image recognition models (e.g., TensorFlow and Inception v3) are used. As outputs, Iris offers a spreadsheet indicating the percentage of risk for each of nine risk factors, linked to detailed reports that can be accessed through a web browser | Improving horizontal accountability mechanisms and supporting auditors' action with visual analytics and communication tools | Auditable (internally and by control agencies), but not only is it closed to the public, TCE-RJ is also not very transparent about Iris. Even overall information on inputs, outputs, and software used is classified as sensitive and was not made available via the Access to Information Law. There is a high risk of bias and inaccuracy, e.g., Google Street View imagery tends to be outdated
DIB (Detecção de Irregularidades em Benefícios, or Detecting Irregularities in Benefits) | 2018, governmental Social Security Technology and Information Company (DataPrev) | Multiple-technique tool for benefit fraud detection that uses governmental datasets as inputs (with protected social, labour, and pension data) and deploys data mining and statistical analyses with Python and SAS Miner, plus a rules engine with audit trails already known or defined in the data exploration stage. It also uses techniques such as deep learning, applying CNNs. The algorithm was trained on a sample of 3,000 to 6,000 benefits, with 85% accuracy. It offers panels and storyboards to identify patterns and signal risks of fraud and suspicious cases, using data visualization applications such as React and Spring Boot, georeferencing libraries (Leaflet), and QlikView | Improving horizontal accountability mechanisms and supporting inspectors’ action with visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. The risk of bias depends on the quality of the data available and the models adopted
Alice (Análise de Licitações e Editais, or Analysis of Biddings and Calls for Bids) | 2015, Office of the Comptroller General (CGU), later improved by the Federal Court of Accounts (TCU) | Programmed in Python, Alice reads tenders published daily on Comprasnet (the public procurement portal) and in the Federal Official Gazette, looking for keywords and values that allow it to identify signs of irregularities based on previously programmed audit trails. It mixes text pre-processing, data mining, machine learning (Random Forest, Support Vector Machine, Bernoulli Naive Bayes), and regular expression techniques. Alice also crosses data, including names of suppliers, products, and prices, aiming to identify, for example, competing companies that have common owners or previous issues related to public procurement. It applies supervised learning to create a risk classification, allowing auditors to act preventively in the time frame between the bid notice and the submission of bids, before the award/contract signature. It offers communication and data visualization tools, with automatic daily email alerts and a dashboard with overall and detailed information on each suspect case | Improving horizontal accountability mechanisms and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. The risk of bias depends on the quality of the data available and the models adopted. Accuracy was 88.27% during the test phase and 73.91% later, during further auditing
Agata (Aplicação Geradora de Análise Textual com Aprendizado, or App for Generating Textual Analysis with Learning) | 2020, Federal Court of Accounts (TCU) | Using data available on Comprasnet (the governmental procurement portal), it runs an active machine learning process based on keywords searched by auditors. Programmed in Python and using the library included with Apache Solr, it applies text classification and segmentation, vectorisation (converting words into numbers), and hierarchical clustering of similar texts. The auditor is asked to confirm whether the search result is what she is looking for, which trains the algorithm via logistic regression. The process is gamified: users earn agate stones to keep them engaged, and once a user collects three stones, she can subscribe to receive notifications via email. It also offers a dashboard with overall and detailed information on keywords that may be of interest and more likely to require an inspection | Improving horizontal accountability mechanisms, engaging with and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. The risk of bias depends on the most frequently searched words and their different meanings
Monica (Monitoramento Integrado para o Controle de Aquisições, or Integrated Monitoring for Procurement Control) | 2017, Federal Court of Accounts (TCU) | It deploys data mining and statistical analyses to look for anomalies in public procurement in the federal legislative, judicial, and executive branches and in the prosecution service. It offers data visualization tools with filters and the possibility of downloading spreadsheets | Improving horizontal accountability mechanisms and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias
Adele (Análise de Disputa em Licitações Eletrônica, or Dispute Analysis in Electronic Bidding) | 2018, Federal Court of Accounts (TCU) | As inputs, it uses protected and sensitive data on electronic biddings, such as the IP addresses of bid participants and data on companies and individuals. Data mining detects inconsistencies and anomalies, which are reported in a dynamic dashboard | Improving horizontal accountability mechanisms and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias
Sofia (Guidance System on Facts and Evidence for the Auditor) | 2020, Federal Court of Accounts (TCU) | Available to TCU auditors, Sofia works as an extension that reviews draft texts by verifying sources of reference and identifying correlations between the information written in the text and other procedures, e.g., names, IDs, and fiscal codes. It reports, for example, whether a person is dead, is under investigation, or has been convicted in the Court of Accounts. The information appears in comment boxes alongside the text | Improving horizontal accountability mechanisms and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias. If biases are not considered, the tool may generate misinterpretations and negative impacts on auditing
Carina (Crawler e Analisador de Registros da Imprensa Nacional, or Crawler and Analyser of National Press Records) | 2020, Federal Court of Accounts (TCU) | It uses data crawling, along with machine learning and regular expression techniques, to analyse data daily from the Federal Official Gazette in an attempt to identify anomalies related to public contracts and bids. Carina offers a dashboard and emails with information about suspicious and anomalous purchases of urgently needed health products and services in the response to COVID-19 | Improving horizontal accountability mechanisms and supporting inspectors’ action with alerts, visual analytics and communication tools | Auditable, but not open to the public. Internal and external control units could audit it. If biases are not considered, the tool may generate misinterpretations and negative impacts on auditing
Sisam (Sistema de Seleção Aduaneira por Aprendizado de Máquina, or Customs Selection System through Machine Learning) | 2014, Federal Revenue Service (Receita Federal) | Sisam learns from both inspected and non-inspected import declarations and calculates the probability of about 30 types of error for each declaration, e.g., false description of goods, missing licences, or wrong preferential tax claims. It is a set of Bayesian networks whose conditional probability tables have been replaced by smoothing hierarchies. The system is implemented in Java. Its inputs are sensitive and highly protected economic, financial, tax, and customs datasets. As outputs, Sisam offers an interactive spreadsheet with colour-coded highlights. Sisam also produces a written report with natural language explanations presenting error probabilities, alternative values for each field of the import form that may be wrong, and the expected return of each possible inspection | Improving customs control by generating aggregated data about importers, hence helping post-clearance revisions. It supports inspectors with alerts, visual analytics and communication tools to fight fraud and tax evasion in import operations. Explanations can be copied and pasted into inspection reports | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available. Depending on how the algorithm is trained, it may not detect some cases of fraud while raising false positives in other cases, changing the focus of the clearance procedure
Aniita (Analisador Inteligente e Integrado de Transações Aduaneiras, or Intelligent and Integrated Analyser of Customs Transactions) and BatDoc (Batimento Automatizado de Documentos na Importação, or Document Mismatch Detector) | 2012, Federal Revenue Service (Receita Federal) | It uses sensitive and protected data, such as import declarations and digital images of auxiliary documents (e.g., invoices and bills of lading), to identify anomalies related to possible cases of tax evasion and tax avoidance in customs. Part of Sisam, BatDoc is a tab of Aniita. It applies optical character recognition to the digital images of auxiliary documents, identifies relevant fields, and transforms the data to handle differences in how these fields are presented in each document. Implemented in Java, it uses ABBYY, a commercial optical character recognition tool. It offers a dashboard with detected divergences in company names, addresses, prices, quantities, HS codes, and Incoterm codes, comparing the description of the goods in the invoices with their description in the import declaration | Supporting human decisions by identifying suspicious elements in imports | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available. Depending on how the algorithm is trained, it may not detect some cases of fraud while raising false positives in other cases, changing the focus of inspectors' action
Malha Fina de Convênios (Singling Out Agreements for Inspection) | 2018, Office of the Comptroller General (CGU) | A predictive model created by the CGU that indicates, with a degree of certainty, whether accounts presented by recipients of federal funds are likely to be rejected at the moment they are submitted for examination and settlement. Written in Python, it deploys automated analysis of 104 variables using supervised machine learning (Random Forest) and uses public and protected data available on the Siconv and Plataforma + Brasil platforms, along with alerts of possible irregularities signalled by other tools developed by the CGU. It provides a score (from 0 to 1) measuring the likelihood of the accounts being rejected | Supporting human decisions by identifying elements of misuse of federal funds | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available. If biases are not considered, the tool may generate misinterpretations and negative impacts on rights
Faro (Ferramenta de Análise de Risco em Ouvidoria, or Risk Analysis Tool for Ombudsman Offices) | 2019, Office of the Comptroller General (CGU) | Faro classifies complaints and denunciations received via the Fala.BR online system (a governmental crowdsourcing platform) as suitable or not. Complaints considered suitable meet the minimum requirements to justify opening an investigative procedure. Programmed in Python, it uses supervised machine learning techniques (ensembles of decision trees) and textual analyses, and crosses information with data already available in investigative procedures and other databases stored at the CGU. Around 3,500 registries were used to train the algorithm. As output, it offers a suitability-for-investigation score (from 0 to 1) | Supporting human decisions. Complaints with high scores, i.e., considered suitable, are sent to inspectors to open investigative procedures. The ones Faro considers unsuitable are sent back to the complainant with a request for supplementary information | Auditable. The model is auditable by the development team and the outcomes by the business area (ombudsman). Cases with a high degree of doubt are analysed by auditors. Depending on how the algorithm is trained, it may not detect some cases of fraud while raising false positives in other cases, changing the focus of the investigation
Watson | 2017, IBM | Watson works as a text-indexing tool over all databases available at the federal police in the State of Rio Grande do Sul and allows searches of investigative procedures based on data mining, natural language processing, and classification techniques. It was written in Java and XML and uses unsupervised learning. It offers an output dashboard with questions and answers | Facilitating the work of police officers | Auditable by the federal police in the State of Rio Grande do Sul. There is a risk of bias. If not analysed for possible bias, it may negatively affect the presumption-of-innocence principle
Delphos | 2020, Federal Police (PF) | Using natural language processing, Delphos makes predictions and signals risks, including the risk of fraud, by recognizing and crossing names of people, companies, addresses, contract and payment values, emails, and telephone numbers. It uses public and protected databases | Facilitating the work of police officers | Auditable, but not open to the public. Internal and external control units could audit it. Natural language processing tools used to predict risks related to criminal actions may negatively impact marginalized populations. The algorithm may easily be biased against low-income people who do not know Portuguese well or who use informal language, abbreviations, and certain types of slang
Cida (Chatbot Interativo de Atendimento Cidadão, or Interactive Chatbot for Citizen Service) | 2018, Office of the Comptroller General (CGU) | Although Cida does not use AI techniques, it is a chatbot implemented in Java that interacts with the Fala.BR platform through the web and uses the Facebook and Telegram libraries to communicate with users. It guides users who want to enter complaints, denunciations, or greetings, or to give feedback | Offering opportunities for citizen engagement on social media and opening a dialogue channel. Cida automatically registers the communication on the Fala.BR platform | Auditable, but its code and interactions are not open to the public. Internal and external control units could audit it. It may not work well with low-income people who do not know Portuguese well or who use informal language
Zello (chatbot) | 2018, Federal Court of Accounts (TCU) | Zello conducts online chat conversations via text, using supervised learning, NLP, and Named Entity Recognition techniques. It offers options such as access to procedures based on names provided by users, and it also issues clearance certificates. Developed by the TCU, Zello initially used direct messages on Twitter and is now on WhatsApp, interacting with citizens via text messages. It was launched in 2018 to facilitate access to the list of mayors and governors whose public spending accounts were declared irregular. Since then it has been adding functionalities, such as consulting procedures and actions related to COVID-19 responses | Offering opportunities for citizen engagement on social media and opening a dialogue channel | Auditable, but its code and interactions are not open to the public. Internal and external control units could audit it
Dados Jus Br | 2020, Federal University of Campina Grande and Transparência Brasil | Organizes and unifies data on salaries and top-up salary payments in the bodies that make up the Brazilian justice system. Using R, Java, Python, Go, and Vue, it automated the collection of the data and offers it in a searchable dashboard with graphs and tables | Improving social accountability and offering organized access to information often available in different formats and platforms | Auditable, open source, and hosted on GitHub
Ta de Pé Merenda and Compras Emergenciais | 2020, Transparência Brasil and Federal University of Campina Grande | Extraction, processing, and storage of data from two State Courts of Accounts (Rio Grande do Sul and Pernambuco) and the Federal Revenue Service, using TypeScript, Java, Python, R, and HTML. It highlights contracts and services with irregularities in a dashboard | Improving social accountability by offering detailed information on contracts with issues detected at the municipal level | Auditable, open source, and hosted on GitHub
CoviData | 2020, Federal Court of Accounts (TCU) | Makes predictions and provides risk scores based on public and sensitive databases and news articles. It deploys BERT (Bidirectional Encoder Representations from Transformers), a language representation model, for supervised learning to identify risks | Improving horizontal accountability and supporting human decisions by identifying suspicious elements and scoring the risks | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available
PLACC (Plataforma de Análises Cognitivas para o Controle, or Cognitive Analytics Platform for Control) | Year not available, Federal Court of Accounts (TCU) | Models that take as inputs the texts of court documents and semi-structured metadata available on the federal justice website, applying conditional random field (CRF) and long short-term memory (LSTM) techniques. The main goal is to identify individuals and companies, extract relevant information about them, and establish relationship networks and money flows | Improving horizontal accountability and supporting human decisions by identifying suspicious relationships and connections | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available
Ta de Pé | 2017, Transparência Brasil | A mobile phone application that was later converted into a WhatsApp chatbot, it was developed for citizens to monitor school construction projects in Brazilian municipalities. The user submits data about construction sites that are behind schedule or with no work in progress. Such crowdsourced information is sent to independent engineers, and Transparência Brasil contacts the mayors’ offices about project delays. A Twitter bot posts a message each time a user submits a new picture for evaluation, or a municipality responds to a citizen’s request | Fostering bottom-up accountability in the Brazilian public sector and improving responsiveness in government education expenditure | Auditable, but codes and interactions are not available on the Transparência Brasil website. An academic study showed that the app has a null impact on school construction indicators and that politicians are unresponsive to individual requests |
Ajna (Plataforma de Visão Computacional e Aprendizado de Máquina, or Computer Vision and Machine Learning Platform) | 2017, Federal Revenue Service (Receita Federal) | Implemented in Python, it uses the TensorFlow library for deep neural networks and scikit-learn Random Forest regressors to estimate potential risks and threats regarding the goods observed in X-ray images in the port of Santos. All containers leaving or entering the country are scanned. Ajna collects the resulting images, associates them with the corresponding declarations, and makes the images available to customs officers in their web browsers whenever convenient. Hence, Ajna applies computer vision, data mining, and optical character recognition techniques for classifying, predicting, and scanning patterns and warning of anomalies, aiming, among other things, to control fraud and improve functional behaviour and workers' security in customs clearance procedures | Improving horizontal accountability and supporting human decisions by identifying suspicious elements in imports and exports | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy is not available
ContÁgil | 2009, Federal Revenue Service (Receita Federal) | It uses decision trees, naive Bayes, support vector machines, and deep neural networks, and includes clustering, outlier detection, topic discovery, and co-reference resolution features for retrieving and analysing sensitive and protected data. It is available to all Revenue Service employees through an interactive graphical interface. All ContÁgil functions are also available through scripts that can be created visually or written in JavaScript or Python. It reads account books and invoice sets, scans the various data sources to which it has access, and builds network graphs of people, companies, and their relationships | Supporting human decisions and improving the identification of big fraud schemes like the one observed in Operation Car Wash | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors. Information on accuracy and data inputs is not available
Publique-se | 2020, Abraji and Transparency International Brazil | Programmed in Python, it automatically checks whether there are legal proceedings citing politicians as defendants or plaintiffs in the higher, federal, and local courts. It has already identified 3,445 politicians. It offers the option of downloading the list of procedures related to active and passive corruption and administrative improbity. It was developed by Digesto, a company specialising in legal technology | Supporting human action, mainly journalistic work, to check criminal and civil procedures linked to politicians | Auditable, open source, and hosted on GitHub
Bem-Te-Vi | 2020, Superior Labour Court (TST) | Builds on a labour-law language model trained on data from the 27 Regional Labour Courts and on court decisions from the past two years. It deploys Word2vec, AutoML, and XGBoost to predict decisions and proceedings and to select the most experienced assistant to analyse court cases. It took two years to develop | Supporting human action to speed up procedures and reduce eventual conflicts of interest | Auditable, but not open to the public. Internal and external control units could audit it. There is a risk of bias and errors
RobOps | 2016, Instituto OPS | Programmed in C#, Vue, and JavaScript, RobOps runs daily, collecting data on the Lower Chamber and the Senate, cross-checking them with fiscal status data at the Revenue Service, and displaying the expenditures in a dashboard that ranks the politicians. Its findings are also communicated by email. The process allows volunteers to analyse the spending in depth. New functions have been added to improve both the dashboard and the automated crawling | Fostering bottom-up accountability in the Brazilian Congress and improving responsiveness to public spending by the legislature | Auditable, open source, and hosted on GitHub
Sources: Author, based on Access to Information answers; interviews; GitHub (https://github.com/analytics-ufcg/ta-de-pe-dados; https://github.com/dadosjusbr; https://github.com/okfn-brasil/serenata-de-amor; https://github.com/ops-org; https://github.com/RafaelEstevamReis/Robops); Costa and Bastos (2020); Agata 2020 (https://www.youtube.com/watch?v=1extYx6VWa8); 3º Seminário sobre Análise de Dados na Administração Pública 2017, YouTube (https://www.youtube.com/watch?v=Pw-DW5ptvbQ&t=5963s); Transparência Brasil/Catálogo IA (2021), https://catalogoia.omeka.net/; Casos de Aplicação de Inteligência Artificial na Administração Pública, Enastic AGU 2020, https://www.youtube.com/watch?v=2chdQSjM2GI; Freire, Gaudino and Mignozzetti (2020); Transparência Brasil (2021a, 2021b), https://www.transparencia.org.br/downloads/publicacoes/Governance_Recommendations.pdf; Jambreiro Filho (2019); Coutinho (2012)
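The standard-deviation logic behind Rosie's “probability of corruption” (Rosie and Jarbas row) can be sketched in a few lines. This is a minimal illustration of outlier scoring by z-score, using hypothetical receipt values; it is not Serenata de Amor's actual code, and the real pipeline clusters comparable expenses before scoring.

```python
from statistics import mean, stdev

def corruption_suspicion(receipts, threshold=3.0):
    """Flag reimbursement receipts whose value lies more than
    `threshold` sample standard deviations above the mean of
    comparable receipts, returning each flagged receipt's z-score."""
    mu = mean(receipts.values())
    sigma = stdev(receipts.values())
    return {
        receipt_id: round((value - mu) / sigma, 2)
        for receipt_id, value in receipts.items()
        if sigma > 0 and (value - mu) / sigma > threshold
    }

# Hypothetical meal reimbursements (BRL) within one expense category:
sample = {"r1": 42.0, "r2": 38.5, "r3": 45.0, "r4": 40.0, "r5": 830.0}
print(corruption_suspicion(sample, threshold=1.5))  # {'r5': 1.79}
```

A low threshold is used here only because the sample is tiny; with thousands of receipts per cluster, a stricter cut-off keeps the rate of cases sent for human review manageable.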
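The keyword-and-regular-expression “audit trails” used by Alice and Carina can be illustrated as follows. The trail labels and patterns below are hypothetical, since the real trails are not public; the sketch only shows the mechanism of matching red-flag patterns against tender text.

```python
import re

# Hypothetical audit trails: each maps a red-flag label to a regex.
AUDIT_TRAILS = {
    "restrictive_brand": re.compile(r"\bmarca\s+espec[ií]fica\b", re.I),
    "emergency_waiver": re.compile(r"\bdispensa\s+de\s+licita[cç][aã]o\b", re.I),
    "tight_deadline": re.compile(r"\bprazo\s+de\s+[1-5]\s+dias?\b", re.I),
}

def flag_tender(text):
    """Return the audit-trail labels whose pattern matches the tender text."""
    return sorted(label for label, pattern in AUDIT_TRAILS.items()
                  if pattern.search(text))

notice = ("Dispensa de licitação para aquisição de equipamentos, "
          "exigida marca específica, prazo de 3 dias para entrega.")
print(flag_tender(notice))
```

In the real tools, matches like these feed the daily email alerts and the dashboard; the machine learning classifiers then rank the flagged tenders by risk.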