Intel releases the source code of ControlFlag, a machine learning system for detecting errors in code

Intel has announced new developments in the ControlFlag research project, which aims to create a machine learning system that improves code quality.

The tool has been released under the MIT license. Using a model trained on a large amount of existing code, it can identify various errors and anomalies in source code written in high-level languages such as C/C++.

The system is suitable for detecting many kinds of problems in code, from typos and incorrect type combinations to missing null checks on pointers and memory problems.

The system is self-taught: it builds a statistical model from the existing body of open source code published on GitHub and similar public repositories. In the training stage, the system identifies the typical patterns used to build structures in the code and builds a syntax tree of the relationships between these patterns, reflecting the flow of code execution in the program. The result is a reference decision tree that combines the development experience of all the analyzed source code.
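
As a rough illustration of the training idea, the sketch below reduces C "if" conditions to abstract patterns and counts how often each pattern occurs. It is a toy under stated assumptions, not ControlFlag's actual implementation: the names abstract, train and KEEP, the regular-expression tokenizer, and the six-line corpus are all hypothetical, and a real system would work on parse trees mined from a large corpus.

```python
import re
from collections import Counter

# Hypothetical tokenizer: numbers, identifiers, and a few C operators.
_TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+)|(?P<var>[A-Za-z_]\w*)"
    r"|(?P<op>==|!=|<=|>=|&&|\|\||[=<>!()+\-*/&|^]))"
)

# Identifiers kept verbatim because they carry meaning of their own.
KEEP = {"NULL", "TRUE", "FALSE"}


def abstract(condition: str) -> str:
    """Replace identifiers and literals with placeholders, keep operators."""
    text = condition.strip()
    out, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unsupported token at {text[pos:]!r}")
        if m.group("num") is not None:
            out.append("number")
        elif m.group("var") is not None:
            name = m.group("var")
            out.append(name if name in KEEP else "variable")
        else:
            out.append(m.group("op"))
        pos = m.end()
    return " ".join(out)


def train(conditions):
    """Count how often each abstract pattern appears in the corpus."""
    return Counter(abstract(c) for c in conditions)


# Tiny made-up corpus standing in for conditions mined from public code.
corpus = ["x == 7", "n == 0", "len == 16", "ptr != NULL", "i < 10", "y = 3"]
model = train(corpus)
print(model.most_common())
```

Even in this tiny corpus the pattern "variable == number" dominates, while "variable = number" appears only once, which is the kind of imbalance a statistical model can exploit.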

To make ControlFlag more available to the broader software development community, Intel is pleased to announce that ControlFlag is now open source and can be accessed at https://github.com/IntelLabs/control-flag. We are pleased to give developers the opportunity to develop on it and see what else can be done with this extremely valuable and innovative technology.

Since its introduction, ControlFlag has been tested on production-level software and widely used open source software systems. For example, last year, ControlFlag identified a code anomaly in Client URL (cURL), a computer software project that transfers data using various network protocols more than a billion times a day. After the anomaly was reported to the cURL team, the developers agreed with ControlFlag's findings and subsequently patched their code.

The same pattern extraction is performed on the code under test, and the result is compared against the reference decision tree. Large discrepancies with adjacent branches indicate an anomaly in the pattern being checked. The system can not only identify an error in a pattern but also suggest a fix. For example, when analyzing the code snippet "if (x = 7) y = x;", the system determined that the construction "variable == number" is typically used in "if" statements to compare numeric values, so "variable = number" in an "if" statement is most likely a typo.
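
The check described above can be sketched as a frequency lookup followed by a search for the closest common pattern. This is a simplified illustration, not ControlFlag's algorithm: the PATTERN_COUNTS table, the check function, and the 1% threshold are hypothetical, and token-level similarity from Python's difflib stands in for the comparison against the reference decision tree.

```python
from difflib import SequenceMatcher

# Hypothetical pattern counts standing in for a model mined from a large corpus.
PATTERN_COUNTS = {
    "variable == number": 9500,
    "variable != number": 4200,
    "variable < number": 3900,
    "variable = number": 12,
}


def check(pattern, counts=PATTERN_COUNTS, threshold=0.01):
    """Return (is_anomaly, suggestion) for an abstracted `if` condition.

    A pattern is flagged when its share of all observations is below
    `threshold` (an assumed cut-off). The suggestion is the common pattern
    most similar to it token by token; ties are broken by frequency.
    """
    total = sum(counts.values())
    if counts.get(pattern, 0) / total >= threshold:
        return False, None

    tokens = pattern.split()
    common = [p for p, c in counts.items() if c / total >= threshold]

    def score(candidate):
        similarity = SequenceMatcher(None, tokens, candidate.split()).ratio()
        return (similarity, counts[candidate])

    return True, max(common, key=score, default=None)


print(check("variable = number"))
print(check("variable == number"))
```

With these made-up counts, "variable = number" falls below the threshold and is flagged, and "variable == number" is proposed as the likely intended form because it is the most frequent of the equally similar candidates.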

Traditional static analyzers would also detect an error of this type. Unlike them, however, ControlFlag does not apply predefined rules, which can hardly anticipate every possible case, but relies instead on statistics about how all kinds of constructions are used across a large number of projects.

As an experiment, ControlFlag was run on the source code of the cURL utility, which is often cited as an example of proven, high-quality code. It revealed a bug that static analyzers had missed: the structure element "s->keepon", which has a numeric type, was compared to the boolean value TRUE.

In the OpenSSL code, in addition to a problem with the expression "(s1 == NULL) ∧ (s2 == NULL)", an anomaly was also detected in the expression "(-2 == rv)", which turned out to be a typo.

It is also reported that ControlFlag made it possible to identify several hundred bugs in unnamed proprietary software that led to crashes and memory problems.

Finally, if you are interested in knowing more, you can check the details in the original announcement. Those who want to view, download, or clone the source code can do so from the project's GitHub repository at https://github.com/IntelLabs/control-flag.

