Data analytics and machine learning are used more widely every day, and companies that adopt them often run into integration problems. To meet these challenges, IBM has introduced CodeFlare, an open source framework built on Ray, the distributed computing system for machine learning applications from the RISE laboratory at the University of California, Berkeley.
CodeFlare aims to simplify the AI iteration process with specific elements for scaling data workflows. It grew out of a project in the IBM group responsible for creating one of the world's first 2-nanometer prototype chips.
IBM says CodeFlare helps simplify the integration and efficient scaling of big data and artificial intelligence workflows across multi-cloud infrastructures.
"CodeFlare takes the notion of simplified machine learning ... one step further, going beyond isolated steps to seamlessly integrate end-to-end pipelines with a data scientist-friendly interface such as Python, not containers," Priya Nagpurkar, director of hybrid cloud platform at IBM Research, told VentureBeat via email. "… differentiates itself by simplifying the integration and scaling of entire pipelines with a unified runtime and programming interface."
In a blog post, IBM explained that creating machine learning models today is an intensely manual task. Researchers must first train and optimize a model, which involves steps such as data cleansing, feature extraction, and model optimization. This is the work IBM says CodeFlare helps simplify.
CodeFlare uses a Python-based interface to create pipelines, which makes it easier to integrate, parallelize, and share data. It can then be used to unify pipeline workflows across multiple cloud computing platforms, without the need to learn a new workflow language for each type of infrastructure.
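To make the idea of a Python-defined pipeline more concrete, here is a minimal sketch that chains a cleansing step and a feature extraction step and runs them over independent batches in parallel. It uses only the Python standard library as a stand-in for a distributed runtime such as Ray; the function names (`cleanse`, `extract_features`, `run_pipeline`, `run_parallel`) are hypothetical and do not represent CodeFlare's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


# Hypothetical stage functions for illustration only; this is not CodeFlare code.
def cleanse(rows):
    """Drop missing values (None) from one batch of raw readings."""
    return [r for r in rows if r is not None]


def extract_features(rows):
    """Reduce a cleaned batch to a small feature dictionary."""
    return {"count": len(rows), "mean": mean(rows) if rows else 0.0}


def run_pipeline(batch, stages):
    """Pass one batch through every stage in order."""
    result = batch
    for stage in stages:
        result = stage(result)
    return result


def run_parallel(batches, stages, workers=4):
    """Run the same pipeline over independent batches concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the batches
        return list(pool.map(lambda b: run_pipeline(b, stages), batches))


if __name__ == "__main__":
    batches = [[1.0, None, 3.0], [2.0, 4.0, None, 6.0]]
    for features in run_parallel(batches, [cleanse, extract_features]):
        print(features)
```

Because the stages are ordinary Python functions, the same pipeline definition could in principle be handed to a different executor without being rewritten, which is the kind of portability the article attributes to CodeFlare.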
IBM said the pipelines can be deployed on any cloud infrastructure, including Red Hat OpenShift and the new IBM Cloud Code Engine, a serverless platform. CodeFlare also provides adapters for event triggers, such as the arrival of a new file, which means the pipelines can integrate and connect with other cloud-native ecosystems.
In addition, it allows data to be loaded and partitioned from numerous sources, such as cloud object stores, data lakes, and distributed file systems.
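As a rough illustration of loading and partitioning, the sketch below combines records from several sources and splits them into nearly equal chunks that separate workers could process. The sources here are plain in-memory iterables, and `load_all` and `partition` are hypothetical helpers, not CodeFlare functions; a real deployment would read from an object store, data lake, or distributed file system.

```python
from itertools import chain


def load_all(sources):
    """Concatenate records from several iterable sources into one list."""
    return list(chain.from_iterable(sources))


def partition(records, num_partitions):
    """Split records into num_partitions nearly equal contiguous chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    size, extra = divmod(len(records), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional record
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    records = load_all([[1, 2, 3], (4, 5), range(6, 8)])
    for chunk in partition(records, 3):
        print(chunk)
```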
The main benefit of using CodeFlare to set up new machine learning projects is speed. The company claimed that when one of its users applied CodeFlare to analyze and optimize 100,000 pipelines to train machine learning models, it reduced the time to run each from four hours to just 15 minutes.
Speed is important, IBM explained, because data sets keep getting bigger, which means machine learning workflows become more and more complex. As a result, researchers spend more time configuring their setups before they can get anything done.
"IBM is pursuing this by using open source CodeFlare as a framework for data workers and developers to build artificial intelligence models that can run on any cloud," said Mueller. "CodeFlare runs on Red Hat OpenShift and achieves multi-cloud capability from there."
IBM said that CodeFlare is going open source today and is available in the IBM repository on GitHub. The company is also releasing several samples of CodeFlare pipelines that it has created and that run on IBM Cloud and Red Hat OpenShift.
Finally, if you are interested in learning more or reviewing the CodeFlare source code, you can do so from the following link.