Why organisations must embrace the ‘open source’ paradigm

Image generated by DALL·E by OpenAI

This article is featured on LSE and was written by Aurélie Jean, Guillaume Sibout, Mark Esposito, and Terence Tse.

In the era of artificial intelligence, information is collected and processed automatically on a large scale. However, translating that into innovation is a challenge. Aurélie Jean, Guillaume Sibout, Mark Esposito and Terence Tse write that opening datasets and computer source codes may help solve the problem. Sharing information accelerates the pace of co-innovation, facilitating inter- and multidisciplinary research and study, as well as expanding and propagating scientific knowledge and research results.


The saying "information is power" relates our level of influence and power to the amount and quality of information we own. Today, more than ever, and thanks to technology, we share data and information to gain greater influence and power, to optimise decisions, to detect key insights into a given phenomenon or to customise user experiences. (Big) data allows us, for instance, to uncover counter-intuitive insights or to track how information about a phenomenon evolves over time and space.

During the COVID-19 pandemic, open-data platforms, open research and open-source software programs demonstrated the power of the open paradigm: sharing information to overcome a large-scale challenge, such as forecasting the propagation of the virus on the basis of worldwide data collection and collaboration.

A year ago, the rise of technologies like ChatGPT triggered discussions about protecting human rights, freedom and democracy by sharing information on how some algorithms are developed. More recently, a global discussion on open AI, which led to an open letter, has focused on how to take greater advantage of open-source algorithms in order to accelerate and challenge any implemented algorithm to the benefit of all, while potentially decreasing threats significantly.

The legacy paradigm

Information has long been part of the economy and has been used as leverage to increase the power of an individual or institution. Scientific globalisation was made possible by the Gutenberg press in the 15th century and the Watt steam engine in the 18th century. These two innovations made it possible to share knowledge, discoveries and theories. The first countries to take advantage of these novelties were able to own the knowledge and the innovations, and to increase their power. The same applies to companies and individual inventors.

Now, with big data and artificial intelligence (AI) feeding algorithmic models, information is collected, structured and processed automatically on a large scale to provide insights, predictions or answers to specific questions. Despite the obvious benefits in many fields, AI presents threats we need to fight, such as discrimination and environmental impact. Additional challenges are the limited size of available datasets and the talent pool we need to access to come up with novel technologies. Opening some datasets and computer source codes can help us overcome these limitations and develop next-generation breakthrough innovations while protecting our fundamental rights.

Sharing information

Sharing information makes it easier for anyone to overcome the most challenging obstacles by accelerating the pace of co-innovation, facilitating inter- and multidisciplinary research and study, as well as expanding and propagating scientific knowledge and advanced research results.

During the COVID-19 pandemic, many countries shared their health statistics to feed predictive models and obtain relevant insights on the pandemic in a short time, which accelerated research in AI applied to healthcare and increased interest in speeding up the academic peer-review publication process. Sharing information generally enables large-scale, data-driven decisions to manage crises.

Some AI-based technologies require large, diversified datasets that often can be retrieved only from open-data sources such as ImageNet, a platform used to train image-recognition algorithms. Finally, building training datasets from large open databases enriches and diversifies perspectives by offering greater diversity and representativeness, thus decreasing the likelihood of bias and helping guarantee the inclusiveness of the resulting innovation.

This new paradigm based on sharing information also enables us to protect people's fundamental rights, as it encourages key players to share how they built technologies that can have a significant, and in many cases negative, impact on free will and democracy. The accelerated propagation of conspiracy theories and fake news on social media demonstrates the urgent need to make publicly available the recommendation algorithms of platforms such as X, Facebook, TikTok and ChatGPT.

Concrete examples

Open-research publication platforms such as ScienceOpen, ResearchGate or Wellcome Open Research enable the sharing of research results and methods, thus accelerating and facilitating academic research and development. They allow outputs to be compared, helping build consensus, scale solutions to practical problems faster and translate them more easily into industrial applications.

Open-source software and libraries enable faster co-development by providing developers, scientists and engineers with ready-to-use computer programs and software functionalities, either with access to the source code (open-source software) or without it (libraries and application programming interfaces, or APIs). The Python library TensorFlow, for instance, is commonly used by anyone implementing machine-learning algorithms.
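To illustrate the point (this sketch is not from the original article), the snippet below uses the open-source TensorFlow/Keras API to define and train a tiny classifier. The synthetic data, layer sizes and training settings are purely illustrative assumptions, not a recommended configuration.

```python
# Minimal sketch: a small binary classifier with the open-source
# TensorFlow/Keras API, trained on synthetic placeholder data.
import numpy as np
import tensorflow as tf

# Illustrative synthetic data: 1,000 samples, 20 features, binary labels.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"loss={loss:.3f}, accuracy={accuracy:.3f}")
```

Because the library itself is open source, anyone can inspect, benchmark or extend the building blocks used above rather than reimplementing them from scratch.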

Open-data platforms such as the World Bank Open Data portal or the World Health Organization's open-data repository make it possible to create representative training datasets to analyse an issue or to increase the accuracy of statistical metrics. This allows us to build more efficient algorithmic models to solve large-scale and complex problems. We could also mention the US Census Bureau, the Bold Open Database by Veuve Clicquot, or, more recently, Météo France, which will soon make its climatology and real-time weather data publicly available in order to leverage the competencies of talented individuals.
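As a rough illustration of how easily such open data can be consumed (a sketch added here, not part of the original article), the snippet below queries the World Bank's public REST API for a standard indicator. The indicator code, URL pattern and parameters are assumptions based on the API's documented v2 format.

```python
# Minimal sketch: fetching an open indicator from the World Bank's public API.
import requests

# SP.POP.TOTL = total population; format=json returns [metadata, records].
url = "https://api.worldbank.org/v2/country/FRA/indicator/SP.POP.TOTL"
resp = requests.get(url, params={"format": "json", "per_page": 10})
resp.raise_for_status()
metadata, records = resp.json()

for row in records:
    print(row["date"], row["value"])  # year, population
```

Data pulled this way can then feed the kind of training and analysis pipelines described above, without any bilateral data-sharing agreement.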

How to act

As an actor in the private sector, you need to distinguish between the algorithmic technologies and data that are central to your intellectual property and business model, and the secondary ones that merely support them. You can also share specific pieces of your source code in order to take advantage of the open-source paradigm, including code and model benchmarking, algorithmic bias detection or general improvements, while preserving your intellectual property for a given period.

In addition, even without opening any of your technology's source code, you can consider sharing part or all of the dataset you used to build the algorithm(s) embedded in it.

Sharing the best practices that define part of your algorithmic governance can make teams and companies more competitive, since they become more trustworthy and therefore more attractive to users, consumers, the public and markets. Finally, sharing your mistakes, your failed attempts and your learnings is also critical in the openness paradigm, as it provides every actor with a safe space to share, discuss and challenge each other.

Conclusion

There is a growing discussion around "openness", which is likely to become a standard vision for public and private institutions. Next-generation expectations include building and deploying a concrete and specific open strategy by defining which innovation components to open, such as the data, the algorithm and the source code, as well as the conditions for sharing. This is part of data and algorithmic governance.


About the authors

Aurélie Jean is a computational scientist, entrepreneur and author specialising in algorithmic modelling. She holds a PhD in materials science and engineering, with a specialisation in computational mechanics and mathematical morphology, from Mines Paris, PSL University, France.

Guillaume Sibout is a consultant with In Silico Veritas (ISV), a consultancy specialising in strategies and governance of algorithms. He has an Executive Master’s degree in the digital humanities, innovation, transformation, media and marketing from Sciences Po.

Mark Esposito is Professor of Strategy and Economics at Hult International Business School and Director of the Futures Impact Lab. He is also a Harvard social scientist, with appointments at the Center for International Development at Harvard Kennedy School, Harvard's Institute for Quantitative Social Science and the Davis Center for Russian and Eurasian Studies.

Terence Tse is a professor of finance at Hult International Business School.
