Artificial Intelligence requires Data

May 29, 2022 | Insights

Data accumulation is the first stage of an AI project. During this stage, an organization needs to either create or acquire data. One alternative is to partner with other organizations that are willing to share data. The most useful data for an organization is data that accurately represents its own company-specific activities. 

Organizations generally have many disparate sources of data. There may even be dark data that an organization does not know exists. Finding, organizing, cleansing and processing all this data can take substantial time. Organizations may also have neglected data hygiene, resulting in data debt. Such debt refers to the cost of the additional rework that is needed because of poor data practices in the past. Organizations should stop creating this debt; otherwise, someone in the future will have to pay it. Industry best practices should be used to groom the data garden and maintain good data hygiene regularly. 

A disciplined approach is needed to cleanse data, standardize it, and integrate it. Data consistency is vital both for better results and for integration. Systematic techniques need to be used to deal with missing data and outliers. Ideally, an organization needs to get to a point where there is a single source of truth for its data. Currently, that is not the case in many organizations. 
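One way to make the handling of missing data and outliers systematic is to encode it as small, repeatable functions. The sketch below is illustrative only: it assumes median imputation for missing values and the interquartile range (IQR) rule for outliers, which are two common choices among many. The function names and the `invoice_amounts` sample are hypothetical and not taken from any particular organization's data.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Mark values that fall outside the IQR fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical sample: two missing entries and one suspicious amount.
invoice_amounts = [120.0, 135.5, None, 128.0, 131.0, 9800.0, None, 126.5]

cleaned = impute_missing(invoice_amounts)
outlier_flags = flag_outliers(cleaned)

for amount, is_outlier in zip(cleaned, outlier_flags):
    print(f"{amount:>8.1f}  {'OUTLIER' if is_outlier else 'ok'}")
```

Flagging outliers rather than deleting them is a deliberate choice in this sketch: an unusual value may be an error or a genuine business event, and that judgment usually belongs to someone who knows the data.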

AI models require a high volume of structured, high-quality data. Ideally, such data should be available in a form that AI models can ingest automatically. An organization should define the specific data quality aspects that it wishes to maintain and set up systems to check them automatically. If an AI model is trained on data that is unclean or of suboptimal quality, the model's output may not be entirely trustworthy. Because data quality is so important, some large companies are investing heavily in data quality management. 
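Automated quality checks of this kind can be sketched as a list of named rules that run against every incoming batch of records. The example below is a minimal illustration, not a reference to any specific data quality product: the quality aspects chosen (completeness, uniqueness, valid range), the thresholds, and the field names such as `customer_id` and `age` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityCheck:
    name: str
    passes: Callable[[list], bool]  # takes a list of row dicts


def completeness(field, min_ratio):
    """Pass if at least min_ratio of rows have a non-empty value for field."""
    def check(rows):
        if not rows:
            return False
        filled = sum(1 for r in rows if r.get(field) not in (None, ""))
        return filled / len(rows) >= min_ratio
    return check


def unique(field):
    """Pass if no two rows share the same value for field."""
    def check(rows):
        values = [r.get(field) for r in rows]
        return len(values) == len(set(values))
    return check


def in_range(field, low, high):
    """Pass if every non-missing value for field lies within [low, high]."""
    def check(rows):
        return all(low <= r[field] <= high
                   for r in rows if r.get(field) is not None)
    return check


def run_checks(rows, checks):
    """Run every check and return a {check name: passed} report."""
    return {c.name: c.passes(rows) for c in checks}


# Hypothetical batch with a blank email, a duplicate id, and an implausible age.
rows = [
    {"customer_id": 1, "age": 34, "email": "a@example.com"},
    {"customer_id": 2, "age": 29, "email": ""},
    {"customer_id": 2, "age": 210, "email": "c@example.com"},
]

checks = [
    QualityCheck("customer_id is always present", completeness("customer_id", 1.0)),
    QualityCheck("email is at least 90% complete", completeness("email", 0.9)),
    QualityCheck("customer_id is unique", unique("customer_id")),
    QualityCheck("age is between 0 and 120", in_range("age", 0, 120)),
]

report = run_checks(rows, checks)
for name, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

A report like this could gate the ingestion step, so that a batch failing an agreed check is held back for review instead of reaching model training.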

While working on AI model development, it is crucial to prioritize cleansing and curation efforts on datasets needed to solve specific problems. This is because these efforts take substantial time and can delay AI projects if not prioritized. It is a good practice to clarify the expectations of a given dataset before doing any major work with it. 

Author: Dr. Jodie Lobana

Image Attribution: Programming Background photo created by kjpargeter – www.freepik.com

"Empowering the future through AI governance," Dr. Jodie Lobana is a distinguished Director, Educator, Author, and award-winning Management Consultant. Her integrated expertise spans Governance of Artificial Intelligence and other Information Technologies, Risk Management, Internal Audit, Project Management, Human Resources, Accounting & Finance.
