Synthetic data’s pivotal role in boosting innovation

By Smith Institute

With today’s computing technology, data-driven problems in forecasting and optimisation that were previously impossible can be solved in short order. In concert with exciting developments like quantum computing, this power is set to keep multiplying many times over. Good data sits at the heart of these solutions. But there are many instances where it’s impossible, impractical or illegal to use real data, potentially stopping innovation in its tracks. 

Synthetic data is a way forward. Rather than being the data equivalent of ‘lorem ipsum’ placeholder text, it accurately reflects the real data to enable significant and valuable conclusions to be drawn. Using AI and modelling, you can synthesise whole swathes of data in a repeatable and realistic fashion. By synthesising input data from scratch or by creating new data from old, you can build, test and demonstrate a proof of concept at maximum velocity with total control and with your risks mitigated. 

Synthetic data keeps the build moving 

The future of data-driven computing looks exciting, and you want to make decisions fast. Maybe quantum optimisation is ideal for your goals, or your next killer app will depend on a graph neural net. But you need data to go deep in the build process – and different compute means different data from what you might already have. Synthetic data can bridge the gap. 

Perhaps you’re leaning on the latest graphics processing unit (GPU) to power ultra-high-resolution forecasts. Or perhaps you’re considering quantum computing, with its potential to solve optimisation problems that are intractable today. What if your input data isn’t high enough resolution, or you simply don’t have enough? How do you test your nascent solution? Or demonstrate it to the board? 

Starting from scratch 

If you only have old data – or none at all! – that shouldn’t stop you building new analytics solutions. If anything, it’s an opportunity. For many data sets, you can build a mathematical model that represents the fundamental behaviours you anticipate, like surges in power use at teatime or drivers diverting from a closed road. How rich a data set you create depends on the complexity of your use case: the metaverse is being used as synthetic data for training autonomous vehicles, for example. But even if your synthetic data is only a rough approximation of reality, it can still be hugely valuable for testing high-impact scenarios. 
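As a minimal sketch of what "modelling fundamental behaviours" can mean in practice, the snippet below synthesises a week of half-hourly household power demand from scratch. Everything here is an assumption for illustration: the units, the sinusoidal daily baseline, the Gaussian bump around teatime and the noise level are all invented, not drawn from any real data set.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# One week of half-hourly readings (hypothetical units: kW).
half_hours = np.arange(7 * 48)
time_of_day = (half_hours % 48) / 2.0  # hour of day, 0 to 23.5

# Baseline daily cycle: low overnight, higher through the day.
baseline = 0.4 + 0.3 * np.sin((time_of_day - 9) * np.pi / 12)

# Anticipated behaviour: a surge in power use around teatime (~18:00),
# modelled as a Gaussian bump centred on 18:00.
teatime_surge = 1.2 * np.exp(-((time_of_day - 18) ** 2) / 2.0)

# Measurement noise makes the series look realistic rather than idealised.
noise = rng.normal(0.0, 0.05, size=half_hours.shape)

# Clip at zero: demand can't be negative.
demand = np.clip(baseline + teatime_surge + noise, 0, None)
```

Even a toy series like this is enough to exercise a forecasting pipeline end to end, and each behaviour (the surge height, the noise level) can be dialled up or down to test high-impact scenarios.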

You can also use data synthesis to fill in the blanks. Sometimes you can’t measure everything, or can’t record measurements frequently – in the core of a nuclear reactor, for instance. Informed by your best understanding of the whole, a model of the system underneath can help you extrapolate from what you know to what you don’t, using techniques like Bayesian inference powered by scalable cloud compute. 
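A stripped-down sketch of that idea: combine a prior belief from a physical model with a handful of noisy readings to infer a value you can't observe directly. This uses the simplest possible Bayesian update (conjugate Gaussian prior and likelihood); the readings, noise level and prior are all illustrative assumptions, not real reactor data.

```python
import numpy as np

# Hypothetical sparse, noisy sensor readings of a quantity of interest.
readings = np.array([612.0, 605.0, 618.0, 609.0])
noise_sd = 10.0  # assumed sensor noise (standard deviation)

# Prior belief from a model of the system underneath.
prior_mean, prior_sd = 600.0, 25.0

# Conjugate Gaussian update: posterior precision is the sum of
# prior precision and the precision contributed by each reading.
n = len(readings)
post_var = 1.0 / (1.0 / prior_sd**2 + n / noise_sd**2)
post_mean = post_var * (prior_mean / prior_sd**2 + readings.sum() / noise_sd**2)
post_sd = np.sqrt(post_var)
```

The posterior mean sits between the model's prediction and the average of the readings, and its uncertainty shrinks as evidence accumulates – exactly the behaviour you want when extrapolating from what you know to what you don't.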

Mimicking data to mitigate risk 

On the other hand, when you’re data-rich, synthetic data can be used to mitigate risk. What if you’re uneasy about using real data in a development sandbox, or when demonstrating to potential partners? To avoid using real data until it’s necessary, you could train a generative adversarial network to make new data from old. 
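A real generative adversarial network needs a deep-learning framework and is beyond a short sketch, but the underlying idea – fit a generative model to real data, then sample fresh records from it – can be shown with a far simpler stand-in: fitting a multivariate Gaussian. The "real" records here (age, monthly spend) are themselves simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real records: 500 customers with (age, monthly spend).
# In practice this would be your sensitive production data.
real = rng.multivariate_normal([45.0, 120.0],
                               [[90.0, 40.0], [40.0, 400.0]],
                               size=500)

# Fit a simple generative model: the sample mean and covariance.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample brand-new synthetic records that mimic the joint distribution
# without reproducing any individual real record.
synthetic = rng.multivariate_normal(mu, cov, size=500)
```

The synthetic table has the same shape and broadly the same statistics as the original, so it can stand in for real data in a sandbox or demo. The privacy caveat in the next paragraph still applies: any model fitted to real data can leak signatures of that data.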

It’s vital to consider privacy and regulation, even with synthetic data. If you use real data to train a synthesiser, signatures of that data could still be present in what you generate, especially if overfitting is a risk – so GDPR may apply, for example. Beware too of issues like copyright, now coming to the fore with today’s diffusion-based image generators, as regulation catches up with innovation. Your least risky approach in regulated domains could be to synthesise realistic data from scratch, with the added benefit that it’s then totally in your control. 

Where next? 

As compute moves forward, we’re seeing a shift. Tomorrow isn’t about just making faster versions of the hardware we already use; it’s about harnessing different technologies like quantum computers and digital annealers. But building new high-performance solutions on new hardware needs new data for development, testing and demonstration. Synthetic data can help you better incorporate AI and other types of modelling into your business today while also allowing you to experiment with that technology of tomorrow. The synthetic world is your oyster. 

If you would like more information about synthetic data and how it can help you meet your data science and wider business objectives, get in touch with us here. 
