Roke provides solutions across all aspects of Artificial Intelligence (AI) and Machine Learning (ML), tackling unusual and challenging business problems that require novel thinking. But for many, AI is still a confusing subject that they feel is best left to tech geniuses or sci-fi movies. Not so, says Roke Graduate Cyber and Networks Engineer Anna, who recently gave a talk at the Manchester Tech Festival entitled ‘AI for the completely baffled’.
Anna studied at Manchester University, where she developed an interest in cyber security. Explains Anna, “I became interested in understanding how AI models are built and where they originate from. Put simply, all AI comes from data. It’s a simple function machine that takes an input and produces an output.
“You can start with a decision boundary. For example, I will lend you £100 if you earn more than £100. That decision boundary is a simple ‘more than’ or ‘less than’ equation. The input data can be any historical data that exists, for example a set of salaries and whether each of those people paid back a £100 loan in the past. That data then helps to decide where to put the decision boundary.
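The idea of choosing a decision boundary from historical data can be sketched in a few lines of Python. The salaries and repayment outcomes below are made up purely for illustration, and the ‘model’ is nothing more than the ‘more than or less than’ comparison Anna describes.

```python
# Hypothetical historical data: past applicants' earnings and whether
# they paid back a £100 loan. The numbers are invented for illustration.
salaries = [60, 80, 95, 120, 150, 200]
repaid   = [False, False, False, True, True, True]

def best_threshold(salaries, repaid):
    """Try each salary as a candidate decision boundary and keep the one
    that classifies the historical data most accurately."""
    best, best_correct = None, -1
    for t in sorted(set(salaries)):
        correct = sum((s >= t) == r for s, r in zip(salaries, repaid))
        if correct > best_correct:
            best, best_correct = t, correct
    return best

threshold = best_threshold(salaries, repaid)

def lend(salary):
    # The learned 'model' is just a more-than-or-less-than comparison.
    return salary >= threshold

print(threshold, lend(110))  # e.g. 120 False
```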
“You can then increase the complexity by adding more data inputs and more decision boundaries, effectively creating an artificial neural network. Artificial neural networks are responsible for many of the recent advances in AI, including voice recognition, image recognition, and robotics.”
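Stacking more inputs and more decision boundaries is, in essence, what a neural network does. The sketch below uses hand-picked weights rather than learned ones (real networks learn their weights from data, typically by gradient descent), and the input features and weight values are assumptions chosen only to show the shape of the idea.

```python
import numpy as np

def step(x):
    # A hard threshold: which side of each decision boundary are we on?
    return (x > 0).astype(float)

# Two hypothetical inputs, e.g. [salary, years of credit history].
x = np.array([120.0, 2.0])

# Hidden layer: each row of W1 defines one decision boundary over the inputs.
W1 = np.array([[0.01, 0.0],    # boundary driven mostly by salary
               [0.0,  1.0]])   # boundary driven mostly by credit history
b1 = np.array([-1.0, -1.5])
hidden = step(W1 @ x + b1)

# Output layer combines the boundaries into a single decision.
w2 = np.array([1.0, 1.0])
b2 = -1.5
decision = step(w2 @ hidden + b2)   # 1.0 = lend, 0.0 = don't lend

print(hidden, decision)
```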
Continues Anna, “Anything that you can get data out of can become something for which we can build a neural network. For example, banking data or personal data. All digital information is made up of a series of 1s and 0s, whether it’s text, images, audio, or video. It’s all data that we can use in models to make decisions. Image classifiers are a great example, where images are the input data and the output is a classification of what the machine is seeing.”
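Anna’s point that everything digital reduces to numbers is easy to demonstrate. In the sketch below, a word becomes bits and a tiny made-up grid of pixel values stands in for an image; the classify function is a trivial placeholder rather than a trained model, included only to show the interface of an image classifier, numbers in and a label out.

```python
import numpy as np

# Any text is ultimately a sequence of 1s and 0s.
text = "loan"
print(" ".join(f"{byte:08b}" for byte in text.encode("utf-8")))

# A tiny 'image': a 4x4 grid of greyscale pixel values from 0 to 255.
image = np.array([[  0,  50, 200, 255],
                  [ 10,  60, 210, 250],
                  [  5,  55, 205, 245],
                  [  0,  40, 190, 255]], dtype=np.uint8)

def classify(img):
    # Placeholder for a trained image classifier: pixel numbers in, label out.
    return "bright scene" if img.mean() > 127 else "dark scene"

print(classify(image))
```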
To build a robust model, though, you need robust data. That’s where data science, AI and ML meet. Before data can be used effectively for training or testing a machine-learned algorithm, it should be properly cleansed, processed, and stored. Effectively presenting the outputs of that algorithm is also key to delivering success, so data science, AI and ML are closely linked in the lifecycle of gaining insight from data.
Explains Anna, “If you take a language model as the example, you can see that words and text are all data too. There are multiple different methods and equations for converting words into digital information. For example, one such algorithm will give the effect that King − Man + Woman = Queen. Anything that is data can be an input, so the possibilities are endless, yet the simple concept of data input producing an output sits underneath it all.”
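The King − Man + Woman = Queen effect can be shown with a few toy vectors. Real word embeddings are learned from large bodies of text (word2vec and GloVe are well-known examples); the tiny hand-made vectors below, whose dimensions loosely stand for royalty, gender and person-ness, are assumptions made purely so the arithmetic is visible.

```python
import numpy as np

vectors = {
    "king":  np.array([0.9,  0.9, 1.0]),
    "queen": np.array([0.9, -0.9, 1.0]),
    "man":   np.array([0.0,  0.9, 1.0]),
    "woman": np.array([0.0, -0.9, 1.0]),
    "apple": np.array([0.1,  0.0, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vectors["king"] - vectors["man"] + vectors["woman"]

# Find the known word (other than the inputs) closest to king - man + woman.
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vectors[w], target))
print(best)   # queen
```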
But how can we make sure that our data is not being used or exploited, and can we really trust ML and AI? Says Anna, “AI is taking innocent data and asking what we can learn from it, or how we can use that data in different ways to extrapolate information. Part of my role is looking at the different AI tools and programmes and seeing how we can make them work in different ways to get the effects we desire. By effectively playing with and breaking the models, we can find any vulnerabilities in the ML system.”
Roke works with the National Cyber Security Centre (NCSC) on strategies for AI that are resilient to cyber-attack. Malicious actors have learnt ways to exploit ML algorithms and cause issues for system owners. Examples include tricking self-driving cars into misreading speed limit signs and deceiving face and object detection systems into producing incorrect results.
The complex nature of ML technology means that many developers inadvertently leave security flaws within systems, which can lead to exploitation by attackers.
Case Study
Roke was tasked with assessing how easily adversarial techniques could be used to ‘fool’ AI systems used in applications such as video surveillance. They developed adversarial patches – 2D images designed to be attached to the flat surface of a 3D object. These patches were able to fool AI image classifiers, creating false positives when placed on roads, attached to walls, or displayed on clothing and accessories such as bags and hats.
Roke provided demonstrations that illustrate adversarial patch attacks in the customer’s operational environment. They’ve since developed further concepts to build other adversarial attacks and novel defence measures, all operating within real-world environments and against the associated cyber threats.
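As a rough illustration of why a small patch can change a classifier’s output, here is a deliberately toy sketch. It is not Roke’s technique: real adversarial patches are optimised against a trained network, whereas here a hand-placed bright square is enough to flip a trivial stand-in classifier and produce a false positive.

```python
import numpy as np

def toy_classifier(img):
    # Stand-in for a trained detector: pixel numbers in, label out.
    return "object detected" if img.mean() > 127 else "nothing detected"

scene = np.full((8, 8), 60, dtype=np.uint8)   # a dull, empty 'scene'
print(toy_classifier(scene))                  # nothing detected

patched = scene.copy()
patched[0:5, 0:5] = 255                       # attach a small bright 'patch'
print(toy_classifier(patched))                # object detected (false positive)
```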
Why is AI so baffling for so many of us?
Explains Anna, “AI and ML systems are not revolutionary things. They are built from simple concepts using data to make decisions. As AI has become commercialised, the marketing spin and complex language used to describe it have grown too. That’s why people think it’s so baffling. Neural networks as a concept are nothing new. The first artificial neural network was created in the 1940s.
“Today, we produce huge amounts of data and have the technology to enable AI to truly take off. There’s this misconception that AI isn’t for the ‘little people’, and that’s driven by all the buzzwords and jargon used to describe it.
“ChatGPT is probably one of the more widely used examples of everyday AI that people are starting to use in the workplace. Again, that’s simply data input producing an output. People are seeing that it’s good at simple tasks, but it always needs checking and verifying as it’s entirely dependent on the data input.
“Bias and discrimination are other key issues that we must solve to move AI and ML forward. Our data sets are often biased because the data we’re using in AI models has come from years and years of bias in our own history. For example, if you ask ChatGPT for a common name, will it say a boy’s or a girl’s name? Will it be a White British or a Southeast Asian name? It all depends on the data input, and if that data is biased then so will be the AI’s answer.”
For example, a recent study* on mortgage loans revealed that the predictive models used for granting or rejecting loans are not accurate for minorities. Scott Nelson, a researcher at the University of Chicago, and Laura Blattner, a Stanford University economist, found that the variance in mortgage approval between majority and minority groups arises because low-income and minority groups have less data documented in their credit histories. [*Cornell University, May 2021]
Adds Anna, “There is also the issue of plagiarism when using AI like ChatGPT, and of how people can watermark content created by AI. All of that is extremely difficult to do, and regulation is lagging quite simply because we’re in an AI arms race. No one wants to be the first to regulate AI and restrict how it’s used.
“And there are still lots of examples of AI getting it wrong, which can be down to a huge range of things, from insufficient training data to a ‘naïve’, overly simple model, or even an overly complex one.
“For example, plagiarism-detection systems can wrongly flag the work of someone writing in English as a second language. Or just look at how wrong we got it when using AI to predict GCSE results during Covid!
“In conclusion, there is huge potential for AI, and it certainly isn’t something novel or new. However, there is still so much research and regulation needed to better understand how we can effectively use AI in the future. We're at the beginning of the journey, and it’s a very exciting one, but we don’t have it all figured out quite yet.”