Big Data: Navigating the Hype of AI and Machine Learning
By David Firmin, MD and Head of Global Trading Research, Instinet
With all the big talk about big data these days, the financial services industry must make better sense of machine learning and artificial intelligence (AI) applications.
Definitions vary widely and discussions are often vague, so it can be hard to determine what is real and what is spin. Without consistency in what these terms mean, it can be a challenge to understand how advances in processing power have changed the way algo engines work or enhanced the tools that deliver analytics and insights.
What are the criteria that define big data? Is there a standard definition?
The 3Vs (Volume, Velocity, and Variety) are often used to differentiate simple data from big data. Any big data project would factor in these criteria, but we would include a 4th V: Value. Value refers to the quality of the data, as well as the return on investment (ROI) it delivers relative to how it is being used. This is an element we believe all constituencies should actively evaluate, define, and build into their programs.
Volume is a function of the depth and breadth of data and its sources. The term “alternative data” refers to non-traditional sources that can now be applied to quantitative analysis. We see many more sources, larger quantities of data, and possibly also greater frequency and/or lower-latency real-time increments of data, all of which combine to dramatically increase the overall volume of data that has to be captured, stored, processed, and analysed.
With new sources of data such as social media, machine data, and mobile applications streaming into the ecosystem in real time, support for high velocity extends not only to how swiftly data is captured and collected, but all the way through the process until the application of that data drives a business action. The time between data capture and results has been massively compressed, especially for industries whose business models critically depend upon low-latency capabilities and capacity, such as financial services.
Data variety refers to the many sources and types of data being consumed, both structured and unstructured.
Structured data: This includes the typical market or tick data and transaction reference data that traders or quants have contended with for years. These datasets have predetermined formats that are designed to fit into existing systems and be analysed easily.
Unstructured data: Examples of unstructured data include social media, sentiment, and voice data. These data points can vary drastically in form, so they need to be converted into a machine-readable format for analysis (becoming structured data). Trade emails, voice, and IM data are good examples of what is captured for compliance and risk analysis.
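As a rough illustration of what converting unstructured data into a machine-readable format can look like, the Python sketch below pulls an order instruction out of a free-text chat message. The message wording, the regular expression, and the OrderInstruction record are all hypothetical, invented for this example; real compliance pipelines are far more involved.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for messages such as "pls buy 5,000 VOD.L at 72.45".
# The side is matched case-insensitively; the symbol must be upper case.
ORDER_PATTERN = re.compile(
    r"\b(?P<side>(?i:buy|sell))\s+"
    r"(?P<qty>\d[\d,]*)\s+"
    r"(?P<symbol>[A-Z][A-Z0-9]*(?:\.[A-Z]+)?)"
    r"(?:\s+(?:at|@)\s*(?P<price>\d+(?:\.\d+)?))?"
)


@dataclass
class OrderInstruction:
    """Structured record extracted from a free-text message."""
    side: str
    quantity: int
    symbol: str
    price: Optional[float]  # None when the message gives no price


def parse_message(text: str) -> Optional[OrderInstruction]:
    """Return a structured record, or None if no instruction is found."""
    match = ORDER_PATTERN.search(text)
    if match is None:
        return None
    price = match.group("price")
    return OrderInstruction(
        side=match.group("side").lower(),
        quantity=int(match.group("qty").replace(",", "")),
        symbol=match.group("symbol"),
        price=float(price) if price is not None else None,
    )


if __name__ == "__main__":
    print(parse_message("pls buy 5,000 VOD.L at 72.45 before the close"))
    print(parse_message("lunch at noon?"))
```

Once a message has been reduced to a record like this, it can be stored and queried alongside conventional structured data.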
The final V—Value—is by far the most important. It characterises the potential ROI and strategic impact of big data on your day-to-day business activities or organisation. As we think about the Value factor, the first order of business is to assess the quality of your data. If the content you are collecting is not trustworthy or clean, the entire process is corrupted. More isn’t necessarily more if you cannot be assured that the data being collected is going to add value. By the same token, if the way in which you are applying the data is not well considered, i.e., if you are not “asking the right questions of that data,” then you will not extract benefit from the process. The age-old story of “garbage in, garbage out” certainly applies to the process of big data management, but we can add a new maxim, as well: bad question, bad result.
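To make “garbage in, garbage out” concrete, here is a minimal Python sketch of the kind of quality checks that might run before tick data reaches a model. The Tick record, the single-symbol assumption, and the 10% jump threshold are illustrative assumptions, not a recommended standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tick:
    timestamp: float          # seconds since some reference time
    symbol: str
    price: Optional[float]    # None models a missing value


def quality_issues(ticks: List[Tick], max_jump: float = 0.10) -> List[str]:
    """List basic data-quality problems in a single-symbol tick series.

    max_jump is a hypothetical threshold: a relative move larger than
    this between consecutive valid ticks is flagged for review.
    """
    issues = []
    previous = None  # last tick that had a usable price
    for i, tick in enumerate(ticks):
        if tick.price is None or tick.price <= 0:
            issues.append(f"row {i}: missing or non-positive price")
            continue
        if previous is not None:
            if tick.timestamp < previous.timestamp:
                issues.append(f"row {i}: timestamp out of order")
            if abs(tick.price / previous.price - 1.0) > max_jump:
                issues.append(f"row {i}: price jump exceeds {max_jump:.0%}")
        previous = tick
    return issues


if __name__ == "__main__":
    sample = [
        Tick(1.0, "ABC", 100.0),
        Tick(2.0, "ABC", None),    # missing price
        Tick(0.5, "ABC", 100.5),   # arrives out of order
        Tick(3.0, "ABC", 150.0),   # suspicious jump
    ]
    for issue in quality_issues(sample):
        print(issue)
```

Checks like these only address whether the data is clean; whether the right question is being asked of it remains a matter of judgment.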
It’s important to be mindful that big data isn’t a virtue unto itself. Its value lies in its effective application to a specific problem or model.
Why doesn’t everyone use big data analysis?
A tall order.
Big data digitises the sheer volume of information that is being produced globally and synthesises that information to deliver benefits, improve efficiencies, or advance an organisation’s goals. That’s a rather tall order.
To do this, an enterprise must first be able to develop strategies, operations, and the right resources to plan and manage the logistics of all this information.
Requires enterprise-wide change.
There are several imperatives:
– A cultural change needs to happen. Traditional financial firms are not set up to take advantage of the data that’s available today. They must adopt new mindsets and skill sets in order to realise the benefits that new technology can bring.
– You must have a modern data platform in place to support your big data strategy across the enterprise. Some financial institutions lack the systems and technologies to integrate siloed data and model data to produce insights that they can incorporate into their operations.
– Traders and analysts need to be comfortable and effective in applying new techniques. Workflow needs to change, along with their tools and strategies.
Committing to these imperatives is not easy. It requires a change in your business model that aligns the organisational structure, your processes, and technology to create a robust, secure, and scalable data management infrastructure. It also requires having uniquely talented people (not only data scientists, but IT and business people) who know how to pursue the 4Vs and ask the right questions. This mix of talent can be difficult to find, especially when so much of this technology is still new.
Not quick. Not cheap. No guarantees.
The upfront investment of time and resources can be a challenge to firms of all types. Depending upon the nature, size, and mission of a firm, committing to a major, long-term investment such as a big data project can be hard to sell to executive management and boards, since quantifying the tangible benefits and understanding the timeline for reaping the ROI can require a leap of faith.
Analytics has always been important to trading. How has big data analysis changed the way data is used in financial services?
Since Instinet launched electronic trading in 1969, technology has played an ever-expanding role in the financial sector. Big data is a significant factor in the most recent rapid evolution of electronification. It is pushing the industry to new heights across functions such as idea generation, analytics, execution, risk management, regulatory compliance, marketing, and client relationship management.
Big data technology has enabled the storage and analysis of data sets at a scale that was not possible before. Many firms are putting greater emphasis on new data management platforms that enable them to integrate and deliver data and analytics in real time, rather than using numerous separate analytics engines. Using the latest technologies, data infrastructure, and processing methods to harvest greater intelligence from ever higher volumes of data is becoming a competitive necessity.
The existence of big data makes the following more feasible:
– Real-time responsiveness. Incorporating low-latency stimuli from alternative sources into existing strategies and the behaviour of live orders.
– Heuristic capabilities. Going beyond static models.
– Machine learning. Combining advanced computational analysis and simple automation.
– Artificial intelligence. Handing over decision-making discretion to the platform.
What is the relationship between big data, machine learning, and artificial intelligence?
Big data is the core fuel that drives technologies like machine learning and AI. These technologies are dependent upon the advanced computational capabilities and characteristics (the 4Vs) of the underlying data.
Machine learning is a way of distilling patterns and/or achieving automation that is genuinely heuristic. Instead of writing millions of lines of code with complex rules to perform a task, you can develop technology that can look at a lot of data, recognise patterns, and learn from the data. “Learning” requires feeding huge amounts of data to the algorithms and allowing them to adjust and improve. While machine learning may be considered an evolution or extension of known statistical methods, it requires new data logistics and analytical skill in order to derive signals that are relevant to the investment process, and drive conclusions or actions in a way that delivers against the goals with precision and consistency.
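The idea of learning from data rather than hand-writing rules can be shown with a deliberately tiny Python sketch. It fits a straight line to a handful of points by gradient descent, adjusting its two parameters a little on every pass so that the error shrinks. The data is synthetic and the learning rate and number of passes are assumptions chosen for this toy case; it is not a trading model.

```python
from typing import List, Tuple


def mean_squared_error(xs: List[float], ys: List[float],
                       slope: float, intercept: float) -> float:
    """Average squared gap between the line's predictions and the data."""
    return sum((slope * x + intercept - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs: List[float], ys: List[float],
             learning_rate: float = 0.05,
             epochs: int = 2000) -> Tuple[float, float]:
    """Learn slope and intercept by repeatedly nudging them downhill."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


if __name__ == "__main__":
    # Synthetic data generated from y = 2x + 1; no rule states this
    # relationship anywhere in the code, it is recovered from the data.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    before = mean_squared_error(xs, ys, 0.0, 0.0)
    slope, intercept = fit_line(xs, ys)
    after = mean_squared_error(xs, ys, slope, intercept)
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
    print(f"error before={before:.3f} after={after:.6f}")
```

Production machine learning differs mainly in scale: far more data, far more parameters, and much more care over the data logistics described above.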
Artificial intelligence (AI) is an attempt to build machines that can perform tasks that are characteristic of human intelligence. This includes understanding language, recognising objects and sounds, learning, and problem-solving. AI in financial services puts greater discretion over decision making into the technology versus the human operators. This means that AI offerings are replacing certain aspects of human labour or effort, empowering technology to perform these explicit tasks with a degree of pre-determined control.
Don’t try to replace all human involvement.
Businesses need to be mindful of where to apply machine learning and other new data processing technologies. Machine learning is the first step in the integration of big data analytics with automation. This is where technology utilises big data to learn and respond or adapt. Machine learning still allows for human insights and judgment to drive it forward—it allows you to use automation to recognise patterns, or remove bias, in a way that is faster than what a human could do.
Using artificial intelligence means you are asking the technology to make decisions on your behalf. This level of discretion is something that must be designed and weighed carefully. It’s analogous to the difference between using a GPS guidance tool versus a self-driving car.
Ultimately, machine learning and other data processing technologies should not seek to replace the experience, judgment, and insight of the human trader. Rather, they should amplify the trader’s capabilities and complement the trader’s intuition.