Security data collection, processing, and analysis have exploded over the past five years. In fact, recent ESG research into security analytics found that 28% of organizations claim to be collecting, processing, and analyzing significantly more security data than they did two years ago, while another 49% are collecting, processing, and analyzing somewhat more data than they did two years ago (note: I am an ESG employee).
What type of data? You name it. Network metadata, endpoint activity data, threat intelligence, DNS/DHCP, business application data, etc. Additionally, let’s not forget the onslaught of security data from IaaS, PaaS, and SaaS.
Ramifications of massive security data growth
The massive growth in security data has had many ramifications, including the following:
- The need for better security data modeling and management. According to SAS, about 80% of the time spent on data analytics is dedicated to data modeling and management. As data volumes grow for cybersecurity, I’ve noticed a trend in this direction. Organizations are spending more time determining what data to collect, what data formats are needed, where and how to route that data, and how to handle data de-duplication, compression, encryption, and storage.
Based on the growing need for data management, ESG’s security operations and analytics platform architecture (SOAPA) is anchored by a common distributed data management layer meant to provide these types of data management services for all security data. Since most organizations are easing into SOAPA, they should think through their security analytics data model early on. In simple terms, think about what you want to accomplish and then work back to the data sources needed.
- The quest for data synthesis, enrichment, and contextualization. All security data elements can be related to one another, but this is easier said than done. In the past, many organizations relied on the security staff and spreadsheets to correlate security incidents and alerts generated by different analytics tools. When network traffic analysis (NTA) tools detected suspicious traffic, analysts grabbed source IP addresses, investigated DHCP servers for IP lease history, figured out which device was involved, and then dug into historical log files emanating from this device.
Given the inefficiencies of these manual tasks, we’ve seen an increase in point-to-point integration between analytics tools and a greater desire for architectural integration à la SOAPA. Behavioral analytics such as User and Entity Behavior Analytics (UEBA) show some promise for data synthesis by pumping multiple simultaneous security data events through a series of nested machine learning (ML) algorithms. Yes, behavioral analytics are a bit of a work in progress, but I am encouraged by the recent innovation and advancement I’ve seen.
- High-performance requirements. Large organizations are monitoring tens of thousands of systems, generating upwards of 20,000 events per second, and collecting terabytes of data each day. This data volume calls for an efficient data pipeline and the right network, server, and storage infrastructure to move, process, and analyze this data in real time. To address real-time data pipelining needs, I’ve seen broad adoption of the Kafka messaging bus. Oh, and let’s not forget that we need ample horsepower to query terabytes to petabytes of historical security data for incident response and retrospective investigations. This need is leading to the proliferation of security data lakes based upon open source (e.g., ELK stack, Hadoop) and commercial offerings.
- AI, aye, aye, aye. The good news: All of this data provides ample opportunity for data scientists to create and test data models, develop ML algorithms, and tune them for high accuracy. The bad news: We are just beginning to combine data science and security expertise to develop AI for security analytics. Progressive CISOs have a realistic attitude. Their hope is that AI/ML can improve the fidelity of individual security alerts by providing more background evidence, adding risk scoring context, etc. In other words, AI/ML acts as an intelligent layer of defense rather than a stand-alone omniscient security analytics deity.
- Cloud-based security analytics. Not surprisingly, many organizations are questioning the wisdom of dedicating vast resources just to collect, process, and store terabytes or even petabytes of security data as a prerequisite for modern security data analytics needs. Wouldn’t it be easier to use massive and scalable cloud-based resources for this purpose? Based upon my market observations, the answer is a resounding “yes.” IBM and Splunk report strong growth in their cloud-based SIEMs. Sumo Logic claims to have over 2,000 customers, while Google (Chronicle Backstory) and Microsoft (Azure Sentinel) are the new shiny objects in cloud-based security analytics. Look for Amazon to jump into the pool, as well. With unabating security data growth, the “lift and shift” of security analytics to the cloud will only gain momentum.
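To make the enrichment point concrete, here is a minimal Python sketch of the manual correlation workflow described earlier: take the source IP from a network alert, look up which device held that IP in the DHCP lease history at that moment, and then pull that device's earlier log entries. Everything in it is an assumption for illustration: the `DhcpLease` and `LogEvent` record layouts, the `enrich_alert` helper, and the sample hostnames and addresses are hypothetical, and real NTA tools, DHCP servers, and log stores each export their own formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DhcpLease:
    """One entry of DHCP lease history (hypothetical layout)."""
    ip: str
    mac: str
    hostname: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class LogEvent:
    """One historical log line attributed to a device (hypothetical layout)."""
    hostname: str
    timestamp: datetime
    message: str


def device_for_alert(leases, src_ip, seen_at) -> Optional[DhcpLease]:
    """Return the lease that held src_ip at the moment the alert fired."""
    for lease in leases:
        if lease.ip == src_ip and lease.start <= seen_at < lease.end:
            return lease
    return None


def enrich_alert(alert, leases, logs):
    """Attach the owning device and its earlier log entries to an alert."""
    lease = device_for_alert(leases, alert["src_ip"], alert["seen_at"])
    if lease is None:
        return {**alert, "device": None, "related_logs": []}
    related = [
        event for event in logs
        if event.hostname == lease.hostname
        and event.timestamp <= alert["seen_at"]
    ]
    return {**alert, "device": lease.hostname, "mac": lease.mac,
            "related_logs": related}


# Illustrative data: the same IP address is leased to two devices on one day,
# which is why the lease history, not just the IP, identifies the device.
LEASES = [
    DhcpLease("10.0.0.23", "aa:bb:cc:00:00:01", "laptop-finance-07",
              datetime(2019, 5, 1, 8, 0), datetime(2019, 5, 1, 16, 0)),
    DhcpLease("10.0.0.23", "aa:bb:cc:00:00:02", "kiosk-lobby-02",
              datetime(2019, 5, 1, 16, 0), datetime(2019, 5, 2, 0, 0)),
]
LOGS = [
    LogEvent("laptop-finance-07", datetime(2019, 5, 1, 9, 15), "user login"),
    LogEvent("kiosk-lobby-02", datetime(2019, 5, 1, 17, 5), "new process started"),
    LogEvent("kiosk-lobby-02", datetime(2019, 5, 1, 18, 10), "service restarted"),
]
ALERT = {"src_ip": "10.0.0.23", "seen_at": datetime(2019, 5, 1, 17, 30),
         "reason": "suspicious outbound traffic"}

if __name__ == "__main__":
    enriched = enrich_alert(ALERT, LEASES, LOGS)
    print(enriched["device"], len(enriched["related_logs"]))
```

The linear scan over leases keeps the sketch short. At the event volumes discussed above, a lookup like this would more plausibly run against an indexed store inside the data pipeline than against in-memory lists.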
Famed technology author Geoffrey Moore is quoted as saying: “Without big data analytics, companies are blind and deaf, wandering out onto the web like a deer on a freeway.” While Moore was talking about the early days of the web, this quote applies equally to security analytics. Yes, with strong security analytics, organizations can vastly improve their ability to mitigate risk, detect and respond to threats, and automate security operations. To achieve these outcomes, however, CISOs must put adequate planning and work into security data modeling, pipelining, and management from the get-go.