Most modern data-driven initiatives are impossible without one Big Data technology or another. This is true for Machine Learning, drones, support for solar shingle customers, the Internet of Things, and more. Moreover, Big Data is the glue that holds all these things together, creating a cumulative effect that pushes costs down and multiplies business opportunities. As the popular marketing slogan puts it, data is the new “oil”.
Fortunately, the hype around Big Data has mostly faded, and without myriads of annoying salespeople making unrealistic promises, professionals like me can concentrate on real things.
Today, Big Data boils down to a cheap, reliable place to store all important data. The data can be kept in its original raw format and as relational tables in a Big Data Warehouse, where they are ready for immediate analysis and processing.
How can the data be used? It can be delivered to users through reporting tools like Tableau, Cognos, Qlik, and MS Power BI. Processed data can also feed an Enterprise Data Warehouse and be used by analytical, reporting, and BI systems.
Supporting Big Data infrastructure can still be challenging, but given the huge improvements of the last couple of years, I consider the technology finally ready for implementation in the roofing industry. And judging by profiles on LinkedIn, I can confirm that at least one roofing company shares this opinion.
Big Data can start as simply as a small POC project with a cluster of 4 servers and entirely free software, and can grow into one or more clusters with dozens or even hundreds of servers. The roofing industry has its own requirements, and in my experience as a Data Architect, there are opportunities to introduce Big Data much more efficiently than in big banks or telecom companies. Here is a typical console of Hadoop, the most popular Big Data technology. It monitors the most important components of Big Data, such as the highly distributed file system (HDFS), the in-memory analytics engine (Spark 2), and the Data Analyst workspace (Zeppelin Notebook). In addition, there are components for ETL, workflow, messaging, security, etc.
Even a small Big Data cluster typically has dozens of terabytes of storage; in the example, it is 98TB.
Regarding roofing industry requirements, 98TB is enough to store a couple of years of compressed logs from all equipment, and the cluster's processing power is enough to analyze patterns, failures, and energy consumption based on billions of measurement points.
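Whether 98TB really covers a couple of years depends on data rates, so it is worth doing the arithmetic. The Python sketch below is a back-of-the-envelope estimate; every input (measurement rate, bytes per measurement, compression ratio) is a hypothetical assumption, not a figure from a real plant. It also assumes the console figure is raw disk capacity, so every block's HDFS replicas (3 by default, set by `dfs.replication`) count against it.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def yearly_log_tb(points_per_second: float,
                  bytes_per_point: float,
                  compression_ratio: float,
                  replication: int = 3) -> float:
    """Raw HDFS space (decimal TB) used by one year of compressed logs.

    All inputs are hypothetical planning assumptions.
    """
    raw_bytes = points_per_second * bytes_per_point * SECONDS_PER_YEAR
    # Compression shrinks the data; replication stores each block several times.
    stored_bytes = raw_bytes / compression_ratio * replication
    return stored_bytes / 1e12

# Hypothetical plant: 100,000 measurements/s, 50 bytes each, 10:1 compression.
per_year = yearly_log_tb(100_000, 50, 10)
print(f"{per_year:.1f} TB per year, about {98 / per_year:.1f} years in 98TB")
# prints: 47.3 TB per year, about 2.1 years in 98TB
```

Changing any single assumption (for example, a lower compression ratio) moves the answer proportionally, which is why it pays to measure real log volumes before sizing a cluster.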
The greatest thing about Hadoop clusters is that they can be extended predictably, easily, and cheaply. For example, building this specific cluster (4x16 CPU cores, 4x64GB memory, 100TB) cost less than 100k, including hardware and initial configuration. Need more space or processing power? No problem: just add more servers (nodes). New data and processing tasks are automatically spread across old and new nodes, and the HDFS balancer can redistribute the existing data.
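Because capacity grows roughly linearly with node count, sizing a larger cluster is mostly arithmetic. A minimal Python sketch, assuming nodes like those in the example cluster (100TB across 4 servers, so about 25TB of raw disk each), the HDFS default replication factor of 3, and a hypothetical 25% of each disk reserved as headroom for temporary data:

```python
import math

NODE_RAW_TB = 100 / 4  # example cluster: 100TB spread over 4 nodes

def nodes_needed(data_tb: float,
                 node_raw_tb: float = NODE_RAW_TB,
                 replication: int = 3,
                 headroom: float = 0.25) -> int:
    """Number of nodes required to hold `data_tb` of data in HDFS.

    `headroom` is a hypothetical fraction of each disk kept free.
    """
    usable_per_node = node_raw_tb * (1 - headroom)
    needed = math.ceil(data_tb * replication / usable_per_node)
    # Replicas go to distinct DataNodes, so fewer nodes than the
    # replication factor would leave blocks under-replicated.
    return max(needed, replication)

print(nodes_needed(100))  # 16 nodes for 100TB of data
```

Under these assumptions every additional 100TB of data costs another 16 nodes, so growth can be budgeted in advance.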
What if the company needs a corporate Data Lake holding all possible company information, including 30+ years of accessible historical data? Just grow the cluster to dozens or even a hundred nodes. And if redundancy between data centers is a requirement, just tell Hadoop (through its rack-awareness configuration) that a specific server rack sits in a different data center; it will automatically place copies of the data there to prevent information loss.
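Hadoop learns the rack layout from a topology script named by the `net.topology.script.file.name` property in `core-site.xml`: it runs the script with IP addresses or hostnames as arguments and reads back one rack path per argument. Below is a minimal sketch of such a script in Python, assuming a hypothetical address plan with one /16 subnet per data center and treating each data center as a single "rack". With the default placement policy and a replication factor of 3, a block's replicas span two racks, so with this mapping both sites hold copies.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script (net.topology.script.file.name).

Each data center is reported as one "rack", so HDFS rack awareness
spreads the replicas of every block across both sites.
"""
import ipaddress
import sys

# Hypothetical address plan: one /16 per data center.
SITE_RACKS = {
    ipaddress.ip_network("10.1.0.0/16"): "/dc1",
    ipaddress.ip_network("10.2.0.0/16"): "/dc2",
}
DEFAULT_RACK = "/default-rack"

def rack_of(host: str) -> str:
    """Return the rack path for an IP address; unknown hosts get DEFAULT_RACK."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:  # a hostname rather than an IP; this sketch does not resolve names
        return DEFAULT_RACK
    for network, rack in SITE_RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK

if __name__ == "__main__":
    # Hadoop passes hosts as arguments and expects one rack path per host.
    for host in sys.argv[1:]:
        print(rack_of(host))
```

Once the script is executable and referenced from `core-site.xml`, `hdfs dfsadmin -printTopology` shows which DataNodes were assigned to which rack.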
Today, data management is an essential part of roofing industry routines. But without appropriate technologies, its application to business processes falls far short of its true potential.
The business outcome of Big Data infrastructure comes from combining the agility of drones, the power of Artificial Intelligence models, the smartness of Machine Learning, the granularity of IoT, the precision of GPS, the insights of Advanced Analytics, and integration with Cloud Solutions. Assembled together, these innovations give the industry an incredible opportunity on its way to digital transformation.
Made available to every information worker in the company through appropriate processes and analytical tools, Big Data converts bits and bytes on disks and tapes into modern digital oil.
Beyond its infrastructure role, I’d like to present three special use cases that provide immediate outcomes without extra tools.
Compliance and Records Retention. To follow industry and government standards, records must be kept for up to 30 years. Big Data is the best place to do it, thanks to immediate access for audits and reporting. It requires planning, integration, and governance, but from a project perspective it’s low-hanging fruit.
Equipment Log Analysis. Sometimes engineers need to go back to a past event to investigate a problem or equipment performance. In many cases, the detailed information is not immediately available or does not exist at all. The reason? Simply not enough space to keep it. So it is common (not only in the roofing industry) for a company to keep only the bare minimum of data for quality control. But if you need to optimize process control, there may not be enough raw data.
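Once full-resolution logs are kept, going back to an event becomes a simple time-window query. On the cluster this would typically run as a Spark job or Spark SQL query over HDFS; the plain-Python sketch below only shows the logic. The `Reading` record, the `oven_temp` sensor, and the window sizes are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

@dataclass(frozen=True)
class Reading:
    """One hypothetical equipment measurement."""
    ts: datetime
    sensor: str
    value: float

def window_summary(readings, event_time,
                   before=timedelta(minutes=30), after=timedelta(minutes=5)):
    """Per-sensor (min, mean, max) of readings in [event - before, event + after]."""
    start, end = event_time - before, event_time + after
    values = {}
    for r in readings:
        if start <= r.ts <= end:
            values.setdefault(r.sensor, []).append(r.value)
    return {s: (min(v), mean(v), max(v)) for s, v in sorted(values.items())}

# Hypothetical oven temperature log around a defect reported at 10:40.
t = datetime(2024, 3, 1, 10, 0)
log = [Reading(t + timedelta(minutes=m), "oven_temp", v)
       for m, v in [(0, 200.0), (20, 210.0), (40, 260.0), (90, 300.0)]]
print(window_summary(log, t + timedelta(minutes=40)))
# {'oven_temp': (210.0, 235.0, 260.0)}
```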
Cybersecurity and IT Event Logs. All firewalls, switches, servers, and workstations generate logs, alerts, authentication information, patch versions, etc. The volume of this data is enormous, so the company needs a proper storage solution. Because of storage limitations, the usual practice is log rotation, which makes it impossible to go back and investigate possible breaches older than the rotation period. So, to avoid security audit problems and have data available for analysis right away, it makes sense to build a Big Data solution specifically for cybersecurity. As for ready-made solutions, there are several good ones on the market, such as Splunk and its Hadoop version, Hunk (Splunk Analytics for Hadoop).
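Instead of rotating logs away, they can be archived into date-partitioned directories, which keeps any past day cheap to locate and query. The Python sketch below assumes each log line starts with an ISO 8601 timestamp and uses Hive-style `source=<name>/dt=<day>` directory names, a layout Hive and Spark can read as partitions. The archive root and line format are hypothetical, and the actual upload to HDFS (or ingestion into a tool such as Hunk) is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone

ARCHIVE_ROOT = "/data/security/logs"  # hypothetical HDFS directory

def partition_path(source: str, ts: datetime) -> str:
    """Hive-style partition directory for a log line, keyed by UTC day."""
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{ARCHIVE_ROOT}/source={source}/dt={day}"

def partition_lines(source: str, lines):
    """Group raw log lines by the partition directory they belong to."""
    parts = defaultdict(list)
    for line in lines:
        stamp = line.split(maxsplit=1)[0]  # leading ISO 8601 timestamp
        ts = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        parts[partition_path(source, ts)].append(line)
    return dict(parts)

# Hypothetical firewall lines; the second one is 22:01 UTC on May 1.
sample = [
    "2023-05-01T23:59:00Z fw1 DENY tcp 10.0.0.5",
    "2023-05-02T00:01:00+02:00 fw1 ALLOW tcp 10.0.0.6",
    "2023-05-02T08:00:00Z fw1 DENY udp 10.0.0.7",
]
for path, chunk in sorted(partition_lines("firewall", sample).items()):
    print(path, len(chunk))
```

Normalizing timestamps to UTC before choosing the partition keeps devices in different time zones from scattering one day's events across two directories.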