Data is a critical asset for state and local government, and it has been for decades. The questions today are different: What happens when you have too much data? How do you make sense of it, and keep up with it, when, according to the McKinsey Global Institute, data volume is growing 40 percent per year? In today’s digital age, it’s important not only to store large data sets, but also to use that data to make mission-critical decisions. This emerging field of data analytics is being called Big Data.
An enormous amount of data is generated as a result of factors such as:
• mobility: mobile devices, mobile events and sensor integration;
• data access and consumption: the Internet, sensors/actuators, interconnected systems, social networking, and convergent interfaces and access models (search, social networking and messaging); and
• information model and open source: major changes in the information processing model and the availability of open source frameworks.
State and local governments must also look at the type and source of data being collected, stored, analyzed and consumed; that is, structured versus unstructured data. Unstructured data is information that either doesn’t have a predefined data model or doesn’t fit well into relational tables; examples are text, log files, video, audio and network-type data sets. Structured data is data that has been modeled and normalized to fit a relational model, such as traditional row/column databases. (A short sketch after the following list shows how unstructured data can be given structure.) Big Data is a compilation of both structured and unstructured data, typically including the following data sources:
• traditional enterprise data: enterprise information data stores — from customer relationship management (CRM) and enterprise resource planning (ERP) systems to payroll and Web store transactions;
• machine-generated data: output from sensors and actuators, equipment logs, and other interconnected systems; and
• social Web data: customer feedback streams, microblogging sites like Twitter, and social media platforms such as Facebook.
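To make the structured/unstructured distinction concrete, here is the minimal Python sketch promised above, which imposes structure on an unstructured Web server log line. The log format and field names are assumptions made for illustration, not a standard:

    import re

    # One unstructured log line (hypothetical format).
    log_line = '192.0.2.1 - - [10/Oct/2012:13:55:36] "GET /permits HTTP/1.1" 200'

    # A regular expression imposes structure: named groups become columns.
    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+)'
    )

    match = pattern.match(log_line)
    if match:
        # The resulting dictionary maps directly onto a relational row.
        print(match.groupdict())

Until parsing like this (or a schema-less store) is applied, such data sits outside the relational model.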
Three key characteristics define Big Data. They are known as the three “Vs”:
• volume: machine-generated data is produced in much larger quantities than traditional data;
• velocity: how quickly data moves across an enterprise; and
• variety: not just data in relational stores, but also the unstructured data across the enterprise.
The cost and complexity of traditional enterprise data models, along with their requirements for application, database and storage resources, have grown over the years. This growth has prompted a shift in the fundamental models that describe the way Big Data is stored, analyzed and consumed. The new models are built on a scaled-out, “shared-nothing” architecture, which brings new challenges to governments deciding which new technologies to use, and where and how to use them. To manage this shift, two building blocks are being added to the enterprise technology stack to accommodate Big Data (illustrative sketches of both follow the list):
• Hadoop: provides storage capability through a distributed, shared-nothing file system and analysis capability through MapReduce; and
• NoSQL: provides the capability to capture, read and update, in real time, the large influx of unstructured data and data without schemas.
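As a concrete illustration of the MapReduce analysis model Hadoop provides, here is a minimal word-count job written in Python for Hadoop Streaming, which lets ordinary executables act as the mapper and reducer. The file names and input data are hypothetical; this is a sketch, not a production job.

mapper.py:

    #!/usr/bin/env python3
    # Map step: emit "word<TAB>1" for every word read from standard input.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

reducer.py:

    #!/usr/bin/env python3
    # Reduce step: Hadoop Streaming delivers mapper output sorted by key,
    # so equal words arrive consecutively and can be summed in one pass.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")

A job like this would run across the cluster via the Hadoop Streaming jar, for example: hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input /logs -output /wordcounts (the paths are hypothetical).

For NoSQL, the sketch below assumes a MongoDB document store (one of several NoSQL options) reached through the pymongo driver; the database and collection names are hypothetical. Note that the two inserted records deliberately do not share a schema:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical server
    events = client.bigdata.events                     # hypothetical collection

    # Capture: insert schema-less documents; the two records differ in shape.
    events.insert_one({"source": "sensor", "reading": 41})
    events.insert_one({"source": "social", "text": "pothole on Main St"})

    # Read and update in (near) real time.
    doc = events.find_one({"source": "sensor"})
    events.update_one({"_id": doc["_id"]}, {"$set": {"reviewed": True}})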
Given these details, what can state and local governments do to prepare for and embrace Big Data? First, they should try to get ahead of the data deluge; strategy and planning are critical to this process. Second, they must develop and review the life cycle for Big Data in their enterprises. The life cycle can be categorized into the following phases (a toy end-to-end sketch follows the list):
• capture: the collection of data from a diverse set of sources, as described previously;
• store: the repository for the collected data — the right kind of data needs to be stored in the correct repository;
• analyze: the analytics of the data in the repositories; and
• consume: the reporting and business intelligence for decision-making.
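The toy sketch referenced above strings the four phases into a single Python pipeline. Everything in it is an assumption made for illustration: the hard-coded records, the in-memory “repository” and the simple average stand in for real sources, data stores and analytics.

    # Toy illustration of the capture -> store -> analyze -> consume life cycle.

    def capture():
        # Capture: collect records from diverse sources (hard-coded samples here).
        return [
            {"source": "sensor", "reading": 41},
            {"source": "sensor", "reading": 39},
            {"source": "social", "reading": 1},
        ]

    def store(records, repository):
        # Store: route each record to the right repository for its type.
        for record in records:
            repository.setdefault(record["source"], []).append(record)

    def analyze(repository):
        # Analyze: compute a simple per-source statistic.
        return {src: sum(r["reading"] for r in recs) / len(recs)
                for src, recs in repository.items()}

    def consume(metrics):
        # Consume: report results for decision-making.
        for source, avg in metrics.items():
            print(f"{source}: average reading {avg:.1f}")

    repository = {}
    store(capture(), repository)
    consume(analyze(repository))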
When the Big Data life cycle is well understood, governments can plan for and identify the following:
• Find technology enablers: These could include new infrastructure, software and application evaluations, and pilots.
• Adopt an ecosystem approach: Big Data is a new and emerging space, and there will be many technology options to review and select from.
• Adopt a use case-based approach: Data’s value depends on insight into its domain, so look for use case-specific projects, such as network-centric Big Data analytics, cybersecurity analytics or video-based insights.
• Invest in data-centric skill sets: Insight drawn from these large data sets is only as good as the domain knowledge applied to the data, so data analyst and data scientist skills need to be developed and nurtured.
Kapil Bakshi is a native of the Washington, D.C., area, and holds bachelor’s degrees in electrical engineering and computer science from the University of Maryland, College Park; a master’s degree in computer engineering from Johns Hopkins University; and an MBA from the University of Maryland, College Park. Bakshi has held several positions within the IT industry, including at Cisco, Sun Microsystems and Hewlett-Packard.