Business Intelligence
THE CHALLENGE OF BIG DATA
Beyond the ability of typical DBMS to capture, store, and analyze
Billions to trillions of records, all from different sources
Businesses are interested in big data because they can reveal more patterns and interesting anomalies than smaller data sets, with the potential to provide new insights into customer behavior, weather patterns, financial market activity, or other phenomena
To derive business value from these data, organizations need new technologies and tools capable of managing and analyzing nontraditional data along with their traditional enterprise data
Analytical tools: relationships, patterns, trends
Online analytical processing (OLAP)
Supports multidimensional data analysis
Viewing data using multiple dimensions
Each aspect of information (product, pricing, cost, region, time period) is a different dimension
A company would use either a specialized multidimensional database or a tool that creates multidimensional views of data in relational databases
OLAP enables rapid, online answers to ad hoc queries
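A minimal sketch of viewing the same facts along different dimensions, using only the Python standard library. The sales rows, the DIMENSIONS mapping, and the roll_up helper are hypothetical illustrations, not part of any OLAP product; a real OLAP tool would precompute or index such aggregates.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units_sold)
SALES = [
    ("bolts", "East", "Q1", 120),
    ("bolts", "West", "Q1", 80),
    ("nuts", "East", "Q1", 50),
    ("bolts", "East", "Q2", 100),
    ("nuts", "West", "Q2", 70),
]

# Position of each dimension inside a fact row (assumed layout)
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}


def roll_up(facts, *dims):
    """Sum units sold, grouped by whichever dimensions the caller picks."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


# The same data answers different ad hoc questions
by_product = roll_up(SALES, "product")
by_product_region = roll_up(SALES, "product", "region")
```

Choosing a different set of dimensions (for example "region" and "quarter") gives another view of the same facts without changing the stored data.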
Data mining
More discovery driven than OLAP
Finds hidden patterns, relationships in large databases and infers rules to predict future behavior
E.g., Finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers
Types of information obtainable from data mining
Associations
Classification
Clustering
Forecasting
Sequences
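As one illustration of the "associations" type, the sketch below measures how often items are bought together. The baskets are made-up data, and support and confidence are the usual association-rule measures, computed here by simple counting rather than by any particular data mining product.

```python
# Hypothetical market baskets, one set of items per purchase
BASKETS = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola"},
    {"chips", "cola"},
]


def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Share of all baskets that contain the item set."""
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets with the antecedent, the share that also have the consequent."""
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / count_containing(baskets, antecedent)


chips_cola_support = support(BASKETS, {"chips", "cola"})
chips_to_cola = confidence(BASKETS, {"chips"}, {"cola"})
```

Here the rule "chips implies cola" holds in three of the four baskets that contain chips, which is the kind of pattern an association analysis would surface.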
Text mining
Extracts key elements from large unstructured data sets
Stored e-mails
Call center transcripts
Legal cases
Patent descriptions
Service reports, and so on
Sentiment analysis software
Mines e-mails, blogs, social media to detect opinions
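A deliberately simple sketch of the idea behind sentiment analysis: count opinion words in a message. The POSITIVE and NEGATIVE word lists and the function names are hypothetical; commercial sentiment analysis software uses far richer language models than this word-count approach.

```python
import re

# Hypothetical opinion lexicons, kept tiny for illustration
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "rude"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(text):
    """Map the score to a coarse opinion label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, sentiment_label("Checkout is broken and support was rude.") counts two negative words and no positive ones, so the message is labeled negative.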
Web mining
Discovery and analysis of useful patterns and information from the Web
Understand customer behavior
Evaluate effectiveness of Web site, and so on
Web content mining
Mines content of Web pages
Web structure mining
Analyzes links to and from a Web page
Web usage mining
Mines user interaction data recorded by Web server
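Web usage mining can be sketched as counting successful page requests in a Web server log. The log lines below are invented and assume the Common Log Format; the REQUEST pattern and page_hits function are illustrative names, not a standard API.

```python
import re
from collections import Counter

# Hypothetical server log entries in Common Log Format
LOG_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /products HTTP/1.1" 200 2326',
    '192.0.2.3 - - [10/Oct/2023:13:58:40 +0000] "GET /missing HTTP/1.1" 404 0',
]

# Captures: client address, HTTP method, requested path, status code
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) ')


def page_hits(lines):
    """Count successful (status 200) requests per page path."""
    hits = Counter()
    for line in lines:
        match = REQUEST.match(line)
        if match and match.group(4) == "200":
            hits[match.group(3)] += 1
    return hits


hits = page_hits(LOG_LINES)
```

The resulting counts show which pages visitors actually reach, one simple input for evaluating the effectiveness of a Web site.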
Contemporary tools
Data Warehouses
A data warehouse is a large store of data accumulated from a wide range of sources within a company and used to guide management decisions
A data warehouse is a collection of data drawn from other databases used by the business
It is a database that stores current and historical data of potential interest to decision makers throughout the company
Supports reporting and query tools
Stores current and historical data
Consolidates data for management analysis and decision making
Improved and easy accessibility to information
Ability to model and remodel the data
Data marts
The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team.
A data mart represents the specific data from a data warehouse which a user needs
It is a subset of data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specified function or group of users
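The relationship between a warehouse and a data mart can be sketched with an in-memory SQLite database. The table names (warehouse_sales, mart_east_sales), columns, and rows are hypothetical; the point is only that the mart is a summarized, focused subset placed in its own table for one group of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table consolidating sales from all regions
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(sale_date TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("2023-01-05", "East", "bolts", 100.0),
        ("2023-01-06", "West", "bolts", 40.0),
        ("2023-02-01", "East", "nuts", 25.0),
        ("2023-02-03", "East", "bolts", 60.0),
    ],
)

# Data mart: summarized East-region sales only, for the East sales team
conn.execute(
    """
    CREATE TABLE mart_east_sales AS
    SELECT product, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE region = 'East'
    GROUP BY product
    """
)

rows = conn.execute(
    "SELECT product, total_amount FROM mart_east_sales ORDER BY product"
).fetchall()
```

Users of the mart query the small summarized table and never need to touch the full warehouse.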
Hadoop
Enables distributed parallel processing of big data across inexpensive computers
Key services
Hadoop Distributed File System (HDFS): data storage
MapReduce: breaks processing of large data sets into smaller tasks that run in parallel across the nodes of a cluster
HBase: NoSQL database
Used by Facebook, Yahoo, NextBio
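The MapReduce pattern can be sketched in plain Python as a word count. This runs in a single process and does not use the Hadoop API; the function names and the two text chunks are hypothetical stand-ins for data blocks that a real cluster would process on separate machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over all chunks (sequentially here)."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string stands in for a block of data stored on a different node
counts = word_count(["big data big insight", "data beats opinion"])
```

Because each map call looks at only its own chunk and each reduce call at only one key, the work can be spread across many inexpensive computers.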
In-memory computing
Used in big data analysis
Uses the computer's main memory (RAM) for data storage to avoid delays in retrieving data from disk storage
Can reduce hours/days of processing to seconds
Requires optimized hardware
Analytical platforms
High-speed platforms using both relational and non-relational tools optimized for large datasets
Analytical information based on current data records
Tightly integrated database, server, and storage components that handle complex analytic queries 10 to 100 times faster than traditional systems
Business intelligence infrastructure
Tools for obtaining useful information from all the different types of data used by businesses today, including semi-structured and unstructured big data in vast quantities
Consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customers