Business Intelligence
Business intelligence infrastructure
E.g., Harrah’s Entertainment analyzes customer data to develop gambling profiles and identify its most profitable customers
consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
Tools for obtaining useful information from all the different types of data used by businesses today, including semi-structured and unstructured big data in vast quantities
Contemporary tools
Analytical platforms
Tightly integrated database, server, and storage components that handle complex analytic queries 10 to 100 times faster than traditional systems
Analytical information based on current data records
High-speed platforms using both relational and non-relational tools optimized for large datasets
In-memory computing
Requires optimized hardware
Can reduce hours/days of processing to seconds
Uses the computer's main memory (RAM) for data storage to avoid delays in retrieving data from disk storage
Used in big data analysis
Hadoop
Used by Facebook, Yahoo, NextBio
Key services
HBase: NoSQL database
MapReduce: breaks a processing job into smaller tasks that run in parallel across the cluster
Hadoop Distributed File System (HDFS): data storage
Enables distributed parallel processing of big data across inexpensive computers
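As a rough illustration of the MapReduce idea listed above, the following single-machine Python sketch splits a job into independent map tasks and then merges their partial results in a reduce step. It does not use Hadoop itself; the function names (`map_task`, `reduce_task`, `word_count`) and the sample records are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_task(chunk):
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(word.lower() for line in chunk for word in line.split())


def reduce_task(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(lines, chunk_size=2):
    # Split the input into chunks; on a real cluster each chunk
    # would be handed to a different inexpensive machine.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    partial_counts = [map_task(chunk) for chunk in chunks]
    return reduce(reduce_task, partial_counts, Counter())


# Hypothetical sample records.
records = ["big data", "big clusters", "data nodes store data"]
totals = word_count(records)
```

Because each map task sees only its own chunk, the tasks could run at the same time on separate machines, which is the property Hadoop exploits.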
Data marts
A subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specified function or group of users
A data mart represents the specific data from a data warehouse which a user needs
The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team.
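The relationship between a warehouse and a data mart can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names, columns, and sales figures are hypothetical, and a separate table stands in for the separate database a real data mart would normally use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding company-wide sales data.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Nuts", 2023, 100.0),
        ("East", "Bolts", 2023, 50.0),
        ("West", "Nuts", 2023, 70.0),
        ("East", "Nuts", 2024, 120.0),
    ],
)

# Data mart: a summarized, focused slice (East region only) kept in its own table.
conn.execute(
    """
    CREATE TABLE east_sales_mart AS
    SELECT product, year, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE region = 'East'
    GROUP BY product, year
    """
)

mart_rows = conn.execute(
    "SELECT product, year, total_amount FROM east_sales_mart ORDER BY product, year"
).fetchall()
```

The East-region team queries only `east_sales_mart`, which is smaller and already summarized for its needs.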
Data Warehouses
Ability to model and remodel the data
Improved and easy accessibility to information
Consolidates data for management analysis and decision making
Stores current and historical data
Supports reporting and query tools
A database that stores current and historical data of potential interest to decision makers throughout the company
A data warehouse is a collection of data drawn from other databases used by the business
A data warehouse is a large store of data accumulated from a wide range of sources within a company and used to guide management decisions
Analytical tools: relationships, patterns, trends
Web mining
Web usage mining
Mines user interaction data recorded by the Web server
Web structure mining
Analyzes links to and from a Web page
Web content mining
Mines content of Web pages
Discovery and analysis of useful patterns and information from the Web
Evaluate the effectiveness of a Web site, and so on
Understand customer behavior
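Web usage mining, as described above, starts from the interaction data a Web server records. The sketch below assumes a simplified, hypothetical log format (`<visitor> <method> <path> <status>`), not the format of any particular server, and derives two basic usage measures from it.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<visitor> <method> <path> <status>"
log_lines = [
    "10.0.0.1 GET /home 200",
    "10.0.0.2 GET /products 200",
    "10.0.0.1 GET /products 200",
    "10.0.0.3 GET /missing 404",
]


def page_hits(lines):
    """Count successful (status 200) requests per page."""
    hits = Counter()
    for line in lines:
        visitor, method, path, status = line.split()
        if status == "200":
            hits[path] += 1
    return hits


def pages_per_visitor(lines):
    """Record, in order, the pages each visitor requested."""
    paths = {}
    for line in lines:
        visitor, method, path, status = line.split()
        paths.setdefault(visitor, []).append(path)
    return paths


hits = page_hits(log_lines)
visits = pages_per_visitor(log_lines)
```

Hit counts point to the most-used pages, while per-visitor paths give a first look at how customers move through the site.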
Text mining
Sentiment analysis software
Mines e-mails, blogs, social media to detect opinions
Extracts key elements from large unstructured data sets
Service reports, and so on
Patent descriptions
Legal cases
Call center transcripts
Stored e-mails
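To make the idea of sentiment analysis concrete, here is a minimal lexicon-based sketch. It assumes a tiny, hypothetical word list; commercial sentiment analysis software relies on much larger vocabularies or trained models, so this shows only the basic principle of scoring opinion words in unstructured text.

```python
import re

# Hypothetical, tiny opinion lexicon for illustration only.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "rude", "terrible"}


def sentiment_score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map a score to a coarse opinion label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical messages, e.g. from e-mails or social media posts.
messages = [
    "Love the new app, support was fast and helpful",
    "Checkout is broken and the agent was rude",
    "Order arrived on Tuesday",
]
labels = [label(m) for m in messages]
```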
Data mining
Types of information obtainable from data mining
Sequences
Forecasting
Clustering
Classification
Associations
E.g., Finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers
Finds hidden patterns, relationships in large databases and infers rules to predict future behavior
More discovery driven than OLAP
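Of the information types listed above, associations are the easiest to sketch in a few lines. The example below counts how often items are bought together and computes the confidence of a simple rule; the baskets and item names are hypothetical, and real data mining tools apply more scalable algorithms to far larger databases.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola"},
]

# How many baskets contain each item, and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


salsa_to_chips = confidence("salsa", "chips")
chips_to_cola = confidence("chips", "cola")
```

A high confidence value for a rule such as "salsa implies chips" is the kind of hidden pattern that could feed a targeted marketing campaign.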
Online analytical processing (OLAP)
OLAP enables rapid, online answers to ad hoc queries
Supports multidimensional data analysis
A company would use either a specialized multidimensional database or a tool that creates multidimensional views of data in relational databases
Each aspect of information (product, pricing, cost, region, time period) is a different dimension
Viewing data using multiple dimensions
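Multidimensional analysis can be illustrated by aggregating the same facts along different dimensions. The following sketch uses plain Python rather than an OLAP product; the fact rows, dimension names, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, units sold)
facts = [
    ("Nuts", "East", "Q1", 50),
    ("Nuts", "West", "Q1", 60),
    ("Bolts", "East", "Q1", 40),
    ("Nuts", "East", "Q2", 70),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(rows, by):
    """Total units sold, grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


# The same data viewed along one dimension, then along two.
by_product = roll_up(facts, by=("product",))
by_product_region = roll_up(facts, by=("product", "region"))
```

Changing the `by` argument answers a different ad hoc question from the same underlying data, which is the core of viewing data using multiple dimensions.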
THE CHALLENGE OF BIG DATA
To derive business value from these data, organizations need new technologies and tools capable of managing and analyzing nontraditional data along with their traditional enterprise data
Businesses are interested in big data because they can reveal more patterns and interesting anomalies than smaller data sets, with the potential to provide new insights into customer behavior, weather patterns, financial market activity, or other phenomena
Billions to trillions of records, all from different sources
Beyond the ability of a typical DBMS to capture, store, and analyze