Executive White Paper
Flow
Enterprise Data-Automation Framework and Generic-Hypercube Database
Data + Actions = Results
Harness All Data
Executive Briefing
The most pressing issue facing IT leaders is how to integrate massive influxes of data
from a variety of disconnected sources. Because data formats and structures are so disparate,
incoming and existing data is siloed, preventing a synchronized view of the entirety of
an organization's data. This prevents complete analysis, limiting the data's potential value.
Even after the integration of data into one consolidated stream of information, a new
problem arises: how to efficiently analyze the collection to produce results that matter.
This typically involves database managers specializing in a query language such as SQL,
data scientists specializing in a data analytics language such as R, and algorithms experts
specializing in a massively parallel processing paradigm such as MapReduce. This
process creates significant overhead that stands between raw data and insightful
results.
Current solutions are crafted by patching together many disconnected components in an
attempt to assemble this complete big data architecture, but the development process is long and
resource-intensive. At 4DIQ, we have architected an answer to these challenges.
Flow is the Most Complete and Simple Data Science Architecture
Based on the concepts of Generic Data and Generic HyperCubes, Flow is the first true generic
HyperCube data container.
Backed by a groundbreaking parallel processing architecture, Flow is one of the most
efficient computation engines on the market.
By abstracting traditional coding methods and providing a simple interface to automate
virtually any procedure across the entirety of your data, Flow delivers a breakthrough in data
science.
Simply put...
...the technology behind Flow can automate virtually any data science task...
...irrespective of scale or complexity
What Does Flow Do?
Flow is a universal ‘system of systems’ that provides streamlined, automated layers of
communication logic between any number and variety of disconnected data systems and applications,
at the highest possible scale.
By using Flow to leverage the numerous sources of data available (both internally and
externally), an enterprise can gain significant competitive advantage.
The Motto
Data + Actions = Results
Universalize data and simplify actions to create meaningful, automated results.
The Objective
To automate any data science task irrespective of scale or complexity.
The Flow Ecosystem
The Flow ecosystem is built on the core capabilities that compose the entire, end-to-end
process of data science:
❏ database creation and cell manipulation
❏ data integration
❏ database management and cleansing
❏ the new ability to create ‘HyperCubes’
❏ data analytics
❏ large scale data processing
❏ a visualization and reporting environment
❏ a community to share and collaborate on Workflows created
The premise of Flow is to abstract away and simplify the complexities associated with data
science, ETL, and machine learning (such as syntax-specific custom code), without compromising the
capabilities of programming.
Flow retains the raw power and functionality of a high-level programming language, but
without the need for custom code or scripts. Workflow development is highly dynamic, so complex
data science problems within an enterprise can be solved, automated, and maintained at an
unprecedented rate.
The Components
Generic Data
Flow contains an entirely new type of database architecture, which we call a ‘generic database’
or the ‘jagged HyperCube model’.
Generic data is a key-value data structure. The Flow system can adapt to restructure any and
every* source of data into a single, universal format that is easily manipulated and transferred. Data
persists in-memory within the generic tables. The generic data is bidirectional such that data can be
transferred effortlessly between any combination of data formats.
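The key-value idea above can be sketched in a few lines. This is a minimal illustration, not Flow's actual implementation: hypothetical helpers `to_generic` and `from_generic` flatten records from different formats into one uniform triple structure and back, which is what makes the representation bidirectional.

```python
import csv
import io
import json

def to_generic(records):
    """Flatten heterogeneous records into (row_id, key, value) triples --
    one universal shape regardless of the original source format."""
    triples = []
    for row_id, record in enumerate(records):
        for key, value in record.items():
            triples.append((row_id, str(key), value))
    return triples

def from_generic(triples):
    """Rebuild row-oriented records from the generic triples (the
    transfer is bidirectional, so no information is lost)."""
    rows = {}
    for row_id, key, value in triples:
        rows.setdefault(row_id, {})[key] = value
    return [rows[i] for i in sorted(rows)]

# Two sources with different shapes land in the same generic form.
csv_rows = list(csv.DictReader(io.StringIO("name,city\nAda,London\n")))
json_rows = json.loads('[{"name": "Alan", "country": "UK"}]')
generic = to_generic(csv_rows + json_rows)
```

Once both sources share the triple form, they can be joined, filtered, or re-exported to any target format with the same logic.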
Data from disparate systems can essentially interact as if it came from the same source.
This includes, but is not limited to:
● Local files
○ Excel
○ Access
○ Delimited
○ Positional
○ XML
○ JSON
● RDBMS and custom databases
○ MySQL
○ SQL Server
○ Hyperion
○ Informix
○ Postgres
○ SAP
○ Legacy Systems
● CRMs and common applications
○ Redtail
○ Pipedrive
○ Zoho
○ Salesforce
○ Quickbooks
○ Outlook
● Web APIs
○ Twitter
○ LinkedIn
○ Facebook
○ Instagram
○ HTML
○ RSS Feeds
○ Natural Language Text
○ LDAP
○ Google Feeds
○ Magento
○ Ebay
○ Google Analytics
○ Yellow Pages
○ NY Times
○ ‘Internet of Things’ devices
○ Any RESTful APIs
○ Any SOAP APIs
*A custom plugin to any data source unaccounted for can be created in approximately one day
Expression Builder
Flow provides an advanced development environment against the generic data to facilitate
the design and delivery of reusable and automatable Workflows.
The Expression Builder is a simple interface that provides a layer of abstraction over the
languages and commands traditionally required for:
● Datapoint Management
○ Excel
● Database Management
○ SQL
● Data Analysis
○ R/Python
The Expression Builder contains eleven complete function libraries plus special-function
tabs. Workflows are produced by sequencing operations together in the Expression Builder, rapidly
and intuitively, to create higher-level algorithmic procedures.
Flow presents a systematic way of feeding one function's output as input to the next,
instantaneously creating diverse new features and characteristics across the data. Flow provides
the building blocks required to automate virtually any data science task.
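The output-to-input chaining described here is the classic function-composition pattern. A minimal sketch of the idea, with hypothetical step names invented for illustration (Flow's actual operations are configured through its interface, not written as code):

```python
from functools import reduce

def workflow(*steps):
    """Compose operations so each step's output feeds the next step's
    input -- the chaining pattern the Expression Builder automates."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)

# Hypothetical example steps (not real Flow operations).
strip_blanks    = lambda rows: [r for r in rows if r.get("email")]
lower_emails    = lambda rows: [{**r, "email": r["email"].lower()} for r in rows]
extract_domains = lambda rows: {r["email"].split("@")[1] for r in rows}

pipeline = workflow(strip_blanks, lower_emails, extract_domains)
result = pipeline([{"email": "A@X.COM"}, {"email": ""}, {"email": "b@x.com"}])
```

Because each step is self-contained, steps can be reordered, reused, and recombined into new Workflows without rewriting any of them.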
Agent Architecture
Flow is backed by a parallel processing architecture comparable to Hadoop's distributed file
system, allowing for the distributed deployment and processing of massive data sets in a parallel,
asynchronous manner.
Monitoring Agents are deployed to as many disconnected systems, at as many locations, as
necessary, executing chunks of algorithmic logic in parallel and then pooling the results together
at any desired frequency. The computation process is distributed over as many physical CPUs as
required and is orchestrated by a master Workflow Agent via the cloud. Command-line entries are
eliminated from this aspect of the system as well.
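The split-execute-pool pattern described above can be sketched with Python's standard library. This is a single-machine, thread-based analogy of the Agent model, not Flow's distributed implementation; `agent` and `orchestrate` are illustrative names:

```python
from concurrent.futures import ThreadPoolExecutor

def agent(chunk):
    """One Monitoring Agent's share of the work: a partial aggregate
    computed over its local chunk of the data (here, a simple sum)."""
    return sum(chunk)

def orchestrate(data, workers=4):
    """Master Workflow Agent: split the data into chunks, run the
    agents in parallel, then pool the partial results together."""
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(agent, chunks))

total = orchestrate(list(range(1_000)))
```

In a real deployment the chunks would live on separate physical systems and the pooling step would run over network streams, but the orchestration logic is the same shape.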
HyperCube
Combine all disparate data sources into one massive, jagged data set. Flow's
logarithmic-time algorithm then explodes this data set into a HyperCube containing all
permutations across any number of dimensions.
The HyperCube is a high-dimensional structure that vectorizes and contains every possible
combination and link along the dimensional axes of the data. The HyperCube is projected and
displayed in a two-dimensional matrix format. Each vector of the HyperCube represents an
independent aggregation of the underlying data point values, ranging from one to n dimensions.
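The "every combination of dimensions" idea resembles a CUBE-style aggregation. A minimal sketch under that assumption, with an invented `hypercube` helper and toy sales data (not Flow's actual algorithm or data model):

```python
from itertools import combinations

def hypercube(rows, dims, measure):
    """Aggregate the measure over every subset of the dimensions:
    one cell per (dimension-subset, value-combination), from 1 to
    n dimensions -- the 'all permutations' idea in miniature."""
    cube = {}
    for r in range(1, len(dims) + 1):
        for axes in combinations(dims, r):
            for row in rows:
                key = (axes, tuple(row[a] for a in axes))
                cube[key] = cube.get(key, 0) + row[measure]
    return cube

sales = [
    {"region": "EU", "year": 2023, "amt": 10},
    {"region": "EU", "year": 2024, "amt": 20},
    {"region": "US", "year": 2024, "amt": 5},
]
cube = hypercube(sales, ["region", "year"], "amt")
```

Each cell is an independent aggregation over one combination of axis values, so any dimensional view of the data can be answered by a direct lookup rather than a rescan of the raw rows.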
The HyperCube data structure is a unique and powerful tool for training advanced AI
learning procedures and optimization techniques. It scales linearly in a massively parallel fashion
to accommodate (via live communication streams) all the data as it is continually updated and
generated across an enterprise.
Visualization
Add Workflow steps to extract any subset or dimensional view from the HyperCube via pivot
tables, then easily translate those pivot tables into your choice of graphical representation.
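Extracting a two-dimensional view is essentially a pivot. A minimal sketch with an invented `pivot` helper and toy data (assumptions for illustration only):

```python
def pivot(rows, row_dim, col_dim, measure):
    """Project data onto a two-dimensional matrix: one row per
    row_dim value, one column per col_dim value, summing the measure."""
    cols = sorted({r[col_dim] for r in rows})
    table = {}
    for r in rows:
        row = table.setdefault(r[row_dim], {c: 0 for c in cols})
        row[r[col_dim]] += r[measure]
    return table

sales = [
    {"region": "EU", "year": 2023, "amt": 10},
    {"region": "EU", "year": 2024, "amt": 20},
    {"region": "US", "year": 2024, "amt": 5},
]
view = pivot(sales, "region", "year", "amt")
```

The resulting matrix (regions by years) is exactly the two-dimensional projection a chart or dashboard widget would consume.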
Metric and KPI calculation steps can be added as well. Group all of these reporting features
into a dashboard and push it to the cloud for a painless, portable, and autonomous reporting
experience.
The Potential
Flow has the potential to automate and optimize virtually every data science aspect within an
enterprise. The Flow interface allows creativity and logic to completely replace coding expertise. The
scope and diversity of procedures you can design is virtually limitless.
Workflow procedures can accomplish, but are not limited to:
❖ Creating autonomous data communication streams across systems
❖ Implementing standards in data across disconnected systems
❖ Linking and relating data across disconnected systems
❖ Creating automated cleansing logic
❖ Monitoring data systems for anomalies in real time
❖ Triggering condition-based notifications
❖ Migrating and unifying Legacy systems
❖ Evaluating advanced conditions in data
❖ Scrubbing and standardizing address data
❖ Validating emails and customer names
❖ Automating reporting tasks across many systems
❖ Training advanced machine learning and predictive models
❖ Performing any type of statistical analysis
❖ Reconstructing and transforming datasets to create new features
❖ Validating and looking up city, state, and ZIP fields against the UPS database
❖ Performing fuzzy match-ups and fuzzy joins
❖ Identifying and extracting duplicates
❖ Delivering reports and streamlining dashboard generation
❖ Performing semantic analysis and semantic matching
❖ Implementing data dictionaries
❖ Executing FTP data transfers
❖ Finding hidden anomalies and patterns
❖ Performing optimized search across n systems
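To make one of the list items above concrete, the fuzzy match-up task can be sketched with the standard library's string-similarity matcher. The `fuzzy_join` helper, threshold, and sample records are all invented for illustration; this is not Flow's matching engine:

```python
from difflib import SequenceMatcher

def fuzzy_join(left, right, key, threshold=0.8):
    """Pair rows whose key fields are similar rather than identical --
    the kind of fuzzy match-up needed to link records across systems."""
    pairs = []
    for l in left:
        for r in right:
            score = SequenceMatcher(None, l[key].lower(), r[key].lower()).ratio()
            if score >= threshold:
                pairs.append((l, r, round(score, 2)))
    return pairs

# The same customer spelled differently in two systems still matches.
crm = [{"name": "Jon Smith"}]
erp = [{"name": "John Smith"}, {"name": "Ann Lee"}]
matches = fuzzy_join(crm, erp, "name")
```

The threshold controls the precision/recall trade-off: raise it to avoid false links, lower it to catch more spelling variants.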
Summary
Have one master Workflow orchestrate the execution of as many sub-Workflows as needed across
all disconnected systems via autonomous streaming Agents.
Use the system's join and set operations to rapidly cleanse and unify the entirety of an
enterprise's internal data, as well as any desired target data from external sources, into one
all-encompassing jagged generic data set.
Warp this jagged generic data set into a high-dimensional generic HyperCube that contains
every link and relation across every dimensional axis in your data.
Optimize and train learning algorithms across the HyperCube; then extract and group
together the key statistical evaluators for an autonomous reporting experience.
The scope and diversity of what can be achieved in this system is limited only by the
creativity of the user.
For further inquiry, please contact:
Hypercube Artificial Intelligence Division
Andrew McLaughlin, CEO and Lead Developer
andrew.mclaughlin@4diq.com
M: 1.484.283.530
Jeremy Villar, CMO and Developer
jeremy.villar@uconn.edu
M: 1.860.309.2788