HPCC Systems From LexisNexis®
Open Source, Big Data Processing and Analytics
2012 Summer Developer Newsletter

The Summer Developer Newsletter is a quarterly newsletter from HPCC Systems for the Developer Community.

In This Issue
  Developer Update
  Use Case Examples
  HPCC Systems in the News
  Platform Updates
  Tradeshows and Events
  Training Resources
  About HPCC Systems

Developer Update

HPCC Systems Integration with Hadoop
Download the Connector and Learn More | Download Now

In May 2012, HPCC Systems released a beta version of its Hadoop Data Integration connector, which provides seamless access to data stored in HDFS (Hadoop Distributed File System) from within the Thor component of the HPCC Systems platform. It also allows developers to write to HDFS from within Thor.

What are the benefits?
This new feature opens up several opportunities to leverage HPCC Systems components from within an existing Hadoop cluster.

•  Connect the Roxie real-time distributed data analytics and delivery system, which provides real-time access to complex data queries and analytics, to data processed in a Hadoop cluster.

•  Leverage the distributed machine learning and linear algebra libraries that the HPCC platform offers through its ECL-ML (ECL Machine Learning) module. For a highly efficient and reliable data workflow processing system, take advantage of the HPCC Systems platform and ECL.

•  Combine it with Pentaho Kettle/Spoon to add a graphical interface to ETL and data integration.

Using the Hadoop Data Integration connector is simple. The connector has been packaged to include all the necessary components, which are to be deployed to every HPCC node. HPCC Systems can coexist with Hadoop on the same nodes or run on a different set of nodes; the latter is normally recommended for performance reasons.

Use Case Examples

Case Study: Sentiment Analysis with Engauge and Pinterest
Download the Pinterest Case Study | Download Now

Earlier this year, the HPCC Systems team worked with our partner, Engauge, to run sentiment analysis on one of the hottest social media sites, Pinterest. Exploring usage patterns and sifting through and analyzing all this big data required technology that the HPCC Systems platform, its programming language Enterprise Control Language (ECL), and its Machine Learning (ML) libraries could provide effortlessly. Once the data was cleaned and normalized, ECL was used to process the data for analysis, along with several built-in machine learning methods, including the agglomerative hierarchical clustering algorithm, to generate the results.
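For readers new to agglomerative hierarchical clustering, the idea can be sketched in a few lines of Python. This is not the ECL-ML implementation used in the case study. It is a minimal illustration that assumes single linkage and Euclidean distance, and the function name and sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def agglomerative_clusters(points, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain.

    Assumes single linkage: the distance between two clusters is the
    distance between their closest pair of points.
    """
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical two-dimensional feature vectors, for illustration only.
sample = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
result = agglomerative_clusters(sample, 3)
```

Each pass merges the nearest pair of clusters, so nearby points end up grouped together while the isolated point stays in a cluster of its own. A production system would use a distributed implementation rather than this quadratic search.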

Getting a project like this started can be tricky as there are multiple aspects to cover, including data collection, discovery, cleansing, feature extraction, linking and analysis. Fortunately, the HPCC Systems platform makes the entire data workflow seamless, as describing and coding each of these components in ECL is both simple and concise.

A recent HPCC Systems blog post provides a good introduction to sentiment analysis and text classification. Click here to read the blog.

Click here to learn more about the Machine Learning libraries available from HPCC Systems.

We are here to help with your social media analysis! If you have any questions, contact us at info@hpccsystems.com.

Case Study: Finding Fraud in Medicaid Programs for a Large Government Agency

Read the InformationWeek Media Article | Click Here

The Office of Medicaid Inspector General (OMIG) of a large northeastern state suspected fraud among a group of state Medicaid recipients, all of whom were living in the same high-end, ocean-front condominium complex – and all of whom were on Medicaid. LexisNexis® was charged with identifying the hidden relationships between the million-dollar condo-dwellers, their assets, and the providers and medical facilities providing care to the state’s Medicaid recipients. Unfortunately, no commercial or government health care organization has the time or manpower to investigate numerous new, separate cases, much less uncover connections the individuals may have attempted to keep under wraps. To accomplish this, LexisNexis integrated data and investigative resources from the OMIG with its open source, big data, high-performance computing platform called HPCC Systems.

To initiate the investigation, LexisNexis was given the list of names and addresses of the targeted group – and nothing more. The results, however, were far more expansive. Leveraging 50 terabytes of public data, LexisNexis built a large-scale network map of the targeted Medicaid recipients and everyone associated with them within two degrees. Next, patented LexisNexis algorithms were used to cluster the network map and generate statistics to measure every cluster. The graph for each cluster was then queried for significant statistics.
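The "within two degrees" step can be pictured as a breadth-first search that stops two hops out from the starting list. The Python sketch below is only an illustration of that idea, not the patented LexisNexis algorithms or the ECL used in the investigation. The function name, the adjacency-list layout, and the placeholder records are all assumptions.

```python
from collections import deque


def within_degrees(graph, seeds, max_degree=2):
    """Return {person: degree} for everyone reachable within max_degree hops."""
    degree = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        if degree[person] == max_degree:
            continue  # Do not expand beyond the requested radius.
        for neighbor in graph.get(person, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[person] + 1
                queue.append(neighbor)
    return degree


# Hypothetical association records; every name is a placeholder.
associations = {
    "recipient_a": ["relative_b", "provider_c"],
    "relative_b": ["recipient_a", "business_d"],
    "provider_c": ["recipient_a"],
    "business_d": ["relative_b", "owner_e"],
    "owner_e": ["business_d"],
}
network = within_degrees(associations, ["recipient_a"])
```

Here `owner_e` sits three hops from the starting recipient, so it is left out of the resulting map, while `business_d` is included at degree two.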

The resulting connections enabled LexisNexis to quickly discover the key players in the suspected ring. The analyses revealed hundreds of high-end automobiles, other properties owned and links to provider networks. It also revealed very suspicious volumes of “deed flipping” within the group, potentially indicative of mortgage fraud and money laundering. The investigation is ongoing, and this state’s OMIG leads the nation in current techniques using provider behavior analysis to target interventions. LexisNexis and the HPCC Systems team continue to push the boundaries with analytics examining up to 20 billion data points to create variables that allow for predictive analysis incorporating relationship context and associated risk.

HPCC Systems in the News

Leading Analyst Firm Names HPCC Systems as a Cool Vendor
Link to the Gartner Report | Download Now

HPCC Systems has been named as a “Cool Vendor” in the April 2012 Gartner report: "Cool Vendors in Information Infrastructure and Big Data, 2012," authored by Merv Adrian, Donald Feinberg and W. Roy Schulte. Gartner, the world’s leading information technology research and advisory company, defines a cool vendor as a company that offers technologies or solutions that are innovative, impactful, and intriguing.

PCWorld Article
Five Things CIOs Should Know | Download Now

Featured Blog
Promising Big Data Crunching Alternatives | Download Now

About HPCC Systems

HPCC Systems is an open-source, enterprise-proven Big Data analytics processing platform to manage, sort, link, join and analyze billions of records for enterprise customers who need to process large volumes of data in critical 24/7 environments. It evolved from the need of LexisNexis® to manage its own big data challenges to serve customers. Read about our 1-year anniversary.

Stay Connected

Subscribe to the Newsletter

We encourage you to contact us to discuss any items included in this newsletter or to learn more about how we can help you solve your big data challenges.

HPCC Systems

Follow Us

LexisNexis on Twitter | LexisNexis on Facebook | LexisNexis Network on LinkedIn | LexisNexis Network on YouTube

From inside the US and Canada: +1.877.316.9669
From outside the US and Canada: +1.678.694.2200
Email: info@hpccsystems.com

Platform Updates

Latest Community Edition 3.8.0 Version Available | Download Now
This version includes the new ECL Playground, which allows developers to access and execute self-contained ECL code on the HPCC system without the use of any other tools. New memory management provides significant performance improvements when sorting variable-length records. An updated set of Client Tools also adds compatibility for Mac.

ECL Data Integration Plugins for Pentaho | Download Now
HPCC Systems has released a set of plugins for Pentaho Data Integration to make Big Data development as easy as drag and drop.

We also want to congratulate Pentaho on its debut on the Gartner Magic Quadrant for Business Intelligence Platforms. Click here to link to the report.

Tradeshows and Events

Right Around the Corner
HPCC Systems will be attending several events in the upcoming months.

OSCON Portland: July 16-20, 2012
Big Data World Europe: September 19-20, 2012
O'Reilly Strata - New York: October 25-26, 2012

Training Resources

Video Tutorials
HPCC Systems offers Free Online Video Tutorials | Watch Now

HPCC Systems offers free online video tutorials to get you started quickly. A wide range of topics is covered, from beginners trying to solve their first problem to advanced users looking to tune their programs to meet performance requirements.

Prefer classroom style training? Click here for the latest training schedule.