News

Cray Unveils Ambidextrous Appliance for Data Analytics

May 24, 2016

By: Michael Feldman

Supercomputer-maker Cray has introduced Urika-GX, the company’s newest version of its enterprise-focused data analytics product line. With an emphasis on agility, the system merges the functionality of the existing Urika-GD and Urika-XA appliances, which provide platforms for graph-based and Spark/Hadoop-based analytics, respectively. Urika-GX wraps both of these capabilities into a single box, and does so largely with standard hardware and software.

The multifaceted nature of the new platform means that this is the end of the line for Urika-GD and Urika-XA. Both products will continue to be offered by Cray for some time, but eventually they will be phased out. The custom-built Threadstorm processor, which was used in the GD’s graph accelerator nodes, will also be discontinued, as will the multithreaded kernel OS it ran under. In retrospect, the evolution of Urika to a more standard, open platform was inevitable, given that Cray has been methodically shedding its proprietary technologies.

From the hardware perspective, Urika-GX looks remarkably like a vanilla x86 cluster encased in a standard-issue 19-inch, 42U rack. Each node is a typical dual-socket affair, powered by Intel’s latest Xeon “Broadwell” (E5-2600 v4) processors. Being a data analytics machine, there is plenty of memory and flash storage capacity to be had; a fully outfitted rack with 48 nodes can be equipped with up to 22 TB of DRAM and 35 TB of PCIe-based SSD storage. The idea here is to have adequate space for large graphs or other big data structures that can reside close to the processors for in-memory processing.
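To put the rack-level figures in per-node terms, the short calculation below divides the quoted maximums by the 48 nodes in a full rack. It is only a rough sketch: it assumes capacity is spread evenly across nodes and that the quoted terabytes are decimal units, neither of which is stated here, and the function name is purely illustrative.

```python
def per_node_gb(total_tb: float, nodes: int) -> float:
    """Average capacity per node in GB, assuming decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 / nodes


NODES_PER_RACK = 48  # fully outfitted rack, per the figures quoted above

# Rack maximums quoted above; the even split across nodes is an assumption.
dram_gb = per_node_gb(22, NODES_PER_RACK)
ssd_gb = per_node_gb(35, NODES_PER_RACK)

print(f"DRAM per node: ~{dram_gb:.0f} GB")
print(f"SSD per node:  ~{ssd_gb:.0f} GB")
```

Under those assumptions, each node averages a little under half a terabyte of DRAM and roughly three-quarters of a terabyte of flash.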

The only non-standard componentry in Urika-GX is the Aries interconnect, representing Cray’s first attempt at incorporating its high-performance fabric into what amounts to a standard cluster architecture. And given the propensity of Cray to jettison non-standard technologies, it would be reasonable to assume that someday Aries will be replaced by its Intel offshoot, Omni-Path, once the chipmaker integrates that fabric technology with its Xeon processors.

For today, though, the high-bandwidth, low-latency Aries interconnect is critical when it comes to in-memory processing, enabling fast access to data that is distributed across many nodes. For large graphs, the Cray Graph Engine does the necessary work to make these multiple memory spaces act as one, but theoretically any application with large multi-terabyte data structures could use Aries to its advantage.

With Aries as the interconnect glue, scalability comes naturally. While a single rack can house 48 nodes, there are 16-node and 32-node configurations as well for those with lesser needs. If a larger system is desired, multiple racks can be hooked together, and given Aries’ performance, one is likely to run out of money long before any scalability concerns are encountered.

External storage is also well accounted for. Up to 192 TB of local disk can be deployed in a single rack, along with the option to hook the Urika-GX to a Cray Sonexion storage subsystem. Most analytics customers are likely to use their own storage setups, but if the extra performance of a Lustre parallel file system is required, Sonexion is a convenient alternative.

Software-wise, the Urika-GX is chock-full of tools and frameworks oriented to the enterprise analytics crowd. Besides the aforementioned Cray Graph Engine, the system also includes Spark/Hadoop components and Kafka (a high-performance messaging/data ingestion package), as well as Java, Scala, Python, and other standard tools used by data wonks.

Mesos is included as the open source cluster manager, which will dynamically allocate resources so that different analytics workloads can be run simultaneously. It is central to the agility of the Urika-GX, according to Ryan Waite, Cray’s senior vice president of products. “One part of the system can be used for Kafka for real-time data ingestion, another part of the system for Spark jobs, and a third part for running the Cray Graph Engine,” Waite told TOP500 News.

Waite said all the software is pre-installed and pre-configured. No tedious configuration recipe is needed for setup and tuning. In addition, regular updates are provided for all tools and packages, with new features to be added over time.

For the platform’s systems management facility, Cray opted for OpenStack, a popular cloud-based open source framework. Urika-GX is the first Cray product to come pre-packaged with OpenStack, but it won’t be the last, according to Waite, who explained that the company’s entire portfolio will include OpenStack integration in the future. “We’re very bullish on it,” he said.

Nearly all the software packaged with Urika-GX is open source or adheres to open standards. Even the Cray Graph Engine, an in-house invention, uses World Wide Web Consortium (W3C) standards like the Resource Description Framework (RDF), a standard model for data interchange, and SPARQL, an RDF query language for databases.
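For readers unfamiliar with the RDF model, the sketch below shows the basic idea in plain Python: data is stored as subject-predicate-object triples, and a query is a set of patterns joined on shared variables, much as a SPARQL basic graph pattern would be. This is not the Cray Graph Engine’s interface; the data, the identifiers, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical example data, invented for this sketch.
TRIPLES: List[Triple] = [
    ("user:alice", "accessed", "host:db01"),
    ("user:bob", "accessed", "host:db01"),
    ("user:bob", "accessed", "host:web02"),
    ("host:db01", "locatedIn", "dc:east"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard (a variable)."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def users_in_datacenter(triples: List[Triple], dc: str) -> List[str]:
    """Join two patterns on a shared host variable.

    Roughly the shape of this SPARQL query (prefixes omitted):
        SELECT ?user WHERE { ?user accessed ?host . ?host locatedIn <dc> }
    """
    hosts = {s for s, _, _ in match(triples, p="locatedIn", o=dc)}
    return sorted({s for s, _, o in match(triples, p="accessed") if o in hosts})


print(users_in_datacenter(TRIPLES, "dc:east"))
```

A real RDF store would use full IRIs and an indexed, distributed representation rather than a Python list, but the pattern-and-join structure of the query is the same.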

A number of beta customers who pre-ordered the new Urika are already putting the machine through its paces. Waite noted that an unnamed healthcare customer is currently using it to help identify potentially malicious behavior by individuals trying to get into its datacenters. The system has to comb through logs recording thousands of system accesses and figure out which ones appear suspicious. Another customer in the medical field is using it to predict cancer survivability based on reams of clinical data. A more research-oriented use case is at the Broad Institute of MIT and Harvard, where scientists are using the GX system to analyze high-throughput genome sequencing data.

Waite said there is general interest in the new platform from customers in life sciences, cybersecurity, and healthcare. General availability of the Urika-GX is scheduled for the third quarter of 2016.