@richardmcdougll

Richard McDougall

Cloud Applications Architecture

Biography

Richard McDougall is the Application Infrastructure CTO and Principal Engineer in the Office of the CTO at VMware. He is responsible for driving advanced development and strategy for VMware’s application platform architecture, including the performance and integration of applications, runtimes, middleware, and application encapsulation technologies. Richard is known as an expert in performance measurement and optimization, and in application deployment architectures.

Before joining the Office of the CTO, Richard was Chief Performance Architect, driving the performance strategy and initiatives that enabled virtualization of high-end, mission-critical applications on VMware products.

Prior to joining VMware, Richard was a Distinguished Engineer at Sun Microsystems. During his 14 years at Sun, he was responsible for driving high performance and scalability initiatives for Solaris and key applications on the Sun platform. He served on the central software platform architecture review committee and also drove the early resource management initiatives for Solaris. Recognized as an operating system and performance expert, he developed several technologies for the Solaris operating system and co-authored several books, including “Solaris Resource Management”, “Solaris Internals” and “Solaris Performance and Tools”.

Richard holds several patents in the areas of performance instrumentation, algorithms, and distributed file system technologies.

Posts by Richard McDougall

Pivotal, Big Data and VMware

April 24, 2013

It’s great to see the public launch of Pivotal today. The mission, to build a new platform for a new era, is bold but appropriately targeted at some of the biggest fundamental changes in application technologies. Pivotal is now a separate entity, bringing together several teams and technologies from both VMware and EMC: Greenplum’s Hadoop (now Pivotal HD), Greenplum Database (fused with Hadoop as a new database known as HAWQ), Cetas, Pivotal Labs, the GemFire in-memory database, the Spring Application Framework, and the Cloud Foundry PaaS platform. The goal of the platform is to enable the new wave of predictive big data applications, those which pull in vast quantities and sources of data, including high rate real...

Expanding the Virtual Big Data Platform

April 2, 2013

Today we are releasing a new set of capabilities in Serengeti 0.8.0, which extends the reach of partner-supported Hadoop versions and capabilities. In addition, we are broadening the reach of Serengeti into mixed workload configurations: this release can provision an HBase cluster. As I’ve discussed in previous posts, most big-data environments consist of a mix of workloads. Serengeti’s mission is to bring as many of the big-data family of workloads as possible into the same theme park, all running on a common shared platform. Supporting mixed workloads is a key capability for big data. In my customer discussions I see a mix of MapReduce, HBase, Solr, numerical analysis (R and SAS), and increasingly more of the Big SQL engines such...

2013 Predictions for Big Data

December 18, 2012

Over the last few years we’ve seen a frenzy of interest and buzz around Big Data. Beyond the hype, there is a solid base of growing use cases, which are taking center stage at most businesses. 2011 was the year of awareness. There was a great amount of sharing from the early core developers of the analytic platforms, who showed the rest of the world the capabilities of the tools and platforms they had developed for special-purpose, high-scale analytics. The big names at the core of open source analytics development include Facebook, eBay, LinkedIn, and Twitter, all blazing the trail with new approaches. These companies brought along with them a new and expanding interest in leveraging the same...

Our Elephant Grows Up – New Serengeti Capabilities for Hadoop

October 23, 2012

Since the release of Serengeti, VMware has learned a tremendous amount from our customers about using virtualization as the platform for big data workloads and Hadoop. These customer conversations provided us with solid reasons to virtualize Hadoop and other big data workloads. Today, we’re introducing the third significant release of Project Serengeti. In this post I cover some of the headlines from the work accomplished since the Hadoop Summit in June, which include support for dynamic elastic scaling, Hive JDBC connections, a data upload/download interface, the ability to configure infrastructure topologies, placement controls for Hadoop nodes on physical hosts, Hadoop tuning configurables, and a community-contributed UI. With this October release, we’re also supporting Cloudera CDH3u3, Greenplum GPHD 1.2.0.0, Hortonworks HW 1.0.7, and Apache Hadoop 1.0.1. The...

Towards an Elastic Elephant: Enabling Hadoop for the Cloud

October 22, 2012

By guest blogger Tariq Magdon-Ismail (@tariqmi). In his joint presentation at Hadoop Summit 2012 titled “Hadoop in Virtual Machines”, Richard McDougall talked about the benefits and challenges of virtualizing Hadoop. In particular, he introduced the idea of separating Hadoop’s compute runtime from data storage on virtual infrastructure and touched on why this architecture is both desirable and feasible in a cloud environment. In this blog post I hope to examine this topic a bit further and present some initial performance results.

The Evolution of Virtual Hadoop

The common approach to virtualizing many applications is to perform a P2V (physical-to-virtual) migration, where the physical deployment is directly cloned into virtual machines. Hadoop is no different. While this is a...

Big Data and Virtual Hadoop at VMworld 2012

October 1, 2012

VMworld 2012 has come and gone, and VMworld Europe is on the horizon. We had several big data oriented sessions this year, and saw a significant rise in activity in this important area. During the keynote, we demoed the next version of Serengeti, which allows Hadoop to be elastically scaled on a virtual platform. We showed a scenario of mixed workloads on the same platform, allowing time-based shifts in the amount of resources assigned to Hadoop and other workloads. You can see the highlights from the keynote here, and additional information on compute/data separation in an upcoming blog post. In addition, Jeff Buell and I presented on the work VMware is doing to make virtualization the best place for...

Project Serengeti: There’s a Virtual Elephant in my Datacenter

June 12, 2012

Introduction

There’s no question that the amount of value being extracted from data is increasing: almost every customer I speak with is building new technology to gain new or competitive insights from tapping large volumes or rates of data. In the last few posts, I have introduced VMware technologies and products that provide data services to new applications. We see four major axes along which data requirements are stretching the limits of traditional approaches to data analysis:

Big Data – The need to store and compute against hundreds of gigabytes of unstructured or semi-structured data.
Fast Data – The increasing need for low latency interactions with large sets of data, often driven by today’s mobile and social apps.
Flexible Data – The need...

Cetas (VMware) receives prestigious 2012 TiE50 award!

May 18, 2012

As a great validation of VMware’s decision to acquire Cetas, that team has been honored with the TiE50 award in the Software category. This award acknowledges their achievements and underscores the recognition we are getting in the Big Data Analytics space. We are thrilled to receive this prestigious award; it is a solid recognition of all the innovative and hard work done by the team. TiE is a leading entrepreneur-focused organization doing a tremendous job of fostering creativity and innovation among entrepreneurs while identifying the technologies and companies that rise to the top in their respective focus areas. More and more business users are demanding instant...

Analyzing Hadoop’s internals with Analytics

May 10, 2012

As part of our Big Data efforts, we have a team focused on Hadoop that is working hard to ensure Hadoop runs well on vSphere. We published a paper last year on Hadoop performance, and have a lot more in the pipeline. More recently, I took up a challenge to see how much we could learn about Hadoop I/O in a very short time, using our dynamic tracing framework. The results were quite interesting. To position this work correctly: it’s really a work-in-progress study that I’d like to open up for discussion. It’s what we’ve learned by digging into the architecture, to help us make engineering decisions. I hope it’s helpful, and really want to get your feedback, so we can steer...

VMware acquires Cetas Software for Cloud and Big Data Analytics

April 24, 2012

At the beginning of the year, I posted that we are in the midst of a data renaissance, where a rapid proliferation of new uses of data is driving new technologies to manage data. The traditional relational database was once the main vehicle for serving the needs of online applications, with optimized variants used to store longer-term data for business intelligence and analytics. No longer, however, does one size fit all when it comes to data and database technologies. The unprecedented volume of information being collected and analyzed is driving new Big Data technologies. Extreme performance requirements and changing hardware architectures necessitate new Fast Data solutions, like in-memory databases. Applications that interact with data in a variety...
