Richard McDougall
Cloud Applications Architecture
Biography
Richard McDougall is the Application Infrastructure CTO and Principal Engineer in the Office of the CTO at VMware. He is responsible for driving advanced development and strategy for VMware’s application platform architecture – including the performance and integration of applications, runtimes, middleware, and application encapsulation technologies. Richard is known as an expert in the areas of performance measurement and optimization, and in application deployment architectures.
Before joining the Office of the CTO, Richard served as Chief Performance Architect, driving the performance strategy and initiatives to enable virtualization of high-end, mission-critical applications on VMware products.
Prior to joining VMware, Richard was a Distinguished Engineer at Sun Microsystems. During his 14 years at Sun, he was responsible for driving high performance and scalability initiatives for Solaris and key applications on the Sun platform. He served on the central software platform architecture review committee, and also drove the early resource management initiatives for Solaris. Recognized as an operating system and performance expert, he developed several technologies for the Solaris operating system and co-authored several books—including “Solaris Resource Management”, “Solaris Internals” and “Solaris Performance and Tools”.
Richard holds several patents in the areas of performance instrumentation, algorithms, and distributed file system technologies.
Posts by Richard McDougall
It’s great to see the public launch of Pivotal today. The mission, to build a new platform for a new era, is bold but appropriately targeted at some of the biggest fundamental changes in application technologies. Pivotal is now a separate entity, bringing together several teams and technologies from both VMware and EMC, including Greenplum’s Hadoop (now Pivotal HD), Greenplum Database (fused with Hadoop as a new database known as HAWQ), CETAS, Pivotal Labs, the GemFire in-memory database, the Spring Application Framework and the Cloud Foundry PaaS platform. The goal of the platform is to enable the new wave of predictive big data applications, those which pull in vast quantities and sources of data, including high rate real...
Today we are releasing a new set of capabilities in Serengeti 0.8.0, which extends the reach of partner supported Hadoop versions and capabilities. In addition, we are broadening the reach of Serengeti into mixed workload configurations, enabling provisioning of an HBase cluster in this release. As I’ve discussed in previous posts, most big-data environments consist of a mix of workloads. Serengeti’s mission is to bring as many of the big-data family of workloads as possible into the same theme park, all running on a common shared platform. Supporting mixed workloads is a key capability for big-data. In my customer discussions I see a mix of Map-Reduce, HBase, Solr, numerical analysis (R and SAS), and increasingly more of the Big SQL engines such...
Over the last few years we’ve seen a frenzy of interest and buzz around the area of Big Data. Beyond the hype, there is a solid base of growing use cases, which are taking center stage at most businesses. 2011 was the year of awareness. There was a great amount of sharing from the early core developers of the analytic platforms – showing the rest of the world the capabilities of the tools and platforms that had been developed for special purpose high scale analytics. The big names at the core of open source analytics development include Facebook, eBay, LinkedIn, and Twitter – all blazing the trail with new approaches. These companies brought along with them a new and expanding interest in leveraging the same...
Since the release of Serengeti, VMware has learned a tremendous amount from our customers about using virtualization as the platform for big data workloads and Hadoop. These customer conversations provided us with solid reasons to virtualize Hadoop and other big data workloads. Today, we’re introducing the third significant release of project Serengeti. In this blog I cover some of the new headlines of the work accomplished since the Hadoop Summit in June, which include: Support for Dynamic Elastic Scaling; Hive JDBC connections; Data upload/download interface; Ability to configure infrastructure topologies; Placement controls for Hadoop nodes on Physical; Hadoop tuning configurables; A community contributed UI. With this October release, we’re also supporting Cloudera CDH3u3, Greenplum GPHD 1.2.0.0, Hortonworks HW 1.0.7, and Apache 1.0.1. The...
By guest blogger Tariq Magdon-Ismail (@tariqmi)
In his joint presentation at Hadoop Summit 2012 titled “Hadoop in Virtual Machines”, Richard McDougall talked about the benefits and challenges of virtualizing Hadoop. In particular, he introduced the idea of separating Hadoop’s compute runtime from data storage on virtual infrastructure and touched on why this architecture is both desirable and feasible in a cloud environment. In this blog post I hope to examine this topic a bit further and present some initial performance results.
The Evolution of Virtual Hadoop
The common approach to virtualizing many applications is to perform a P2V (physical-to-virtual) migration where the physical deployment is directly cloned into virtual machines. Hadoop is no different. While this is a...
VMworld 2012 has come and gone, and VMworld Europe is on the horizon. We had several big-data-oriented sessions this year, and saw a significant rise in activity in this important area. During the keynote, we demoed the next version of Serengeti, which allows Hadoop to be elastically scaled on a virtual platform. We showed a scenario of mixed workloads on the same platform, allowing time-based shifts in the amount of resources assigned to Hadoop and other workloads. You can see the highlights from the keynote here, and additional information on compute/data separation in an upcoming blog post. In addition, Jeff Buell and I presented on the work VMware is doing to make virtualization the best place for...
Introduction
There’s no question that the amount of value being extracted from data is increasing – almost every customer I speak with is building new technology to gain new or competitive insights from tapping large volumes or rates of data. In the last few posts, I have introduced VMware technologies and products that provide data services to new applications. We see four major axes along which data requirements are stretching the limits of traditional approaches to data analysis:
Big Data – The need to store and compute against hundreds of gigabytes of unstructured or semi-structured data.
Fast Data – The increasing need for low latency interactions with large sets of data, often driven by today’s mobile and social apps.
Flexible Data – The need...
As a great validation of VMware’s decision to acquire Cetas, that team has been honored with the TiE50 award in the Software category. This award acknowledges their achievements and underscores the recognition we are getting in the Big Data Analytics space. We are thrilled to receive this prestigious award, which is solid recognition of the innovative, hard work done by the team. TiE is a leading entrepreneur-focused organization doing a tremendous job of fostering creativity and innovation amongst entrepreneurs while identifying the technologies and companies that rise to the top in their respective focus areas. More and more business users are demanding instant...
As part of our Big Data efforts, we have a team focused on Hadoop that is working hard to ensure Hadoop runs well on vSphere. We published a paper last year on Hadoop performance, and have a lot more in the pipeline. More recently, I took up a challenge to see how much we could learn about Hadoop I/O in a very short time, using our dynamic tracing framework. The results were quite interesting. To position this work correctly: it’s really a work-in-progress study that I’d like to start a discussion around. It’s what we’ve learned by digging into the architecture, to help us make engineering decisions. I hope it’s helpful, and really want to get your feedback, so we can steer...
At the beginning of the year, I posted that we are in the midst of a data renaissance, in which a rapid proliferation of new uses of data is driving new technologies to manage data. The traditional relational database had once been the main vehicle for serving the needs of online applications, with optimized variants used to store longer-term data for business intelligence and analytics. No longer, however, does one size fit all when it comes to data and database technologies. The unprecedented volume of information being collected and analyzed is driving new Big Data technologies. Extreme performance requirements and changing hardware architectures necessitate new Fast Data solutions, like in-memory databases. Applications that interact with data in a variety...