We had our first guest lecture this week. John Myles White presented recent work on modeling data from functional MRI experiments to understand the relationship between various mental states and patterns of physical activity in the brain. For more details, see John’s slides and this review paper on “mind reading”.
We also looked at Amazon’s Web Services for cloud computing. In particular, we discussed S3 for distributed web storage and EC2 for rentable computing power. We saw several ways to use these resources to process large amounts of data, including Amazon’s Elastic MapReduce framework, which acts as a pay-as-you-go Hadoop solution. We concluded with a brief overview of Apache Pig, a high-level language that simplifies data processing by translating sequences of common operations (GROUP BY, FILTER, JOIN, etc.) into a series of MapReduce jobs.
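To make the MapReduce model concrete, here is a minimal sketch of a word count job written for Hadoop Streaming, which lets you supply the map and reduce steps as ordinary scripts that read from standard input and write tab-separated key/value pairs to standard output. Elastic MapReduce can run jobs of this form, and Pig’s GROUP BY, FILTER, and JOIN operations ultimately compile down to a series of similar jobs. The file names below (mapper.py and reducer.py) are just placeholders.

    # mapper.py: emit a (word, 1) pair for every word in the input
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    # reducer.py: sum the counts for each word; Hadoop sorts the mapper
    # output by key, so all pairs for a given word arrive consecutively
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Because these scripts only touch standard input and output, you can test them locally by piping a small file through the mapper, a sort, and the reducer before paying for any cluster time.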
After signing up for AWS, see the web services console to interact with Amazon’s various services. In addition to Amazon’s EC2 tutorial, Paul Stamatiou’s getting started guide for EC2 walks through using the EC2 API tools to start instances. Amazon provides a similar guide to using S3 for remote storage, as well as command line tools for transferring data to and from S3. Finally, see the Elastic MapReduce getting started guide for a quick and easy way to run Hadoop jobs over EC2 and S3.
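As an alternative to the command line tools mentioned above, the boto Python library talks to S3 and EC2 directly from a script. The sketch below assumes boto is installed and can find your AWS credentials (for example, via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables); the bucket name, AMI id, and keypair name are placeholders you would replace with your own.

    import boto

    # copy a local file into S3 (bucket names are globally unique)
    s3 = boto.connect_s3()
    bucket = s3.create_bucket('my-example-bucket')
    key = bucket.new_key('data/input.txt')
    key.set_contents_from_filename('input.txt')

    # launch a single small EC2 instance from a machine image
    ec2 = boto.connect_ec2()
    reservation = ec2.run_instances('ami-xxxxxxxx',
                                    key_name='my-keypair',
                                    instance_type='m1.small')
    print(reservation.instances[0].id)

boto also includes a module for Elastic MapReduce, though the getting started guide above is probably the simplest way to launch your first Hadoop job.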