Tuesday, June 1. 2010
Amazon EC2: interesting papers regarding technical underpinnings
Some of them are not entirely new, but interesting nonetheless:
Getting good I/O on EBS volumes
http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs/

Disk I/O tests
http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html

Sidechannel-Attacks on EC2
http://people.csail.mit.edu/tromer/papers/cloudsec.pdf

Loadbalancer performance
http://blog.rightscale.com/2010/04/01/benchmarking-load-balancers-in-the-cloud/

Amazon Usage Estimates
http://blog.rightscale.com/2009/10/05/amazon-usage-estimates/

Anatomy of an EC2 resource id
http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/
http://www.jackofallclouds.com/2010/02/revisiting-ec2-instance-ids/

Has Amazon EC2 become over-subscribed?
http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm

Details about EC2 hardware infrastructure
http://openfoo.org/blog/amazon_ec2_underlying_architecture.html
http://openfoo.org/blog/amazon_ec2_network.html

The Impact of Virtualization on Network Performance
www.cs.rice.edu/~eugeneng/papers/INFOCOM10-ec2.pdf

More info about underlying hardware
http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html

What's a "Compute Unit"?
http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/

Friday, May 21. 2010
Amazon EC2: how to automatically create snapshots of attached EBS volumes
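A minimal sketch of such a snapshot script, under a few assumptions: the function names are made up for illustration, and the awk filter assumes that "ec2-describe-volumes" prints ATTACHMENT lines with the volume id in the second and the instance id in the third column, which you should verify against the output of your tool version.

```shell
#!/bin/sh
# Sketch only: snapshot every EBS volume attached to this instance.
# Assumes the EC2 API tools are installed and ~/.profile sets up their environment.

# Read "ec2-describe-volumes" output on stdin and print the ids of the
# volumes attached to instance $1 (assumed layout: ATTACHMENT <vol> <instance> ...).
attached_volumes() {
    awk -v id="$1" '$1 == "ATTACHMENT" && $3 == id { print $2 }'
}

# Look up our own instance id, flush buffers, then snapshot each attached volume.
snapshot_attached_volumes() {
    . "$HOME/.profile"
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    sync
    ec2-describe-volumes | attached_volumes "$instance_id" |
    while read -r vol; do
        ec2-create-snapshot "$vol"
    done
}

# Call snapshot_attached_volumes at the end of the script when running it from cron.
```

The filtering is kept in its own function so it can be tried on saved tool output before any snapshot is actually created.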
This script requires correctly installed and working Amazon EC2 API/AMI tools. Because the tools need some environment settings, the script sources ~/.profile to make that environment available. If you start it via cron, say once a week, you automatically get a weekly snapshot of your volumes.

To find the volumes attached to a particular instance we need the instance id, which is fetched from the meta-data repository available at http://169.254.169.254/latest/meta-data in every EC2 instance. Once we know our instance id, we can grep through the output of "ec2-describe-volumes" and fetch the volume ids. With those in hand, the last part is to initiate a snapshot for every one of them.

As the volumes in question are used for backups only, I do not need to take special care of file system consistency; a simple "sync" suffices. But if your volumes are written to frequently, you might want to incorporate Eric Hammond's consistent snapshot into your script.

Thursday, May 20. 2010
Amazon EC2: How to migrate an EBS-backed image from US to EU (or wherever)
What we want to achieve
You put considerable effort into your nicely configured instance. But then you decide you need it not only in us-east-1 but also in eu-west-1. So you have to somehow move the EBS image to the EU, along with the instance definitions. Here is how to do it for a Linux instance.

What we need
-- correctly installed and working EC2 AMI/API command-line tools
-- Elasticfox, if you don't want to do everything on the command line
-- a running Linux instance in each region to which an EBS volume can be attached
-- a current and consistent snapshot of the EBS-backed instance to be copied
-- basic knowledge about EC2 instance and EBS volume management
-- cpipe. Under Ubuntu 8.04 it has to be installed: apt-get install cpipe

What we are doing
- copying an EBS volume to the target region
- registering the snapshot created off it as an image in the target region

1: You need a current snapshot of your instance. Most likely you have one anyway, else create one. If you create a snapshot from a running instance, take care that the file system is consistent. Ideally stop the instance before creating the snapshot.

2: Create a new EBS volume off the snapshot and attach it to any running Linux instance, ideally a newly started one that is not in production. For the sake of simplicity I assume you attached the volume as device /dev/sdf. Do not mount the volume, this is not needed!

3: Start a Linux instance in the target region.

4: Create a volume of equal size in the target region (if in doubt, make it a little bigger than the original volume) and attach it to the Linux instance in the target region. Wait some seconds after the volume appears to be attached.

5: Create a block dump of the original volume and copy it to the target region. This can be done in one step (as root). On the receiving side:
    netcat -p 9999 -l >/dev/sdf
Take care that port 9999 in the target instance's security group has been opened for the originating instance's IP address.
Adjust the target zone's security groups accordingly. Netcat is now listening on port 9999 and will copy everything it receives to /dev/sdf. The command "netcat" might be named "nc", depending on your particular distribution.

On the sending side you might want to use cpipe. It is not strictly needed, but it shows how fast the copying goes and how much has been transmitted so far. Start the copy process:
    cpipe -vt -b 1024 < /dev/sdf | netcat -q 1 ec2-hostname 9999
The contents of /dev/sdf are fed into cpipe, which outputs performance data. The data is then piped into netcat, which sends everything to port 9999 on ec2-hostname until EOF. Once the copy is finished, netcat will terminate on the receiving side as well.

6: After copying, check the file system on the freshly written volume (you haven't mounted it, have you?) and afterwards detach the volumes (on either side):
    root@ip-10-227-2-64:~# fsck.ext3 -f /dev/sdf
In the originating region the volume can be deleted if it is not needed otherwise. The instance used for copying in the originating region can be terminated as well.

7: Create a snapshot from the volume and register it as an EBS-backed image:
    ec2-register -K pk-123ABC.pem -C cert-123ABC.pem --region eu-west-1 \
This command can be issued from any machine anywhere, maybe from your local PC. It only requires a correctly installed and working EC2 command-line tool set.

That's it. You can now start your freshly migrated EBS-backed instance.

Some more information

Compressing data before sending

One would imagine that copying might be faster if the data are compressed before sending, but it turned out to take much longer than sending it uncompressed. At least a c1.medium instance appears not to have enough CPU power to saturate the network connection when data are compressed on the fly with bzip2. The same happens with gzip, even though it doesn't consume as much CPU as bzip2. The data rate between us-east-1 and eu-west-1 was mostly stable at about 5MB/s.
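As a rough worked example of what that rate means for planning a copy, here is a small sketch. The function name and the 10 GB volume size are only illustrative assumptions; the 5 MB/s figure is the one observed above.

```shell
# Estimated copy time in whole minutes (rounded up) for a volume of $1 GB
# at a sustained rate of $2 MB/s, counting 1 GB as 1024 MB.
copy_minutes() {
    size_mb=$(( $1 * 1024 ))
    seconds=$(( (size_mb + $2 - 1) / $2 ))
    echo $(( (seconds + 59) / 60 ))
}

# Example: a 10 GB volume at 5 MB/s needs about 2048 seconds.
# copy_minutes 10 5    # prints 35
```

The estimate only covers the steady rate, so it is a lower bound rather than a promise.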
But every now and then the transfer rate slows to a crawl for some minutes.

Kernel and ramdisk images have been left at "default". It worked for me, but I don't know whether this works for every Linux, as the kernel and ramdisk images have to be available in the particular region.

Friday, April 9. 2010
Filesystems without partitions on Amazon's EC2
Why this is a good idea in Amazon's Elastic cloud.
When a hard disk is initialised for use, you normally create partitions first and then create file-systems on the partitions. Yet on Amazon's EC2 instances one often sees file-systems directly on (virtual) devices. Like /mnt, which (at least on Alestic Ubuntu) is mounted straight from /dev/sdb. But why is this, and what advantages does this approach have over the usual way?

When you start a new instance, you most likely have a script running on it that initialises all the system parameters and the software you need on the instance. Part of that is initialising the virtual hard disks. Unfortunately their size tends to differ -- not only between instances of different types but also between instances of the same type. Just this week I ran into an m2.4xlarge with a 900GB ephemeral drive while the second m2.4xlarge instance came with only an 850GB drive. So to partition drives of unpredictable size you would have to make your initialisation scripts more complicated than necessary by calculating partition sizes. Instead you just create a file-system on the drive device: "mkfs.xfs -f /dev/sdc" and there you are. BTW: the ephemeral drives usually come pre-initialised with ext3, so you might be able to mount them directly.

There's another advantage of doing away with partitions: expanding an existing EBS volume (snapshot it and create a new, bigger volume from the snapshot) is more complicated if you have partitions on it. After expansion you would need to grow the partition first and then expand the file-system. If you don't have a partition at all, you just issue a "xfs_growfs /mountpoint" or "resize2fs /dev/sdX" and that's it.

Wednesday, April 7. 2010
How To Build A Heartbeat Cluster
Today we will install and configure a basic high-availability cluster working as a very simple web server. I am using Ubuntu Linux and a VMWare environment for this how-to, just for the sake of simplicity. This how-to is meant to give you a working HA cluster as a starting point for testing and further research. Please remember: what we install and configure here is not necessarily ready for production. I take some shortcuts one might not want to take in a production environment. This mainly applies to the mechanism for detecting a failed node.
Preparations

We need two identical machines whose only difference is their IP address. We also need a third IP address that is used for the highly available service. In our case the service will be a simple Apache web server, running on both cluster nodes.

We create two machines with Ubuntu 8.04 server 64-bit and choose "openssh server" during the installation, nothing else. After installation perform the usual apt-get update and apt-get dist-upgrade. Take care that all usernames and passwords are the same on both cluster nodes. Give the nodes static IP addresses. I gave "hacluster1" the address 192.168.35.81 and "hacluster2" the address 192.168.35.82. Of course you have to adapt the IP addresses to your infrastructure.

Now we install the heartbeat software:
    apt-get install heartbeat-2 heartbeat-2-gui xauth

For the floating IP address to work we need to append the following line to /etc/sysctl.conf:
    net/ipv4/ip_nonlocal_bind = 1

Now we configure the heartbeat cluster. Edit /etc/ha.d/authkeys (the file doesn't exist yet):
    auth 3
After saving it, change the file's permissions: "chmod 600 /etc/ha.d/authkeys". The file defines how the communication between cluster nodes is authenticated.

The next file to edit is "/etc/ha.d/ha.cf" (the file might not exist yet):
    logfacility local0
The second line defines which machines are part of the cluster, thus "hacluster1" and "hacluster2" should be hostnames. "bcast eth0" tells the heartbeat software to communicate with the other nodes via broadcast packets on eth0.

The last thing to do is to set a password for user hacluster on both machines.

Now we have a fully configured cluster of two nodes, and we should log into the VMWare control center to make snapshots of each node. That way we can go back to a vanilla cluster any time we want.

Highly available web server

Now that the cluster is ready for work we also need a service to be managed by the cluster.
For the sake of simplicity we will install an Apache web server on both nodes: after installing the server software with "apt-get install apache2", remove the symlink for apache in /etc/rc2.d. On a normal server machine the web server is started automatically at system start-up via these symlinks. But on a cluster only the cluster software is responsible for starting and stopping the "clustered" services.

Edit "/var/www/index.html" and change "It works" to "hacluster1" on the first node and "hacluster2" on the second. That way we can easily see in the browser which node is serving the web pages.

Now set the password for user hacluster, else we cannot log into the GUI. Just choose any password you like.

Everything you did until now had to be done on each of the nodes. But now that the cluster is prepared, the remaining configuration is done only once and will be propagated among the nodes automatically.

Log into one of the nodes from your local X11 xterm and start "hb_gui", connect to 127.0.0.1 as user hacluster with the password you've chosen. Remember: if you want to use a remote X11 app, you have to log in from a local xterm with "ssh -XC".

OK, we're logged into the cluster GUI. While the cluster is generally working, it doesn't do anything at the moment; nothing is configured yet. What we'll do is configure a highly available web server cluster, where either node one or node two will serve static web pages.

First we need to define a shared IP address for the web server. So right-click on "Resources", choose "Add new item", leave the type as "native". Now scroll down in the list and choose "IPaddr" in the column "Name". In the "Parameters" field below, type the shared IP address into the "Value" field, hit RETURN and then click on "Add" (lower right).

The second item will be the apache resource itself: in the main GUI window right-click on "Resources" again and "Add new item", type "native". Choose "apache2" from the list, no additional parameters needed.
Click "Add" in the lower right.

We could now start the resources and the web service would work already, serving from either node1 or node2. But to have a well configured cluster, we need the Apache service to be "bound" to the IP address resource, so that Apache is always running on the same node where its IP address is running. So we create a co-location: right-click on the "Colocations" entry in the "Constraints" list, choose "colocation", and give it something meaningful as "ID". Choose your IP resource as "From" and the Apache resource as "To", leave the score as "INFINITY", click "OK".

We also have to make sure that Apache is started on a particular node only after its IP address has been activated, else the web server might not work. Thus we need an "Orders" rule: right-click on "Orders" in the constraints list, "Add new item", leave the type as "order". As "From" choose the IP address resource, leave "Type" as "before" and choose the Apache resource as "To". Click "OK".

Now everything is ready to be started: right-click on both resources and select "start".

If you now try to view the web site in your browser, it should show either "hacluster1" or "hacluster2". Let's test the fail-over process: right-click on the "cd" node, choose "standby". You should see the two resources quickly moving to the other node. If you now reload the page in your browser, it should show the other node. Once you switch the standby node back to "active", the resources move back as well.

That's all for now. You have a working cluster as a start for further testing and research. The clustered Apache in this example would be useful in a production environment only if it serves completely static content that doesn't change frequently. And you have to make sure that both the Apache configurations and the web files are identical on both nodes.
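One way to check that last requirement is to compute a single checksum over a directory tree on each node and compare the two values. The following is only a sketch: the function name is made up, and it assumes the GNU versions of find, sort, xargs and sha256sum as shipped with Ubuntu.

```shell
# Print one checksum summarising all regular files below directory $1.
# Paths are taken relative to $1 and sorted in the C locale, so the
# result is comparable between nodes.
tree_checksum() {
    ( cd "$1" &&
      find . -type f -print0 | LC_ALL=C sort -z |
      xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1 )
}

# Run on each node and compare the output by eye or in a script:
# tree_checksum /var/www
# tree_checksum /etc/apache2
```

If the values differ, the trees are out of sync and need to be copied over from the node you consider authoritative.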
But as almost all web servers these days use databases and their content is updated very often (like this blog, ahem), the clustered Apache as described here doesn't make much sense. More on that in the next instalment.

Tuesday, April 6. 2010
What is a high-availability cluster anyway and why do I need one?
(For the impatient: actually building and configuring a simple but working load balancer cluster is described soon in this blog's next instalment)
Anyone who runs servers in a production environment knows that everything breaks eventually. And with breakage the service your server provides ceases to work. Which is bad.

How it works

Thus you need redundancy: if -- for instance -- you're running a website on your own hardware, for a reliable and highly available service you need at least two servers, each of them able to deliver your web documents to clients. In the simplest configurations one server is active while the other is in stand-by. Each machine has its own IP address, and the service itself (e.g. your website) is bound to a third -- floating, also called virtual -- IP address which is always assigned to the active server. If the active server fails, the virtual IP address is switched over to the former stand-by machine, which now becomes active. Once the other (formerly active) machine comes on-line again, it rejoins the cluster and either remains in stand-by or becomes active again. That depends on the actual configuration.

This concept is easy to understand so far. But the tricky part is to decide whether a server or service is off-line and when to fail over. The cluster software also has to make sure that at any time only one of the machines is assigned the virtual IP address.

For Linux systems the easiest cluster software to configure and maintain is heartbeat/pacemaker (heartbeat is an abandoned project with pacemaker its successor). Heartbeat is also the way to go when clustering Mac OS X servers. Mac heartbeat comes with Mac OS 10.4 (+) server, but is to be configured somewhat differently (PDF!) than its Linux counterpart.

So a cluster is generally an instrument for high availability, consisting of at least two machines, each capable of providing the service. In most clusters one machine is in stand-by while the other one is in production. Other configurations are possible, as are more than two machines per cluster.
For the sake of simplicity we will further assume that we have only two nodes in our cluster.

Detecting Failures

That's the tricky part. You need some mechanism that can reliably detect when a node is off-line. And you want to avoid a split-brain state, in which both nodes think the other one went off-line, while in fact only the network connection between them is broken. So you had better think twice about how you design the heartbeat connection between the two nodes. The easy way is to just use the already existing network connection for heartbeat communication. But if the network switch goes down, you will experience a split-brain situation: neither node can reach the other, so each thinks its counterpart went off-line. Because of this it is better to have both nodes communicate via a second network interface and a cross-over cable. Or via a serial connection.

The Fail-Over Process

Once the remaining node detects that the other node is off-line, it assumes command over the defined service: it assigns the floating IP address to itself and afterwards starts all the appropriate services (web server, load balancer, whatever) and is now providing the service instead of the other node.

Split-Brain

I already mentioned a split-brain state and I also mentioned that you want to avoid it at all cost. But why is that so bad?

First: a split-brain state causes unstable and unreliable behavior. Both nodes might think of each other as being off-line and start fighting for the floating IP, so the service itself becomes unstable or inaccessible. If there's no shared storage (as in a mere load balancer cluster), that is about it and it is easy to recover from, yet you have some service downtime.

Second: if you activated STONITH and a split-brain situation occurs, the nodes might start a shootout, killing each other repeatedly. Of course that means service downtime.

Third: it is even worse when you not only have a cluster going mad but also shared storage.
Think about a file server and a SAN storage array connected to both nodes. Only one of the nodes is active and using the storage array, while the other is in stand-by and not accessing it. Now in a split-brain state both nodes think they are active and will access the storage, corrupting the file system, because neither knows about the other's file system accesses. You don't want that.

S.T.O.N.I.T.H.

This is not referring to some super-secret agent flick from the sixties with Robert Vaughn; it simply means "shoot the other node in the head", making sure a node is actually off-line when it appears to be. It might happen that a node's OS crashes, leaving the node in a half-dead state. It might not respond to the network any more, but it might still access the shared storage. So before the remaining node can assume command of the clustered service, it has to be sure the other node won't access any shared resources any more, which is best achieved by hitting a virtual reset switch on the other node. The best way to do this is via IPMI. If you bought servers not equipped with IPMI, shame on you, cheapskate.

Scalability

Your cluster is up and running and you eliminated the single point of failure that was your simple web server. But how about scalability? What if the one active web server cannot stand the load of your suddenly immensely popular website? That is clearly the subject of another (dozen) blog posts. But the general idea is to have the cluster do as little work as possible. For a website I would have the cluster act only as a mere load balancer with the actual web server machines behind it. Balancing load is not much work and you can easily saturate much more than a gigabit ethernet with one balancer, as long as you have enough web servers as back-ends. And synchronising their content as well as having a highly available database server is an equally challenging task. But that's another story ...

Thursday, October 8. 2009
What's in a server?
or why any el-cheapo PC will never be a reliable server.
... not even a brand-new PC of any major make, forget it. And by the way, I am only writing this to have all the arguments at hand in case of any re-occurrence of that unpleasant event we IT pros all know: although we carefully hid the fact that we are in the IT business, at a certain party some moron found out we're in IT and is now talking our ear off with the most profound smattering we never wanted to witness. And of course he's educating us about how the money for a "real" server is wasted, because it's a rip-off anyway, and why a decent PC is at least as good and faster anyway because -- ya' know -- its CPU runs at 3GHz instead of 2.66. I hereby tell you: it's not. Never. Nada. Niente.

General Hardware Quality

The server market is quite different from the market for ordinary PCs. For the latter, every penny counts, thus everything is sold on price and everything is built from the cheapest components. The server market however depends more on trust and reliability, and components are mainly sold and purchased with longevity in mind. Reliability and longevity as crucial requirements obviously cost money, and the higher costs are mainly caused by better components and thorough hardware design.

Redundancy and hot-swappability

Despite being reliable, everything can fail and will do so eventually. To mitigate the consequences of a hardware failure, some of the server's components are designed to be redundant, especially the parts that are more prone to fail than others. Namely hard disks, power supplies and fans. Each of these components is usually present at least twice: two power supplies, two rows of fans and several hard disks. All of these components can be removed and replaced without shutting down the server. So in case a power supply fails, you just pull the faulty device out of the server's back side and replace it with the spare one you had on the shelf.
One interesting detail with regard to reliability is that a server's CPUs don't have fans attached -- unlike your home PC. If the CPU fan in your PC fails completely, the CPU gets fried almost instantly, before you can shut down the machine. Hence a server's CPUs only have heat spreaders, and the air-flow through the server is designed to blow enough air over the heat spreaders for the CPUs to remain cool enough. So in case of a faulty fan, you just swap the fan (or the entire fan module) and that's it. Really expensive servers are even capable of having entire CPUs or RAM modules swapped while the machine is working, without any downtime. But that's another story ...

Availability Of Spare Parts

If you want to upgrade a somewhat elderly PC, it might be difficult to find the appropriate RAM modules or a CPU with the right socket. This can already happen after only three years. But for a server you want to be sure you can buy a replacement CPU or another RAM module for at least five years after the server went into production. And as some components might no longer be produced after five years, the server manufacturer has to keep a sufficient stock of spare parts available -- which costs money that he has to factor into server prices.

ECC RAM

It is bad enough if a RAM module in your PC becomes faulty and gives back bad data. But what's even worse is that nobody might notice: not the CPU, not you as the user. If the faulty bit was holding part of your thesis (data) instead of code, the machine most likely won't crash (which would show you that something isn't right) and will just write crap to the disk.
So you want something that:
a) detects memory errors and
b) can repair them on the fly

Here comes ECC RAM: while you might want that in your PC too, you might not want to pay for it (you can't put ECC modules into a normal PC board and would have to buy a server board instead, which itself would also need a server CPU, making the entire thing too expensive for a mere PC). But you definitely want it in a server that is supposed to store all your files over years without a hiccup. Imagine an undetected faulty cell in your server's RAM, and this very cell is used by the server's OS as a drive buffer ....

RAID Controllers

You want a real RAID controller in your server. A REAL one with its own processor, keeping the storage work off the CPUs -- in contrast to the cheap Host RAID controllers (where the main CPU still has to do the work). And these real ones aren't exactly cheap, adding cost to your server. This statement is true at least if your server is supposed to deliver high I/O performance, like a heavily used database server. If you're building a file server for a mere ten people, a software RAID would suffice as well.

Better Hard Disks

Yes, there's also a different kind of hard disk for servers. While they might look almost identical to the cheap ones in your PC, server disks have different requirements than consumer ones. First of course is reliability: their error rate is considerably lower. Second: they don't have to be quiet, as the server belongs in the server room anyway. Thus server hard disks rotate at 10000 or even 15000 rpm (in contrast to desktop disks, rotating at 5400rpm), making lots of noise but being blazingly fast.

IPMI

This is one of the most important features of a real server. The IPMI is the server's remote control. You can remotely switch it on and off, reboot it, and you can even access the BIOS screen as well as the boot menu over the net. With no additional hardware. The IPMI is basically a small extra computer built into the server.
You really need this if some machine in the remote data centre ceases to work and you want to push the (virtual) reset button. Depending on the particular server's make there are also slightly different incarnations besides IPMI: Sun calls it ILOM, and Intel's modular server comes with an entire web front-end for managing not only the server blades but also the integrated storage and the network switches.

Hardware monitoring capabilities

While modern PC motherboards often provide APIs like ACPI, actual implementations on common PC boards vary greatly in quality and usefulness. But for your servers you want to make sure you are notified once something deviates from the usual parameters or if some component fails. Thus you have to rely on some API that delivers exact operational data such as CPU temperatures or fan rpm, and you need software which processes these data into some meaningful output. Server manufacturers provide this software in a way that is scriptable, so its output can be fed into Nagios and Munin for monitoring and alerting. The "monitoring" software coming with most PC boards, on the other hand, is some fancy Windows app which looks nice but is otherwise rather meaningless. You can also monitor a server's hardware parameters remotely via IPMI (depending on the actual IPMI implementation).

Conclusion

It turns out that the requirements for a server really are different from those for your usual PC. And this is the reason real server machines are still produced and sold. If it all were bunk, they simply wouldn't exist anymore. If you need a machine that acts as a server, buy a server and not a PC.

Thursday, August 6. 2009
Another exemplary use case for Amazon's cloud services
http://aws.typepad.com/aws/2009/08/dctptv-webtv-solutions-for-professional-broadcasters.html
Thursday, July 16. 2009
How to securely access MySQL from remote
Hint: This applies not only to your EC2 instances but to every remote MySQL server you want to configure and access. The ssh tunnel technique works under Linux and Mac OS, and also under Windows with Cygwin Bash (maybe also with putty, but I don't use it). MySQL Administrator's features differ somewhat between the operating systems, though.
You started your new instance and installed a MySQL server, most likely for some web application. And now you need some users with their respective rights and some databases as well. Yes, I've seen it all too often that web applications are using the MySQL root user. DON'T DO THIS! That's a security risk! Give every web application its own database user and its own database. It doesn't cost anything but saves you from a total loss of all data once a remote vulnerability in one of your web apps is exploited. But creating users and assigning their rights is somewhat cumbersome if you want to do it with SQL alone.

MySQL Administrator

And please don't install PHPMyAdmin if you have ssh access to your server. PHPMyAdmin is ok if you have some webspace without ssh access, but for us there's a much better tool: MySQL Administrator. You can download it off the MySQL website or, if you're working under Linux, just install it with your distribution's package manager. MySQL Administrator is indeed a nice tool for managing all the database users and their rights on the different databases of your server. But how do you connect to the remote database server?

SSH is your friend

There's a really useful concept called SSH tunnels, which can help you in many ways to access services on a machine that are not open to the public internet. Generally speaking, SSH opens a tunnel from a port on the remote machine to a port on your local PC. You then have your local application connect to the port on the local machine -- et voilà! For accessing MySQL on our remote server it looks like this:
    dirk@ubuntl:~$ ssh -i .ssh/mykey.pem -l root -L 33060:localhost:3306 -N ec2-75-101-190-95.compute-1.amazonaws.com
We again need our ssh key for logging in (in this case .ssh/mykey.pem), connecting as root (-l root) to the remote machine (ec2-...); -N tells ssh not to run a remote command. And we have ssh connect to the remote port 3306 (the standard MySQL port) on "localhost" -- which in this case is the loopback interface on the remote server.
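To make the three parts of the -L argument explicit, here is a small sketch that takes such a forwarding spec apart. The function name is only illustrative; the spec format itself is the one used in the command above.

```shell
# Split an ssh -L spec of the form <local port>:<target host>:<target port>
# and describe where the traffic ends up.
describe_forward() {
    local_port=${1%%:*}      # everything before the first colon
    rest=${1#*:}             # everything after the first colon
    target_host=${rest%%:*}
    target_port=${rest#*:}
    echo "local port $local_port -> $target_host:$target_port (resolved on the remote side)"
}

# describe_forward 33060:localhost:3306
# With the tunnel open, the command-line client can use it as well:
# mysql -h 127.0.0.1 -P 33060 -u root -p
```

Note the 127.0.0.1 in the client call: with "localhost" the mysql client would try the local Unix socket instead of the forwarded TCP port.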
And the tunnel's end on our local PC is connected to port 33060. The MySQL server is by default bound only to the loopback interface ("localhost" or 127.0.0.1) for security reasons, which is why we have to use this somewhat complicated command line. Anyway, now we can connect with MySQL Administrator running on our PC to 127.0.0.1, port 33060.

When you add users, you also have to add the hosts/networks from which the user is allowed to connect. So after you created a user, right-click on its icon and choose "add host". As the database server only listens on localhost anyway, add localhost for the user.

That's pretty much it. Unless you are running more than one server and other servers in your infrastructure have to connect to your database server as well. For this we have to change the MySQL server config to listen on the "real" network interface as well as on localhost. Edit /etc/mysql/my.cnf (Ubuntu) and look for the line:
    bind-address = 127.0.0.1
Comment it out, save the file and restart the database server:
    /etc/init.d/mysql restart
Now MySQL is listening on the network. For remote connections to be allowed, go back to MySQL Administrator and add the respective hosts or networks to each user. But be restrictive and don't open everything to all!

To contradict myself: if you're running an EC2 infrastructure, it might be immensely useful to open access to 10.0.0.0/8 for every user who remotely accesses your databases. The default security group prevents other people's instances from connecting to your machines anyway, and if you start -- maybe -- a new web application server, you don't have to fiddle with the database's user permissions. If you are using autoscaling, you have to allow access to 10.0.0.0/8 anyway, else newly started servers couldn't connect to your database servers, rendering the entire autoscaling procedure meaningless.

Monday, May 25. 2009
Managing Amazon EC2 - Designing For The Cloud
Everyone running a production infrastructure has most likely acquired the somewhat anal attitude of looking at everything and asking "what if this breaks?". If you haven't -- good luck! OK, you know what I want to say: every crucial component has to be redundant, which means heartbeat clusters of load balancers, RAID arrays, two network switches connected via Spanning Tree, the servers' network interfaces with channel bonding (trunking, whatever it's called) and all that stuff.
While some of it still applies in the Amazon cloud, some of it is thankfully no longer your business -- namely the RAID and the entire network infrastructure. Amazon already manages that for you, along with reliable storage and hardware. So what's definitely left for you is redundancy of services -- servers -- besides some other things that might be special cloud features.

Only temporary!

In the classical data centre you might have a set of redundant servers -- namely two load balancers, bound together via heartbeat. Behind them sit the actual application, database and web servers, more than one of each. So the services you provide withstand the failure of at least any one server, maybe two or even more. But most crashed servers can simply be rebooted, or some failed component exchanged, and the server is running again.

Scripts are your friend!

In EC2 you instead have to design for the complete disappearance of servers. Each virtual server is to be seen as only temporary! If you only reboot it, nothing horrible will happen, but if you inadvertently shut it down, or if it crashes and a reboot doesn't help, the complete instance with all its data might be inaccessible! Even Amazon might have a hardware problem on the host machine, and your instance goes poof. Or you botched the ssh config and cannot log in any more: inaccessible forever!

Hence you should take care that your images are up-to-date, and you should have some scripts that run after a new instance is fired up to configure everything automatically. That includes partitioning the ephemeral storage, creating file systems, mounting them, creating directories and user rights, and even waiting for the block storage to appear inside the instance and mounting it once it's available. All your services should wait before being started until they have all their data available.
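The "wait for the block storage to appear" step could look roughly like the following sketch. The function name wait_for_path, the device name /dev/sdf and the mount point /data are assumptions for illustration only; your device and directories will differ.

```shell
#!/bin/sh
# Poll until a path (e.g. the device node of an attached volume) exists,
# giving up after a timeout in seconds. The function name is hypothetical.
wait_for_path() {
    path=$1
    timeout=${2:-60}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1            # still missing after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use in a boot script (device and mount point are assumptions):
#   if wait_for_path /dev/sdf 120; then
#       mount /dev/sdf /data && /etc/init.d/mysql start
#   else
#       echo "volume did not appear, not starting services" >&2
#   fi
```

The point of the design is that services are only started on the success branch, so nothing comes up with an empty data directory.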
Mostly, services also connect to other instances whose IP addresses cannot be known in advance and might have changed since the image was created, thus the applications' configurations have to be adapted before starting them as well!

Image early, image often!

Make images of your instances as often as needed! That means every time you changed some config or did an OS update. Keep two or three generations of images per machine and definitely keep one that you know works. Test every image you create by starting a test instance off it. Only then will you know for sure that it actually works.

Redundancy and fail-over

In the front line of your infrastructure there will most likely be a load balancer with all the application or web servers as back-end servers. But as the load balancer instance might crash, you need a second one running. As every instance is assigned only one IP address, you cannot use the heartbeat mechanism with the "floating" IP address. The good thing is: you don't need it in the Elastic Cloud. The active load balancer is assigned an Elastic IP and is in production. In case the fail-over machine is needed, you just switch over the Elastic IP and everything works again.

The only tricky part is automatically detecting the failure of your active load balancer. You don't want your automatic switch-over running mad because your scriptery falsely detects inoperable load balancers. Checks might be done by fetching a certain file via the Elastic IP (therefore actually fetching it from one of the back-end servers) or by parsing the load balancer's status info.

With this model, your load balancer machines are the only thing that has to be switched automatically in case of failure. All the other services are load balanced anyway, and with these services being monitored and failures reported accordingly, there might be no need for immediate manual intervention as long as there are enough back-end servers available.
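One simple way to keep the switch-over from "running mad" is to require several failed checks in a row before acting. The following is only a sketch of that idea: the function name is_really_down is hypothetical, and the URL, instance id and address in the usage comment are placeholders.

```shell
#!/bin/sh
# Run a check command up to `tries` times, pausing between attempts.
# Returns 0 (down) only if every attempt fails; a single success counts
# as "still alive". The function name is hypothetical.
is_really_down() {
    tries=$1
    pause=$2
    shift 2
    n=0
    while [ "$n" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 1            # check succeeded: not down
        fi
        n=$((n + 1))
        if [ "$n" -lt "$tries" ]; then
            sleep "$pause"
        fi
    done
    return 0                    # all attempts failed
}

# Possible use from a monitoring host (all values are placeholders):
#   if is_really_down 3 10 curl -fsS --max-time 5 http://ELASTIC_IP/alive.txt; then
#       ec2-associate-address -i STANDBY_INSTANCE_ID ELASTIC_IP
#   fi
```

Three attempts ten seconds apart are arbitrary numbers; the trade-off is between reacting quickly and not flapping on a single lost request.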
Ideally your sophisticated scripting will automatically launch a new back-end server from its image and incorporate it into the load balancer's config. Yes, Amazon has announced a service that might provide exactly this, so maybe a hand-crafted solution is obsolete now. I haven't tried this service yet. They also announced the Elastic Load Balancer, which I haven't had time to try out yet either. Most likely it will render most of your own load balancing scenarios obsolete. But at least if you need session stickiness, the ELB doesn't provide it yet and you are still on your own.

Availability Zones

Amazon runs several data centres at different locations and you should choose the availability zones for your machines wisely. In case one data centre goes off-line for whatever reason, you don't want to have all of your machines in this one availability zone. Knowing this, you will start one of your load balancers in zone 1a and the other in zone 1b. The same goes for the back-end servers, which should be evenly distributed among the available availability zones.

Data storage

Every instance comes with a set of virtual hard drives -- up to four on an xlarge instance, with a combined storage space of more than 1.5 TB. Data on these "ephemeral" disks vanishes with the shut-down of the particular instance. The Elastic Block Storage (EBS), on the other hand, does not vanish and can be attached again to the next instance you start. But using EBS means extra cost. So you have to decide whether you use the built-in disks or attach an EBS volume to your instance.

The decision should be based mainly on the question of whether there is ever-changing data on the instance or not. A load balancer, for instance, might not carry any changing data at all besides its log files. Hence attaching an EBS volume to a load balancer might be a bit overdone. Its log files can simply be copied over by an hourly cron script to a storage server with an EBS volume attached. So you'll lose one hour's worth of log files at most.
But if you run a database server, or a web server whose content might change several times a day, I would strongly recommend using an EBS volume for it. Depending on the actual data and usage, it might still be viable to copy database and web files from a backup to a freshly started server, onto its ephemeral storage. There can be no universal rule, as all this depends on the amount of data, how frequently it changes and your infrastructure's general design. A poorly implemented use of EBS volumes might botch your servers, just as a cleverly designed use of ephemeral storage might suffice for quite a large number of (identical) servers. Think about it and decide accordingly.

Database replication

Here comes the headache! Database replication is fairly sophisticated stuff and there are many ways to fail at it. I will go only so far as to say that there's no turn-key solution, neither for MySQL nor for Postgres. And don't ask me about Oracle; the last time I fiddled with it was ten years ago. In case you're not sure how and what to do, hire an expert; it will be worth the money. Databases might react in very subtle ways to a crappily implemented replication design. OK, they might crash loud and hard, but if you're very unlucky you will run into strange errors only after a while, when all your data might already be silently corrupted. Bummer! As this is a subject some people have written entire books about, I won't say anything other than: know what you're doing; if not, hire an expert!

Network and load balancing

Every EC2 instance has only one IP address -- which isn't entirely true. To be correct, it has two: one official address for connecting from the outside and one internal RFC 1918 address. While the RFC 1918 addresses will be persistent throughout the lifetime of your instance (are they? I'm not entirely sure), the official addresses might change, for instance once you assign an Elastic IP address to an instance.
The formerly assigned address is then dis-assigned and the Elastic IP put in place instead. From within your instance nobody will notice any change; all this takes place in Amazon's NAT/packet filter/router black box. If you remove the Elastic IP from an instance, it will be assigned another official IP address instead, but which one cannot be predicted. Thus for every service that has to be reliably accessible from the public internet, you really should use an Elastic IP. And you have an elegant means of failing over from an inoperable instance by switching the Elastic IP to the other -- identically configured -- one.

With the Amazon Elastic Load Balancer, some of the scenarios might make running your own load balancer obsolete, but I still have to try this out. What I already know is that the ELB doesn't provide session stickiness. If your web users have to stick to one particular back-end server so that the server is able to track the user's actions, you cannot use the ELB for the time being and still have to run your own load balancer.

SSL is demanding

If you run some web servers, you put them behind your load balancer and you can run many virtual hosts behind one IP address, as is usual with web servers. Nothing particularly different about this in EC2. But if you run some SSL servers instead, there's a problem: an SSL web server demands its own distinct IP address, and you cannot run several SSL servers on one IP address. And as every instance can be assigned only one Elastic IP, you would have to run several load balancers, one for each SSL server.

There's a shortcut, though, if all SSL servers share the same domain name. For instance, if you want to run admin.example.com, download.example.com and shop.example.com as SSL servers, you can buy a so-called wildcard certificate and then run all three SSL servers on one IP address. This, however, only works if all servers are using the same domain.
If there are shop.example.com and download.example.de as SSL servers, you still need two IP addresses. Maybe you can consolidate the .de servers into .com (or vice versa) to save money (fewer instances running!) and hassle (fewer instances to manage!).

Load balancing for the poor

If you're tight on money and you're running a rather small infrastructure, you can deviate somewhat from the pure religion and place the load balancer on the web server itself: two identical machines, each running the web server as well as the load balancer. You save money, as you only have two instances instead of four with distinct load balancer machines, but once one machine becomes inoperable, you also have only one web server left. Hence one server has to be able to handle all the load! Also, the web server cannot run on port 80, as the load balancer wants port 80 as well! So you have to run your Apache on, say, port 81.

If you are really tight on money, you're just running your private stuff and you can live with some downtime, you could as well run only one machine and have an up-to-date image on S3 along with a backup of all your web stuff, to be fired up once the original instance fails. But that model is only suitable for private servers, as a commercial site cannot really afford downtime nowadays.

Friday, May 22. 2009
Amazon EC2 - an exemplary use case
This: http://aws.typepad.com/aws/2009/05/ec2-and-wowza-media-support-belgiums-largest-live-streaming-event.html illustrates very well the benefits of cloud computing.
Monday, May 18. 2009
Managing Amazon EC2 - SSH login and protecting your instances
How to log into your freshly fired-up instances and how to secure ssh access
(works under Linux and Mac OS, under Windows with Cygwin)

The first time you want to log into a newly started instance, you appear to have a chicken-and-egg problem: how to log in when you don't know the root password? Luckily, Amazon devised a comfortable way around this: your Key Pairs. These are not to be confused with the "Access Key IDs" on the Access Identifiers web page, nor are they the X.509 certificates. A Key Pair is automatically generated the first time you log into the web console, and you can only download its private part. Store it in your ~/.ssh directory.

In case you missed the download or don't know where you've put it AND you don't have any instances running, just generate and download a new one. Beware: already running instances are bound to the old key, and if you didn't change the sshd config as described below, you won't be able to log in anymore! A new SSH key can be generated both on the AWS web console and in ElasticFox under "Key Pairs".

For the sake of simplicity we assume that there are no instances running and that you either have your key already placed in ~/.ssh or have created and downloaded a new one. Beware again: everyone who gains access to this key can log into the respective instances as root! So make sure it is readable only by you.

You can even have several Key Pairs, for example if several people are each supposed to manage a sub-set of instances while not being allowed to log into the instances managed by others. Whether this concept is recommendable or not, it is possible.

So for the first-time login to a new instance, use this:

ssh -i .ssh/mykey.pem root@[AWS_public_hostname]

As soon as you have created users on your web server instances and provided passwords for the accounts, you want your web developers to be able to log in -- without having root access, of course. But (at least for some/many public images) password logins are denied by default, and you really don't want to give your key to anyone who must not have root access.
What to do?

Activating Password Access

Easy: either activate password logins or use ssh keys. For password logins, edit /etc/ssh/sshd_config, set "PasswordAuthentication" to "yes" and issue a "/etc/init.d/ssh reload" for the new config to become active. But a much better solution is to use ssh keys.

SSH keys as a password replacement

SSH uses public key cryptography, hence its keys consist of a public part ("id_dsa.pub") and a private part ("id_dsa"). These key pairs are both user- and machine-dependent. Thus you have to generate them once for every user on every machine from which you connect to the servers (not for every server!). And don't just copy them over! The private part of a user's key has to remain in ~/.ssh/, and this directory must not be readable by other users. The public part, by contrast, doesn't have to be hidden from others' view; that's why it is called "public".

First look into your user account's ~/.ssh/ on your local machine to see whether the files "id_dsa" and "id_dsa.pub" already exist. If they exist, don't make new ones and proceed to the copy step instead. If the files don't exist yet, create them:

ssh-keygen -b 1024 -t dsa

You will be asked for your passphrase twice. Enter your desired passphrase and try not to forget it. Now you have two files in "~/.ssh/": "id_dsa" and "id_dsa.pub". Copy the public part to your instance and append it to the user's "~/.ssh/authorized_keys" file. Make sure the permissions are correct: the "~/.ssh" directory and everything in it should be owned by the respective user, "authorized_keys" should have mode 600 and "~/.ssh" mode 700.

Next time you log in, you won't be asked for the account's password anymore (only for your key's passphrase, if you set one), and you don't have to give your AWS key while logging in. Do this for every user who is supposed to log into your instance, test that it works and then de-activate "PasswordAuthentication" in the sshd config, if you activated it before.
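The "append the public part and fix the permissions" step could be scripted on the instance roughly like this. It is only a sketch: the function name install_pubkey is hypothetical, it expects a public key file with a single line, and it leaves ownership alone (when run as root for another user, a chown to that user would still be needed).

```shell
#!/bin/sh
# Append a public key to <home>/.ssh/authorized_keys unless it is already
# there, then set the modes mentioned in the text (700 and 600).
# The function name is hypothetical.
install_pubkey() {
    pubkey_file=$1      # e.g. the copied id_dsa.pub
    home_dir=$2         # home directory of the target user
    ssh_dir="$home_dir/.ssh"
    auth="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir" || return 1
    touch "$auth" || return 1

    key=$(cat "$pubkey_file") || return 1
    # -x whole line, -F fixed string: only an exact duplicate counts.
    if ! grep -qxF -- "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi

    chmod 700 "$ssh_dir" && chmod 600 "$auth"
}

# Possible use, run as the user in question (file name is a placeholder):
#   install_pubkey /tmp/id_dsa.pub "$HOME"
```

Running it twice with the same key leaves a single entry, so it is safe to call from a setup script.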
In case your local machine is running Windows and you use PuTTY instead of the Cygwin environment, please read here: http://tartarus.org/~simon/putty-snapshots/htmldoc/Chapter8.html#pubkey

Remember: the passphrase you chose while creating the ssh keys protects access to the private part of your key. This passphrase doesn't have anything to do with login passwords. You can also just press RETURN when asked for a passphrase, but then everyone who gains access to your private key file is able to log into the servers where the public part is installed.

Disabling remote root login

Once user logins are working, you want to deactivate direct login as root via the network. This is generally a good idea, as it makes it even harder for anyone to really exploit possible vulnerabilities in the ssh daemon. While your instance is already quite secure with no password login and ssh keys, it is always better to be safe than sorry.

First you have to make sure you can still become root. If your Linux uses the sudo mechanism, as Ubuntu does, you don't have to give root a password: just log into your user account and try "sudo -s". It asks for your password and then you should be root. For Linuxes which don't use sudo, like SuSE, give root a password, log in as a user and try "su -" with root's password. If either su or sudo works, you can de-activate remote root access in /etc/ssh/sshd_config by setting "PermitRootLogin" to "no".

Of course there are other means to secure your server even more, like preventing most users from gaining root rights at all, but they would go too far for this instalment.

Further securing access to your instances

If you're running a production environment, you want to secure it as much as you can. Even if you have set up ssh key authentication and forbidden passwords, you can tie access down further by restricting ssh access to only some IP addresses, like the public one of your office's internet connection.
This is especially useful if there are several web designers working on your servers and you aren't sure they know how to handle passwords and stuff. They might carry unsecured notebooks around, they might mail their user keys around and everything. So in case your office's internet connection has a fixed IP address, you can restrict ssh access to only this address.

While -- as far as I know -- it is quite common in the US to have an internet connection with a quasi-fixed IP address (bound to your DSL modem's MAC address), there are countries like Germany where the usual DSL line has a dynamic IP address, which actually changes every 24 hours. In this case restricting access to IP addresses is of course out of the question. So either stick with the ssh keys or change your DSL line to one with a fixed address -- which has other benefits as well, like more stable VPN access (yes, I know, dyndns works -- mostly).

Anyway, to tie down ssh access you have to manipulate the Security Groups assigned to your instances. Remove the rule where ssh access is allowed from everywhere and instead create a new one which allows access only from your office's IP address. And you might think about a second rule with another IP address you can log in from, in case your office's DSL line is down. Don't worry: if you misplace some rules, you cannot lock yourself out. You can always use ElasticFox or the AWS web console to change the rules back to "access from everywhere".

Sunday, May 17. 2009
Managing Amazon EC2 - A Nice Tutorial (not by me)
While rummaging through the AWS forums, I found a really nice tutorial for getting started with the AWS web console.
Managing Amazon EC2 - ElasticFox and S3Organizer
While Amazon provides a rather nice web console for its Elastic Cloud, there isn't (yet) anything similar for its Simple Storage Service. But you can manage all your EC2 and S3 stuff via Firefox. There's ElasticFox for EC2 and S3Organizer for S3. Read here how to install and configure these tools.
Installing ElasticFox and S3Organizer

It is important that you already have Firefox 3.x installed, as the add-ons won't work with version 2.x. First download and install ElasticFox and S3Organizer. After the usual Firefox restart, both add-ons appear in Firefox's Tools menu.

Configuring Your Access Credentials

Open ElasticFox and click on "Account IDs". Now you need your Account Number: log into http://aws.amazon.com/account/, then go to "Your Account" -> "Access Identifiers". You'll find the Account Number in the page's upper right corner, just below your account name. Copy the number into the "Account ID" field in ElasticFox's window and give it something meaningful as Display Name, e.g. "Dirks instances", click on "Add" and close the window.

Now click on "Credentials" and give the same name as above as "Account Name". Copy the AWS Access Key as well as its secret part from the AWS account page into the credentials window. Click on "Add", then close the window (I'll get back to ElasticFox soon).

Now you're ready for EC2 action, but first we'll configure S3Organizer. Start it from Firefox's Tools menu and click on "Manage Accounts". Give it something meaningful as Account Name and copy the Access Key and the Private Access Key from the AWS Account Identifiers web page into the respective fields, then click "Add". Et voilà!

I haz a bukkit!

The S3Organizer shows your local files in the left pane and your S3 buckets as directories on the right. Further usage is quite obvious, hence we get back to ElasticFox. ElasticFox's "Instances" pane shows your running instances; double-clicking on one shows every detail. The "Images" pane lets you choose from either all public images or your private ones, if you type your S3 bucket name in the filter input field. Everything else is much like on the AWS web console.

Now you have a fully working environment for managing both your EC2 infrastructure and your S3 contents.
How you actually manage your instances and images, and how you should design your infrastructure to work optimally in the cloud, is the subject of other instalments of this blog.

Beware: now everyone who has access to your Firefox can do almost everything with your instances and S3 files! Ensure your Firefox password store is protected by a passphrase! And you might want to be careful where you install these tools. Don't install them on a notebook without full disk encryption (or FileVault, if it's a Mac). In case you leave your workplace, activate your screen blanker and configure it to require your login credentials.

Monday, March 9. 2009
Managing Amazon EC2 - Networking In The Cloud
Network details for virtual servers in Amazon's Elastic Cloud are somewhat different from your usual data centre. For reliable services, one has to know these peculiarities.
Host names and IP addresses

Every instance you start in Amazon's Elastic Cloud effectively has two IP addresses and DNS names: one for internal use, between your virtual servers, and one IP address and DNS name which is accessible from the outside -- if you so want (that's where Security Groups kick in). Every one of your instances has two hostnames, for instance:

ec2-75-101-222-198.compute-1.amazonaws.com

and

domU-12-31-39-03-68-22.compute-1.internal

The first one is bound to one of Amazon's official IP addresses, thus the machine is accessible from the internet under this hostname. The second hostname instead applies only to internal traffic that stays in the cloud, which is, for instance, traffic between your own instances.

Effectively every instance sits behind some kind of NAT gateway, thus an "ifconfig" issued locally in one of your instances only gives you the instance's RFC 1918 address, like 10.254.33.44. This is actually a nice concept, as this NATting automatically works as a packet filter, not letting anything through to your instance that you haven't allowed explicitly by assigning Security Groups to your instances.

Both of the instance's addresses and DNS names are assigned dynamically during the instance's creation and neither can be pre-determined. And once you exchange your existing VM for a new one serving the same purpose, you will be assigned new addresses. Not really suitable for providing services over the internet?

Elastic IPs

Right. But these addresses aren't meant to be used as an outside interface at all. That's what the Elastic IP is made for: you can reserve up to five IP addresses for your account, which become "yours" until you manually release them. Each of these Elastic IPs can be assigned to a particular instance, which is actually a clever thing. You fire up an instance and assign the Elastic IP to it.
As the Elastic IP is "yours", you can assign a DNS name to this IP address (via any DNS provider), and the virtual machine which this address is assigned to can then act as e.g. your web server. If you have prepared another instance with your new website content, you just re-assign the Elastic IP to this machine and your new website is online.

Elastic IPs are restricted to one IP per instance, which means you cannot assign two Elastic IPs to one instance. You might want to consider this in your planned design.

This is actually almost like the good old "heartbeat" for high availability: have two machines ready and assign the Elastic IP to one of them. If there are problems with the active instance, just switch the Elastic IP over to the second one and all is well again (OK, it is even better to have load balancers sitting in front of your real servers, but we want to keep it simple here). And with external monitoring of your server's health, you can also script the switch-over with the API tools provided by Amazon.

But beware: never ever release an Elastic IP unless you are really sure you don't need it anymore. Once it's released, it goes back into Amazon's pool, and even if you immediately reserve an Elastic IP again, you will most likely get a different one. And as DNS TTLs are usually set to several hours or even a day, you'll cause a pretty big downtime with only one wrong mouse click.

Beware again: an instance's public DNS hostname (the one assigned by Amazon) -- as well as the IP address associated with it -- will definitely change when you assign an Elastic IP to this instance! The same happens once you dis-assign an Elastic IP from an instance. Amazon doesn't waste IP addresses: once your instance is assigned an Elastic IP, the address previously assigned by Amazon is dis-assigned and goes back into Amazon's address pool.
If you re-assign your Elastic IP to another instance later on, the original instance is assigned an IP address from Amazon's pool so that it can still be accessed from the internet. But this address is completely unpredictable (other than that it is part of one of Amazon's net blocks), and its former pool address is most likely already assigned to someone else. So every time you want to log into an instance, you should first look up its current IP address via ElasticFox, Amazon's API tools or just the AWS web console. Else you might wonder why you cannot log in.

Security Groups -- Your EC2 Firewall

Security Groups are in fact a kind of packet filter defining which network ports of an instance are accessible from outside it. They not only work for connections from the internet; the rules also apply to internal traffic between EC2 instances. Security Groups are basically packages of "firewall" rules, like "accept ssh from 123.123.123.123", which you assign to your instances to protect them against unwanted access.

No Changing Assignments

Security Groups are assigned to an instance when it is created. Once the instance is running, you can neither assign more groups to it nor remove groups from it. You can only change individual rules in security groups.

Organising Rules

What you still can do at runtime is delete and create rules belonging to an active security group. Hence it is a good idea to arrange your security groups in a way that takes the least effort to manage. I recommend one standard rule -- or a set of them -- like ssh access to your instances from different IP addresses. This is a rule you would need for every instance anyway. There are other rules which are needed for (almost) every one of your instances. As those rules rarely change, you might collect them into some kind of standard group, to be assigned to every instance of a particular purpose.
A mail server might want to have port 25 open, a web server port 80. Create groups accordingly for each type of server. And as you can add rules to an existing group or delete some from it, you can correct missing or incorrect rules later on. But please keep in mind that changes to a rule affect ALL instances the corresponding group has been assigned to.

Connections from the outside can be "routed" through an Elastic IP to have a dedicated gateway for incoming connections, and you can define which ports are open from which source addresses. Outgoing connections originating from your instances, however, aren't blocked by the Security Groups; they are allowed by default.

(Update: the following might not be correct) And you cannot make your own connections have the Elastic IP as source address. This won't be of much interest to most of you, but if you intend to limit access to some servers outside EC2, you should think of a way to authorise your instances other than by source address. While you could use the external hostname for it, every time you start a new instance you have to adjust the rules on the remote server -- which becomes cumbersome once you're running more than two instances.
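To make the group layout from the Security Groups section more concrete, here is a rough sketch that prints the EC2 API tools commands for one standard group plus one group per server type. The group names and the office address are made up, and the ec2-add-group and ec2-authorize options are written as I remember them from the API tools, so please check them against the tools' --help output before running anything.

```shell
#!/bin/sh
# Print (not run) EC2 API tools commands for a "standard" group and one
# group per server type. Group names and OFFICE_CIDR are hypothetical.
OFFICE_CIDR="203.0.113.10/32"   # documentation address standing in for the office IP

print_group_rules() {
    # Standard group for every instance: ssh only from the office.
    echo "ec2-add-group standard -d 'ssh from the office'"
    echo "ec2-authorize standard -P tcp -p 22 -s $OFFICE_CIDR"
    # One group per server type, open to everyone on the service port.
    echo "ec2-add-group web -d 'web servers'"
    echo "ec2-authorize web -P tcp -p 80 -s 0.0.0.0/0"
    echo "ec2-add-group mail -d 'mail servers'"
    echo "ec2-authorize mail -P tcp -p 25 -s 0.0.0.0/0"
}

# Review the output first, then run the lines by hand:
#   print_group_rules
```

A web server would then be started with both the standard and the web group, so a later change to the ssh rule only has to be made in one place.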
