We’ve begun an initiative at Citrix to make software development inherently more secure. I’ll start with a few security anecdotes, give you a walkthrough of the security layers from data to physical, and highlight security features along the way. I’ll also discuss the Helix Versioning Engine protocol and show you why SSL encryption should be on by default.
1. Securing the Helix Platform at Citrix
Jason Leonard
Staff Software Engineer, SCM Team
2. Jason Leonard (jason.leonard@citrix.com)
15 years dealing with Perforce.
Adobe (5 years)
Citrix (10 years)
Staff Software Engineer
Source Control Team
2 team members and a manager
~20 servers, ~40 Helix repositories
~3.5TB version data, growing 300GB per year
~250,000 commands per hour
~6M commands per day
Not a security engineer
7. Data Security
Redundancy
• Ensure your data can cope with some hardware failure
• Can increase performance
On Disk Encryption
• Disks can be stolen, or end up in the wrong hands
• But we incur a performance penalty
Backup, Backup, Backup
– If it's not in three places it doesn't exist
– But the data is in three places
• TEST IT
Data
Application
Operating System
Network
8. Application Security
Authentication
• Username
• Password
• Or ticket if we have already authenticated with ‘p4 login’
Authorisation
• Groups
• Protections
run.users.authorize = 1
• Otherwise ‘p4 users’ allowed
security = 4
• Strong passwords
• Ticket based login required
• Authenticated service user
9. Authentication
security <= 2
Password based auth
• Command-line
• Environment
• P4CONFIG file
• Windows Registry
p4 -u jasonleonard -P mySuperSecretPa55w0rd
P4PASSWD=mySuperSecretPa55w0rd
P4PASSWD=9ed1ae7793942a500012e97c9a605a74
10. Authentication
security >= 3
Ticket based login
• p4 login
• Tickets timeout
• Can lock to client IP
• Can remote invalidate
p4 -u jasonleonard login
Enter password: *********
perforce:1666=jasonleonard:c6a65e9365c1f5245….
11. Operating System Security
Software firewall
• Don’t neglect the firewall on your servers
• Windows Firewall, iptables
Anti-virus/malware
• Don’t let your anti-virus scan your metadata
OS Hardening
• Ensure you follow guidelines
• Remove unnecessary software
• Turn off unnecessary OS features
• Ensure each machine runs only one service and runs it well
12. Network Security
Firewalls
• Separate production networks from user networks
VPN
• To access production network for configuring machines
Intrusion Detection System
• Log watching
• Honey pot
SSL
• Encrypt all traffic over the wire
Wireless
• Disallow any wireless network to your source control
DNS/DHCP
• Prevent man-in-the-middle attacks
13. Two way RPC Protocol
Remote Procedure Call
p4 login = user-login function
client-Prompt displays
• Enter password:
dm-Login contains the salted password
client-SetPassword contains our ticket
14. Secure Helix Communications
Available since 2012.1
Authenticates end point
Encrypts traffic
Server
• Generate a certificate on the master/broker/proxy
• Run with -p ssl::1666
Client
• P4PORT = ssl:host:1666
• Accepts the certificate with p4 trust
C:\>p4 -p ssl:perforce:1666 trust
The fingerprint of the server of your P4PORT setting
'ssl:perforce:1666' (10.0.0.1:1666) is not known.
That fingerprint is
89:8E:FD:55:42:A5:D8:DC:C2:9F:33:7C:B4:AD:C9:4B:3E:22:34:9D
Are you sure you want to establish trust (yes/no)?
15. Annotate Bug (#74317)
Found by a Citrix developer
• Attempting to write some automation
Large block of “random” data seen with ‘p4 annotate’
Text file with one line longer than 10,000 characters.
“random” is actually parts of p4d’s memory
• Usually database structures
Patched in
• 2015.2 Patch 3
• 2015.1 Patch 13
• 2014.2 Patch 12
• 2014.1 Patch 20
16. Physical Security
Server Room/Lab
• Door security, key, swipe?
• Access policy, who can open the door?
Racks
• Locked by key, combination
Servers
• Case intrusion prevention
• Disk drives locked
Disposal
(CLICK) Hi I'm Jason Leonard and I work in the Source Control team at Citrix.
(CLICK) I've been using and administering Perforce, now Helix, for 15 years, ever since I started my first job at Adobe straight out of university.
(CLICK) The team is quite small: my team mate Erin and our manager are both at our Fort Lauderdale campus in Florida, and I work in our Cambridge office in the UK.
(CLICK) We look after some 20 odd servers and run nearly 40 Helix repositories worldwide.
(CLICK) We are holding about 3.5TB of data growing at around 300GB per year.
(CLICK) Helix handles about 250 thousand commands per hour for us, which is just short of 6M commands a day.
(CLICK) I'm not a security engineer!
So why am I talking about security?
At Citrix we have, in the last couple of years, been going through some internal security "attitude re-adjustments".
This presentation is a collection of some interesting things I've learnt on the journey to being more secure.
I just wanted to start with a couple of my favourite security items I’ve heard about fairly recently.
Two German researchers published a paper called "On Covert Acoustical Mesh Networks in Air". They proposed a way to create covert communication channels by making use of sound hardware. Sound hardware is generally only thought of as a way of communicating with the direct user. But what if this hardware could be used by multiple computers within close proximity to form a covert computer network? Most sound hardware can produce frequencies outside the range of human hearing, and operating systems haven't really focused on securing these devices. The paper goes on to describe how they managed to implement such a system using commercially available business laptops and a protocol originally designed for underwater communication that can cope with acoustic interference.
An article that was written about the paper says this...
The most secure computers in the world are completely isolated from other machines, protected by "air gaps," with no Internet connection, no shared phone lines, nothing. Conventional wisdom goes that such computers are impossible to hack unless the hacker has direct, physical access to the machine.
Well that was until now…
Audio has of course been used in the past for digital data: modems, tone and pulse dialling on your phone, and most consoles of the 80's stored programs on cassettes. Actually today's Wi-Fi isn't all that different, it just operates at a much higher frequency. In my opinion the best hacks take hardware and its purpose and completely undo the meaning of it to elegantly create something new. Or put another way, it doesn't matter what something is designed to do, it's what it can do that counts.
A few months ago I heard about an employee of Blackberry who managed to hack a device that administers morphine to a patient. He managed to extract the Wi-Fi keys and re-flash the device so that he could remotely administer a lethal dose of the drug and print a message on the screen of the device saying DEAD. Pretty scary stuff.
Just so you know, there are 400,000 of these devices around the globe and they apparently all use the same Wi-Fi keys.
So on that note,
I’d like you to consider your own Helix implementation for a moment.
Do you think it’s Secure? Or maybe Insecure?
Is your Helix completely disconnected from the outside world, and obviously with its sound hardware disabled or removed, or is it connected to a morphine drip awaiting its remote re-program and imminent death?
Audience participation
Raise hand if you think you have a secure Helix
OK.
So I hope you were all paying attention, now we know who doesn’t have a secure Helix implementation.
(CLICK)
That was a little underhanded. However, it does show that we need to be careful when approaching and answering questions of a security nature. It’s caught me out a couple of times, once to the point where I had the local media turn up at my door with a video camera wanting an interview… Yes… if you would like to know more about what happened there, talk to me after.
So we in the software industry love layers and when it comes to security I like to think of security in layers. I’m going to go through these layers and point out things of interest that I’ve encountered whilst implementing some of our security.
What is it we are protecting?
When it comes to being a source control administrator, what we are protecting should be fairly obvious: it's our data. The data we store in our source control systems is almost certainly confidential business information. If I lose it, we are in trouble; if it's stolen, we are going to have to answer a whole load of questions. If we keep it safe and available for the right people to use, then we are all happy.
The Helix versioning engine has a database and a set of version files. The combination of these two things forms a repository. Let's assume for the moment that our repository is stored on disk. How is the data stored on disk? Well, this depends on whether we have thought about disk failure and performance. No matter what sort of disk we have, there will be the odd failure. So we probably want to ensure that if we get one we don't lose anything. We want recovery times to be 0 or as close as we can get to 0. So we have probably used RAID or equivalent technology to provide the redundancy and maybe some performance.
(CLICK)
Now most server hardware vendors will have RAID on board, and provide a drive replacement service. Your drive fails, then a replacement turns up at your datacentre, but remember it’s a drive "replacement" not a drive "giveaway", so you have to give back the failed drive. What are you about to hand back to your provider and never see again? Yep, you guessed it, your data. OK, maybe not all of it, but a portion of it. So how do we prevent this loss?
(CLICK)
Well there are two possibilities, both again come with a con. We either encrypt the data on the disk thus incurring a performance penalty or we could pay the vendor even more money so we can keep the disk, but now we have to also pay to have the disk destroyed as well.
(CLICK)
To ensure we never lose our data we are obviously going to need to back up the data. What… warm standby… offsite… that one for luck? But hold on, our sensitive asset has now become multiple sensitive assets spread across countless drives and probably held by multiple teams.
Let's take a look at the application layer.
Applications interact with data directly
One would hope that whoever creates your application has thought about security. Fortunately the good people from Perforce haven't left us in the lurch. We have many bells and whistles we can tweak, some I want to cover in the following slides, others I'll leave you to go and investigate.
(CLICK)
Anyone who has used Helix understands that it keeps track of your activities by tracking you as a user. So our first job is to resolve who we are by authentication. For that we use our username and password, or ticket if we have already logged in with p4 login.
(CLICK)
Your user is a member of groups, and each group and/or user has permissions from the protections table.
I'm going to assume you all know about these.
(CLICK)
Now I don’t know if you knew this, but there are a few commands you can run without any authentication. ‘p4 login’ is the obvious one, but did you know that ‘p4 users’ is too? So let's make sure we set ‘run.users.authorize’ to prevent this, otherwise we are just handing those pesky hackers more information than they deserve.
(CLICK)
Over time the versioning engine has evolved its security, and in order to activate more secure features we have another configurable. It’s called the security level. Since we are talking security I’m going to recommend you set this to the highest available for the server release you are using.
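As a rough sketch, the two configurables from the slide could be applied from a small script. The helper name `hardening_commands` is hypothetical and only builds the argument lists; actually running them assumes a super user session against a real server, and the default of 4 is simply the value shown on the slide.

```python
def hardening_commands(security_level=4):
    """Return argv lists for the two configurables named on the slide.

    Nothing is executed here; the caller decides how and where to run them.
    """
    return [
        # Require authentication before 'p4 users' will answer.
        ["p4", "configure", "set", "run.users.authorize=1"],
        # Raise the security level (4 is the value shown on the slide).
        ["p4", "configure", "set", f"security={security_level}"],
    ]


for argv in hardening_commands():
    print(" ".join(argv))
```

Printing the commands first, rather than running them, gives you something to review before touching a production server.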
But what if we didn’t?
Let's just investigate this for a sec…
(NEXT SLIDE)
Assume the security level is set to 1 or 2.
This means we must use a password, and thus password based authentication.
Our administrator has sent us the username and password to our account.
So we provide this to the versioning engine whenever we execute a command: on the command line, in the environment, in a config file, or in the registry.
(CLICK)
What’s wrong?
Yep, clear text passwords
Probably stored long term in scripts, bashrc, environment, registry, etc..
Bad.
Ah.. You say, but you could create a hash of your password
(CLICK)
Yes, you can MD5 hash your password, but isn’t this just another password?
OK, so you’ve obscured your password, congratulations. The only good you have done here is that if you reuse your password on other systems, nobody can now use it to log in to those other systems. (Nobody reuses passwords for multiple systems… right?)
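To make the "just another password" point concrete, here is a toy model. It is not how the versioning engine actually checks credentials; `ToyServer` is a hypothetical stand-in that, like the slide suggests, accepts either the clear text password or its MD5 hash.

```python
import hashlib
import hmac


def md5_hex(password: str) -> str:
    """Return the 32 character hex MD5 digest of a password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


class ToyServer:
    """Hypothetical server that accepts the password or its MD5 hash."""

    def __init__(self, password: str):
        self._hash = md5_hex(password)

    def check(self, secret: str) -> bool:
        # Accept the secret if it is the stored hash, or hashes to it.
        candidates = (secret, md5_hex(secret))
        return any(
            hmac.compare_digest(c.encode("utf-8"), self._hash.encode("utf-8"))
            for c in candidates
        )


server = ToyServer("mySuperSecretPa55w0rd")
stolen = md5_hex("mySuperSecretPa55w0rd")  # what an attacker finds in a script

print(len(stolen))           # 32
print(server.check(stolen))  # True
```

The attacker never learns the original password, yet the stored hash alone gets them in, which is the whole problem.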
(CLICK)
Password based authentication isn’t good
Let's raise the security level to level 3 or above.
This disables password based authentication so we use ticket based authentication instead.
How do we get access to the versioning engine now?
(CLICK)
Well we execute p4 login to generate a ticket
Yes, but what IS a ticket? I hear you ask…
Now I’ve pieced this information from a few sources, so someone from Perforce may want to chip in here to correct me…
Tickets are formed from:
The IP field in your server license
Username
Password
Timeout (group changeable)
(optionally) your Client IP
So tickets are better than passwords right? Well…. Sort of.
The part we must pay attention to here is the optional client IP part.
If you decide to use the truly hideous ‘login -a’ it will generate you a client IP agnostic ticket. Basically what you get back is a time bound password. Yep, time bound password… If we have the user in a group that has ticket timeout set to unlimited… then it gets worse… We’ve just been given back ‘a password’.
So my recommendation is that we need to ensure our timeout is as short as possible, and client IPs are used… ALWAYS!
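The difference between a bound ticket and a "time bound password" can be sketched with a toy model. The `Ticket` class and `ticket_is_valid` function are hypothetical illustrations of the two properties discussed above (timeout and client IP), not the server's real ticket format or checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    user: str
    expires_at: Optional[float]  # None models an unlimited timeout
    client_ip: Optional[str]     # None models a 'login -a' style ticket


def ticket_is_valid(ticket: Ticket, now: float, client_ip: str) -> bool:
    """Check the two properties from the talk: expiry and client IP."""
    if ticket.expires_at is not None and now >= ticket.expires_at:
        return False
    if ticket.client_ip is not None and client_ip != ticket.client_ip:
        return False
    return True


# A ticket locked to one machine, expiring after 12 hours.
bound = Ticket("jasonleonard", expires_at=12 * 3600, client_ip="10.0.0.5")
# No IP and no timeout: anyone who copies it can use it forever.
anywhere = Ticket("jasonleonard", expires_at=None, client_ip=None)

print(ticket_is_valid(bound, now=100, client_ip="10.0.0.9"))        # False
print(ticket_is_valid(anywhere, now=10**9, client_ip="192.0.2.1"))  # True
```

With both fields unset the check can never fail, so the ticket behaves exactly like a stored password.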
Let's move on to the operating system
(CLICK)
Make sure your software firewalls are on and configured to only allow the traffic that is required.
I once took over some servers from a group and they all had their firewalls turned off.
When I asked why, I got two replies,
--- It’s one of the first things we turn off cos it gets in the way…….
And -----I don’t know, it’s what we always do…..
Firewalls are supposed to get in the way, it’s their job to do so… And don’t be a headless chicken, use your own brain, it’s probably better than someone else’s…
(CLICK)
The number of times I’ve had engineers come to me and say that corporate have forced them to install anti-virus on their computer and it’s slowing down their compiling process…
Anti-virus is obviously important, but it can get in the way… For Helix don’t let it scan your metadata! I’m sure most of you are aware of this. It is a performance killer.
(CLICK)
Harden your operating system.
This one sounds really simple, but most people don’t do it. It’s boring, it might break something. But there are documents available to harden your operating system. Go and spend the time reading them. I’m sure you will learn a thing or two.
It’s simple things like uninstalling programs you don’t need or turning off OS features.
Separate your services into multiple machines so that each machine runs one service and runs it well, hypervisors are great for this and you get all the added virtualization bonuses like snapshots and high availability.
The Network layer
In the last couple of years Citrix has worked to secure some of our build infrastructure.
This is actually where I started heeding some of the advice that was being given from our security teams.
(CLICK)
They separated the build environment into multiple networks and configured a hardware firewall to not allow any incoming connections from other corporate networks. This way if a software firewall was turned off on the machine for whatever reason there is still a layer of protection.
I had someone recently ask if they could turn off the firewall on one of my production Helix servers to test the backup system. No I said, go test it on something else.
From a Helix perspective we would obviously need to open the network ports to the versioning engine. But only for Helix services, nothing else.
But what happens when I need to configure my servers I hear you ask, remote in, RDP, SSH, or whatever.
(CLICK)
The idea was to use a VPN to connect to the internal network from a separate machine to my normal day to day machine. By having this separate machine we create an air gap and ensure that any data we transfer across the air gap gets checked before being used.
Let's hope we don’t have any of that acoustical sound card malware installed anywhere nearby.
(CLICK)
One of the interesting things about this approach is that from the users network side we can monitor the remote access ports that an administrator would normally connect to and create a honey pot. Any user connecting to these ports would then get quizzed by our security team later. Something which I’ve heard isn’t good.
(CLICK)
We have a policy that all business confidential information must be transmitted over the wire encrypted, you never know who is listening.
(CLICK)
Wireless networks…
Don’t allow a passer-by to steal your data.
We deny all wireless access to our Helix servers, which does annoy the odd laptop user, who then connects a wire. But at least our data isn’t broadcast into the ether, cos you never know who IS listening.
(CLICK)
DNS
Anyone watch the key-signing-key ceremony a few months ago? This is the ceremony where 25 people are locked into a room for 8 hours whilst they create a new private/public key pair for the DNS root servers.
There is a reason so much time is spent on securing DNS, we need to make sure that the ones we use for Helix are secure too.
Now I wanted to talk a bit about the Helix versioning engine protocol itself.
I don’t know whether anyone here has had any interest in it, I’m fascinated by this sort of stuff.
It’s a two way remote procedure call protocol. The client can call functions on the server, and the server can call functions on the client.
The command you give to the client is actually the function to call on the server. Once the client calls the function, the server takes control of the session and basically tells the client what to do.
The client has many abilities but generally does what the server tells it to do in the correct order to complete the command.
(CLICK)
The data stream between the client and server is just a bunch of key value pairs. Each time the key “func” is spotted all previous keys are treated as parameters and the value of ‘func’ is the function to execute.
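The "func" rule can be sketched in a few lines. This is a simplified model of the idea described above, not the real wire format; the parameter key names `user` and `data` are made up for illustration, and only the function names come from the slide.

```python
def split_calls(pairs):
    """Group (key, value) pairs into (function, params) calls.

    Each time the key "func" appears, the keys collected since the
    previous call become that function's parameters.
    """
    calls = []
    params = {}
    for key, value in pairs:
        if key == "func":
            calls.append((value, params))
            params = {}
        else:
            params[key] = value
    return calls


# Hypothetical stream: the key names other than "func" are illustrative.
stream = [
    ("user", "jasonleonard"),
    ("func", "user-login"),
    ("data", "Enter password: "),
    ("func", "client-Prompt"),
]

for function, params in split_calls(stream):
    print(function, params)
```

Because both ends read the stream the same way, either side can issue a call, which is what makes the protocol two way.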
Let's take a look at the p4 login command executing.
(CLICK)
user-login is our command we want to execute.
(CLICK)
client-Prompt tells the client to write a string to the screen and ask the user for some input, then tell the server by calling dm-Login.
(CLICK)
dm-Login contains the password we were asked for, hashed with a salt value from the client-Prompt function.
(CLICK)
client-SetPassword contains the user ticket that the client now needs to store in the tickets file.
That’s it, we’ve just logged in.
The diagram is a simplified version of what actually goes on, there is some other “fluff” shown here.
(CLICK)
The protocol function exchanges config flags, flush seems to be a synchronization point, and release ends the connection.
If you want to see some of this yourself just add “-v 3” to your p4 command line.
Now let's just step back and think for a sec about what we have just seen.
The client passes the command and all parameters to the server. The server takes control at this point. This allows the versioning engine to remain backwards compatible to some extent. But what’s the down side here?
Well the server is in control, so what exactly CAN the server tell the client to do?
Well, let's think about some of the commands that the versioning engine can execute… help… print stuff to the screen. Sync? Write files to the client. Submit… read files… Sync #0… delete files. p4 resolve… has the ability to run the P4MERGE program.
So hang on, we’ve just handed control of the client to the server.
But it’s OK, we trust Helix… It’s only going to do what it needs and no more… right?
This is fine, …..but was it a Helix server we were just talking to? Was it?
Proxies and Brokers are interesting…. Man-in-the-middle?
What we need is a way to guarantee that we are talking to Helix infrastructure and not to something someone knocked up in their spare time that looks like a versioning engine.
This is exactly why securing your DNS is critical. Imagine if someone modified your DNS server and pointed all your Perforce users to their hacked "delete everything" server.
Secure Helix communications have been available since 2012.1 but due to the use of OpenSSL it’s worth upgrading to the latest before turning it on.
(CLICK)
Since the server and client exchange certificates the client can verify the server certificate and therefore authenticate it as the correct end point.
We also benefit from encrypted traffic over the connection.
(CLICK)
In order to configure we need to generate a certificate on the server end, and of course make our server use SSL.
Then we configure the client and trust the certificate with p4 trust.
There are a few issues that may slow or prevent the roll out.
First is that there is a change to the format of P4PORT. Note the SSL: part at the beginning. This could very well trip up some automation tools and most users just won’t get it.
(CLICK)
Second, you must trust the certificate with p4 trust before the client will talk. But the only thing you are presented with is a long nasty fingerprint. How many times has your browser warned you about a dodgy SSL connection and you’ve just proceeded? Users just don’t check the fingerprint.
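One way to take the human out of the loop is to compare the fingerprint against a published value in a script. The sketch below assumes the fingerprint is a SHA-1 digest over the certificate bytes, which the 20 byte form on the slide suggests but which I have not confirmed; the function names and the placeholder certificate bytes are hypothetical.

```python
import hashlib
import hmac


def fingerprint(cert_bytes: bytes) -> str:
    """Format a SHA-1 digest as colon separated upper case hex pairs."""
    digest = hashlib.sha1(cert_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_published(cert_bytes: bytes, published: str) -> bool:
    """Compare a certificate's fingerprint with the value an admin published."""
    actual = fingerprint(cert_bytes).encode("ascii")
    expected = published.strip().upper().encode("utf-8")
    return hmac.compare_digest(actual, expected)


# Placeholder bytes standing in for a real certificate.
cert = b"placeholder certificate bytes"
published = fingerprint(cert)

print(matches_published(cert, published))                 # True
print(matches_published(b"some other cert", published))   # False
```

The point is that the comparison is exact and automatic, so nobody has to eyeball twenty hex pairs before typing yes.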
There are a couple of things I’d like to see change here.
First, make it so that the connection can be upgraded to SSL. This will mean that you don’t have to specify SSL:. (CLICK) The LDAP protocol already does something like this with its “StartTLS” operation.
Second, I’d like to see a chain of trust with the certificates so that we can trust a long term root certificate authority by default (just like browsers do) and then generate short term CA signed certificates on our infrastructure. We can change the infrastructure certificates as regularly as we like and the users will just automatically trust the new ones. I can also publish the long term thumbprint of the root certificate authority, something users will get used to seeing. Maybe to make this easy a ‘p4 trust’ operation should actually trust the long term root certificate instead, then all I need to do is keep the root certificate authority private key super safe and only break it out when I want new infrastructure certificates.
Now, I just wanted to digress a bit and tell you about this bug we found in p4d which initially seemed just odd, but turned out to be something far more serious.
(CLICK)
One of our developers logged a ticket with my team saying that Perforce was breaking his automation. He gave me an annotate command that could reliably reproduce a block of seemingly random file corruption. When run as a print command the file was returned correctly, so I knew that we were looking at something odd about the annotate command and not the actual version data.
(CLICK)
I noticed that the line before the corruption was unusually long, in fact the line was 12,054 characters. Immediately I realised I was probably seeing the effect of a buffer overrun. I conducted an experiment to determine the line length required to trigger the effect. It turned out to be exactly 10,000 characters, which to me feels like a number that a programmer might choose for a buffer.
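If you want to know whether any of your own text files would cross that threshold, a scan like this is enough. It is a sketch that only looks at line lengths; the function name and the default limit of 10,000 (taken from the experiment above) are my own choices, and it says nothing about whether a given server build is affected.

```python
def long_lines(text, limit=10_000):
    """Yield (line_number, length) for lines longer than `limit` characters."""
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            yield number, len(line)


# A sample with one 12,054 character line, like the file in the bug report.
sample = "short line\n" + "x" * 12_054 + "\nanother short line\n"

print(list(long_lines(sample)))  # [(2, 12054)]
```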
(CLICK)
Next I analysed the random data corruption and found that if I executed the same command multiple times the data changed and sometimes words would appear. These were words which I recognised as being from the metadata, which meant that the random data was in fact memory from the p4d program and could indeed contain database structures.
At this point I handed the information to Perforce and allowed them to conduct their own tests. This happened to be just before Christmas and I knew I probably wasn’t going to get a fast turnaround. How wrong I was. I reported the incident on the 21st of December. According to my e-mail records it took a total of 8 days for a fix to appear in the latest version of p4d on the FTP site, 2 of which were Christmas and Boxing Day, and 2 were a weekend.
So I just wanted to say, a big thank you to the amazing support department at Perforce.
(CLICK)
The good news is that so long as you are running a version of p4d equal to or later than the ones shown here, then you are golden. I guess the bad news is that if you aren’t, then I’ve just disclosed a vulnerability in your server… Sorry about that!
Whilst this isn’t one of our layers I did want to point out some physical security bits we have been thinking about at Citrix.
(CLICK)
So our security team (that I mentioned at the beginning) wanted a rack that was SOO heavy that they required the first floor lab’s floor to be reinforced. The landlord refused so they had to use a different type of rack. But this does give you an idea of the lengths these guys go to. Still, physical protection is a good thing, even if it’s just to deter itchy fingers.
(CLICK)
Make sure you can’t get into your lab without a key or pass. Make sure not just anyone with a key or pass can get in.
Make sure you lock your racks. Most come with keys, but the keys are usually a standard key that fits all. You need to speak to the manufacturers to get unique versions.
(CLICK)
Pay attention to alerts from your servers. They have case intrusion switches which can be monitored via SNMP.
(CLICK)
Take care when disposing of your machines.
We have policies about removing hard drives and RAM from machines that get put in the bin. They then get ground up by a company that brings a machine round in a van. Destroying RAM is a little extreme, but it’s just a precaution.
I wanted to also include monitoring…
Traditionally we have been fairly poor at monitoring our servers. However I'd like to think we have turned a bit of a corner in recent months, we are getting better. And I'd just like to highlight a few noteworthy tools.
(CLICK)
For infrastructure monitoring, we use Nagios. I think most people do, or at least that’s what I hear most people say. We are currently using Nagios XI, which provides a much richer experience and ends the need for the reams of config lines… eeuuu.
(CLICK)
For log monitoring we use Nagios Log Server. I love it.
We had some failed attempts at GrayLog2, and Splunk is just soooo expensive. But for 2 grand, which includes the first year of support and as much traffic as you can pump into one machine, it’s a bargain, and I recommend you check it out. Our log server runs at around 40% CPU over 8 vCPUs and 16 GB RAM, and is easily capable of processing our 250,000 commands an hour. We’ve even had it up to 1M commands an hour, no sweat (another story for another time though).
This is a screen shot from our Nagios Log Server taken last Friday.
It shows us every single command going through our global Helix servers and updates in real time. It can show us statistics for any timeframe in the last week, but we usually show just the last hour.
I can see exactly what Joe Bloggs is doing, and see which servers he is connecting through, including the proxy he was using.
So if I get a phone call from him, I can inspect his traffic and provide much better support.
I can also use it for tracking down bad behaviour.
We had a user who doubled the traffic to one of our repositories due to a script they were writing which checked that changes made their way into releases. None of the other users noticed it, so I guess the hardware was coping with it, but we definitely saw it in these charts. This allowed us to question the user and offer assistance with their script.
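The same kind of check can be sketched outside the log server. This is not Nagios Log Server's query language, just an illustration of the idea with made up events: count commands per user and flag anyone responsible for more than a given share of the traffic.

```python
from collections import Counter


def heavy_users(events, share=0.5):
    """Return users responsible for more than `share` of all commands."""
    counts = Counter(user for user, _command in events)
    total = sum(counts.values())
    return sorted(u for u, n in counts.items() if total and n / total > share)


# Hypothetical (user, command) events; a real feed would come from the logs.
events = [
    ("joe", "sync"),
    ("script", "changes"),
    ("script", "changes"),
    ("script", "filelog"),
    ("erin", "submit"),
    ("script", "changes"),
]

print(heavy_users(events))  # ['script']
```

A user who doubles a repository's traffic ends up with roughly half of it, so a 50% share is a natural first threshold to try.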
We can also use it to alert us to certain conditions, for example if someone uses a dodgy version of P4EXP…
Coupled with Nagios Reactor (something we are experimenting with) you can also do “if this then that” type workflows.
I didn’t think it would be fair to do a talk on security and not plug this for Chris and the guys.
Although we don’t actually use it at the moment it is something I’m pushing for us to get. Perforce have an excellent IP Threat Analysis tool.
I’ve seen some demos of its capabilities and we’ve even had a free one month report done on some of our data.
It takes the audit logs from the versioning engine and creates charts not entirely dissimilar to the one there, showing users that are potentially risky.
I definitely recommend you check it out. The data we got from the system showed some interesting things we could never have figured out on our own.
So we have just spent the last 25 ish mins talking about security and all this is good advice but!!!!
The moment a user syncs code out of Helix …..
(CLICK)
we are screwed!
That data is now all over the company on all sorts of machines. It’s everywhere. To some extent Helix will tell us where it went (have tables), but if a user copies the data onto a memory key or laptop, or emails it to a mate, it’s gone.
We have lost control of the data.
Clearly we can’t stop people from syncing code from Helix
So what's the answer?
Citrix can help you!
Our Virtual desktop technology called XenDesktop.
Put everything in a datacentre!
Stop those pesky users from copying the data to their local machine.
XenDesktop ensures your data remains in the datacentre but allows your users to work from a desktop machine or indeed anywhere they choose.
This allows us to keep control of the data.
It also enables technologies that benefit from consolidation. For example, at last year’s Merge conference NetApp had a great bit of tech for cloning data to machines and essentially performing a really fast sync operation. But it’s difficult to use without your desktops being close to the storage. However, with XenDesktop this becomes possible.
You can also work from anywhere with an internet connection, and our awesome HDX technology can deliver to your users the best and richest desktop experience.
So let's try again…
Who thinks they have a secure Helix system?
(CLICK)
Thanks for listening, I hope you found some of this interesting.
Happy to take questions if we have time left.
If not then please catch me for a chat, I’d love to hear about your security experiences.