Keynote presentation for the Global AI Bootcamp in Singapore, delivered 15-Dec-2018 at Microsoft Singapore.
This content is an evolution of the presentation delivered during the IMDA Singapore Media Festival in December 2018.
19. User ID | Content ID | Consumption Timestamp
alexjs | Video0001 | 1970-01-01 0000Z
alexjs | Video0002 | 1970-01-01 0000Z
alexjs | Video0003 | 1970-01-01 0000Z
20. User ID | Content ID | Consumption Timestamp
alexjs | Video0001 | 2018-12-02 2251
alexjs | Video0002 | 2018-02-04 1950
alexjs | Video0003 | 2018-02-04 2030
Content Name | Content ID | Genre
The Bee Gees Biography | Video0001 | Drama
Mindhorn | Video0002 | Comedy
Waking Ned | Video0003 | Comedy
21. User ID | Content ID | Consumption Timestamp
alexjs | Video0001 | 2018-12-02 2251
alexjs | Video0002 | 2018-02-04 1950
alexjs | Video0003 | 2018-02-04 2030
Content Name | Content ID | Genre
The Bee Gees Biography | Video0001 | Drama
Mindhorn | Video0002 | Comedy
Waking Ned | Video0003 | Comedy
people centric
60. Fairness: AI systems should treat all people fairly
Reliability & Safety: AI systems should perform reliably and safely
Privacy & Security: AI systems should be secure and respect privacy
Inclusivity: AI systems should empower everyone and engage people
Transparency: AI systems should be understandable
Accountability: AI systems should have algorithmic accountability
64. Human Parity: Chinese/English
Project Brainwave (Real-time AI)
Unified Speech Service
- Speech-to-Text
- Text-to-Speech
- Translation
- Analytics
https://alexjs.co/thaidemo
65. Human Parity: Chinese/English
Project Brainwave (Real-time AI)
Unified Speech Service
Computer Vision
- Docker support
- Boxing and Isolation
- Improvements in OCR
66. Human Parity: Chinese/English
Project Brainwave (Real-time AI)
Unified Speech Service
Computer Vision
Hummingbird
- AI augmented news
- Not just what
- How
- When
- Trustable / Verified
71. Use AI to help empower people to be better.
Delight your users.
responsibly
Editor's Notes
Many customers ask me – “with all these advances in technology, will I stay relevant? Am I going to be replaced?”
And the answer is simple – it’s not about whether you stay relevant, but about finding out how you stay relevant.
Because AI isn’t about replacing individuals – not at all.
AI is a tool. A tool we can all use, to empower every person on the planet to achieve more.
But what can we achieve? What are we looking for?
At the World Economic Forum, they presented a timeline of AI expectations. Looking as far out as 2053, the expectation is that AI will be advanced enough to replace surgeons
Some of these milestones are already being reached – with Language Translation – specifically Chinese to English – already being at parity with humans, as announced by Microsoft earlier in 2018
But how do we make sure that we drive AI into having that spark – that unique injection of sentience – and stop it…
…from being an automaton that merely follows preprogrammed behaviours?
A big part of that is really knowing what AI is.
AI is about giving computers the ability to have the cognitive capabilities of humans. This is enabled through ML – which I treat as a subset of AI – which is around allowing computers to *learn by themselves*, not just follow data we directly feed.
But let’s benchmark and understand where we are with AI today
In the last few years, AI has been good at making decisions that take the average human under a second. For instance – is this a hotdog?
It’s advanced enough to differentiate – making sure models haven’t been mistrained with bad examples.
And it can report the confidence with which it made the decision. Importantly here – AI can also look at a composite image and isolate the most prominent feature, much like a human.
Today, AI is capable of taking in far more data than a single human could parse and understand in a short time, and using it to guide automated decisions. A great example is Rolls-Royce, who, through the use of IoT and AI, can make decisions based not just on what is happening across the hundreds or thousands of sensors in one engine, but on data generated globally from *every* engine – checking whether the readings from a sensor in one engine are outside the tolerances being shown across the entire fleet.
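To make the fleet comparison concrete, here is a minimal sketch of the idea (an illustration only, not Rolls-Royce’s actual system, and the readings are made up): one engine’s sensor value is flagged when it falls outside the tolerance band implied by the rest of the fleet.

```python
import statistics

# Hypothetical turbine-temperature readings (°C) from the same sensor
# position across the fleet, plus one engine's latest reading.
fleet_readings = [612.0, 608.5, 615.2, 610.1, 609.8, 613.4, 611.7, 607.9]
engine_reading = 641.3

mean = statistics.mean(fleet_readings)
stdev = statistics.stdev(fleet_readings)

# Flag the engine if its reading sits more than 3 standard deviations
# away from what the rest of the fleet is showing.
z_score = (engine_reading - mean) / stdev
if abs(z_score) > 3:
    print(f"Out of fleet tolerance (z = {z_score:.1f}) – schedule an inspection")
else:
    print(f"Within fleet tolerance (z = {z_score:.1f})")
```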
Now what’s missing, and what comes in the future, is the creativity for original creation
But with all this – AI is nothing without data to fuel it.
Having said that…
…it has to be the right data. With the wrong data, we risk the models being ineffective.
If we think about the older days of data – we typically had limited amounts of data and limited insight which we could draw. If we were very lucky…
…we might have two different tables with a relation. From this – can anyone tell me what the relationship is for the videos watched? Can you tell me a trend in the kind of content being consumed?
Probably not – because the data we see here has been designed and interpreted for commercial cases. It isn’t people-, user-, or outcome-centric data.
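To see how quickly meaning emerges once the tables are joined, here is a small sketch (assuming pandas) using the rows from the slides:

```python
import pandas as pd

# The two tables from the slides: consumption events and content metadata.
consumption = pd.DataFrame({
    "user_id": ["alexjs", "alexjs", "alexjs"],
    "content_id": ["Video0001", "Video0002", "Video0003"],
    "timestamp": pd.to_datetime(
        ["2018-12-02 22:51", "2018-02-04 19:50", "2018-02-04 20:30"]),
})
content = pd.DataFrame({
    "name": ["The Bee Gees Biography", "Mindhorn", "Waking Ned"],
    "content_id": ["Video0001", "Video0002", "Video0003"],
    "genre": ["Drama", "Comedy", "Comedy"],
})

# Join on content_id, then count genres per user to surface the trend.
watched = consumption.merge(content, on="content_id")
print(watched.groupby(["user_id", "genre"]).size())
# alexjs mostly watches Comedy – a signal a recommender can act on.
```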
So what kind of data could we have?
Let’s take an example video streaming platform. Here there are many types of data that are explicit and obvious to the user
For instance – the video progress itself – how far is a user through a given programme (and how does that affect recommendations)
How about the scrubbing? What can we tell about users when they rewind on a specific scene? Is our content confusing? Is there something they want to see more? Are they looking at a featured product?
What if we tie that into abandonment and restarts? Do people get to a point in the film, realise they have no idea what’s going on, and start again? Do they get to a given scene, go “I don’t want to watch this”, and abandon?
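As a sketch of what those playback signals might look like in practice – the event shape and field names below are purely hypothetical:

```python
# Hypothetical playback events for one viewing session.
events = [
    {"user": "alexjs", "video": "Video0002", "type": "progress", "position": 332},
    {"user": "alexjs", "video": "Video0002", "type": "rewind", "from": 332, "to": 290},
    {"user": "alexjs", "video": "Video0002", "type": "stop", "position": 310},
]

def classify_session(events, duration=5340):
    """Label a session as completed or abandoned, and say where it ended."""
    last = events[-1]
    if last["type"] == "stop" and last["position"] < 0.9 * duration:
        return "abandoned", last["position"]
    return "completed", duration

label, position = classify_session(events)
print(label, "at", position, "seconds")  # where in the film viewers drop off
```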
What about some non-obvious things? Like the audio language? What if I’m watching a Japanese show, but with Korean audio dubs?
And what if I’m watching that with English subtitles? Three different languages – what could that say? I’m trying to learn spoken Korean?
What about really non-obvious things? Audio volume and the time of day the content is being consumed can tell us a lot – am I at work and watching TV while my boss isn’t paying attention? Am I in a family home and watching with headphones while sitting with my family? How do I consume?
What about the more esoteric pieces? If you operate a platform with multiple sites, you can also get a better picture of how users use your entire platform as a whole. This gives you the ability to use AI for good – making the site more friendly, accessible, engaging, and better tailored to your customers.
Facial recognition technology is well proven already. It should be used ethically, but it is a consideration – for instance…
I use Gravatar for WordPress comments – what else do I engage with?
Or even – what non-profile information do I share, if I already share an avatar with a site? If I don’t state my age in my profile, but I upload an avatar, I can at least get better tailored content based on my computed demographic.
How about acceptance rates? Post video view, what do I click on? Furthermore, what do I continue through to completion, versus abandon? This is a strong indicator of the quality of a recommender, but also of the kind of content I choose.
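A minimal sketch of those two measures side by side – the outcome records are invented for illustration:

```python
# Hypothetical post-recommendation outcomes: did the user click the
# suggestion, and did they watch what they clicked through to the end?
outcomes = [
    {"clicked": True,  "completed": True},
    {"clicked": True,  "completed": False},   # clicked, then abandoned
    {"clicked": False, "completed": False},
    {"clicked": True,  "completed": True},
]

clicks = [o for o in outcomes if o["clicked"]]
acceptance_rate = len(clicks) / len(outcomes)
completion_rate = sum(o["completed"] for o in clicks) / len(clicks)

print(f"acceptance: {acceptance_rate:.0%}, completion: {completion_rate:.0%}")
# High acceptance with low completion suggests attractive thumbnails
# but poor matches – both rates together grade the recommender.
```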
For anyone familiar with the Internet from the late 90s up to a few years ago – geographic targeting, or even content rights restriction, has always been a challenge. If you live in a large country, your IP will typically resolve to your ISP’s location – which means I end up getting restaurants targeted at me that are an hour’s drive away. Now with HTML5 geolocation, however…
The accuracy is much greater – as close as 15km. This is also ethically implemented, as users must explicitly grant permission before any of this data is shared.
Or, in this case, a mere 10-minute walk from the office. Now if I fancy lunch, I get recommendations that are quicker and better tailored – and the result is a happier user.
Just to give the context of the scale of this data. In the 50s, Nielsen moved from just radio ratings into TV. This was a huge advance, and gave better insight into how consumers watched content.
But at this point they still relied on humans – a sample – to write out the TV they watched into a book.
In the 80s, they moved to an automated system using meters that tracked VHF/UHF. This was a huge advancement, but the sample size was still small.
Now – for an online video platform – we have a scenario where we don’t have 40,000 households representing a country, but 40,000 data points representing a single human.
We can use this, to create frictionless engagement.
Now that’s a lot of data – how do we find meaning from it?
Back in 1996, Microsoft released Microsoft Comic Chat. This was one of the first pieces of automatically generated content. It ran over the IRC protocol and gave a fun, comic interface to short-form messages.
Now we have the ability to automatically create content from written text. Startups like Wibbitz create short form video from articles. This creates a society of inclusivity, as those who previously couldn’t have consumed the content can now engage with it in new and interesting ways.
Many of us forget how multi-racial and multi-cultural almost every country in the world can be.
Take an example of a video – a user watches, but 5m32s in, they stop watching. Why is that the case? Normal video platform abandonment is around 6.3-6.4%, so it’s easy to ignore the outliers. If we try to build a graph of all this, it can look very complex
But with the advancements in AI and machine learning, we can now build a pattern of relationships between the individuals
And we can identify that for a given set of users – the content was particularly objectionable or offensive in a given scene. This insight helps us in many ways.
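As a sketch of that kind of analysis (using scikit-learn’s k-means on invented data), clustering the points where viewers stop makes the problem scene stand out:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical abandonment times (seconds into the video), one per user.
abandon_at = np.array([[331], [329], [335], [332], [1800], [1795], [40], [45]])

# Group the abandonments; a dense cluster around one scene (here ~5m32s)
# suggests something in that scene is driving a cohort of users away.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(abandon_at)
for centre in sorted(kmeans.cluster_centers_.flatten()):
    print(f"abandonment hotspot around {centre:.0f}s")
```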
We can recommend different content to that demographic of user
We can change our marketing or content operations practices to avoid offence
Or more importantly, we can iterate and create content which is inclusive for all to enjoy
A common problem faced by broadcasters is huge amounts of non-digital media, often on tapes, sitting in warehouses. Many groups want to use that archival content for things like news stories, or tributes and obituaries. This has been incredibly challenging.
However, now with advances not just in AI but content digitisation, it has never been easier to digitally encode all content. Furthermore, with the use of Facial Recognition technologies, companies can easily find all clips that feature given individuals – to make content more relatable, relevant and engaging.
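As a sketch of how that search could work, frames extracted from digitised footage can be posted to the Face API’s detect endpoint – the region, key, and file name below are placeholders:

```python
import requests

# Detect faces in a frame extracted from digitised archive footage.
endpoint = "https://southeastasia.api.cognitive.microsoft.com/face/v1.0/detect"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-face-api-key>",  # placeholder
    "Content-Type": "application/octet-stream",
}

with open("frame_000123.jpg", "rb") as frame:
    response = requests.post(endpoint, headers=headers, data=frame)

# Each detected face returns a faceId that can then be matched against
# known people, building an index of which clips feature which individuals.
for face in response.json():
    print(face["faceId"], face["faceRectangle"])
```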
It’s no secret that print advertising is a declining business worldwide (with some regional exceptions). But advertising itself is bigger than ever before. The problem, however, is adverts that are irrelevant. I live in Singapore and have no interest in buying a car, but many people have or need one.
Now with better geotargeting and customer profile design, advertising doesn’t have to be guesswork. We can use data to give the right content to the right people at the right time, all designed to improve their experience.
This can be made even more real-time through intelligent digital outreach – for instance, signboards which adapt content based on the average demographic of the people around them. During commuting hours – services tailored to the average working professional. During school hours, brands and content relevant to younger people, and so on.
A big trend in the industry is to talk about things being “at the edge”
Almost every use case can be dealt with by the Cloud; however, there are some “edge” cases.
One is where the data location itself is semi-disconnected.
Another is where the data needs sub-second latency.
Another, which I often dispute, is security. Some examples here are where the data is required to be air-gapped from an Internet connection.
Nonetheless, there are many ways to enable these use cases. For instance, through using Analytics on an IoT edge to give better insight into machine maintenance at a factory.
Or through the use of a Vision AI platform which can handle facial-recognition at the edge for extra low latency
And as you may have spotted – Cognitive Services from Azure can now be deployed into Docker containers, giving you the portability and flexibility to control how you use this innovation in AI.
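As a sketch of what that looks like in practice, here is a call against a container assumed to be running locally – this supposes a Text Analytics sentiment container on port 5000 that mirrors the cloud REST path:

```python
import requests

# Assumes a Cognitive Services sentiment container is already running on
# localhost:5000; because the scoring happens inside the container, the
# text never has to leave your own network.
url = "http://localhost:5000/text/analytics/v2.0/sentiment"
payload = {"documents": [
    {"id": "1", "language": "en", "text": "The new recommendations are great!"},
]}

response = requests.post(url, json=payload)
print(response.json()["documents"][0]["score"])  # 0 = negative, 1 = positive
```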
We have also provided, on GitHub, an AI toolkit for IoT Edge – with samples such as the use of the ISIC Skin Cancer dataset to classify and interpret images for markers of skin cancer.
But with all of this data, we need to make sure we act ethically.
Microsoft designs all AI services to be ethical, and publishes these principles to our customers and partners. AI must treat all people fairly. It should be transparent in the way it works, and accountable for the decisions it makes. It should be inclusive of all people, regardless of who they are, and it should be reliable and safe in the way it treats them. But most importantly, AI must be secure and respectful of privacy.
There is no compromise on any one of these.
So with this, what’s new from Microsoft?
As mentioned earlier, based on the newstest2017 data set, we achieved Chinese to English translation parity earlier in 2018. This follows Speech-to-Text parity which was reached in May 2017 – in this case showing an error rate of around 5.9%, which matches professional human transcribers, on the Switchboard dataset.
Project Brainwave brings FPGA backed real-time AI to the edge – leveraging the innovation of Cloud but directly in your sites.
At Build we announced the new Unified Speech Service, bringing many of our capabilities together in a single set of APIs. I put together a site in a few hours that allowed me to do Speech-to-Text and Text-to-Speech in Thai, which was fantastic. Even better is the support in the Speech-to-Text service for many South East Asian languages, such as Bahasa Melayu. This allows people to experience the democratisation of AI, all through a set of re-usable APIs.
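As an illustration of how little code such a site needs behind it, here is a sketch of the Speech Service REST endpoint for short audio, recognising Thai – the region, key, and audio file are placeholders:

```python
import requests

# Short-audio speech-to-text via the Speech Service REST API, in Thai.
region = "southeastasia"  # placeholder – use your service's region
url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
       "conversation/cognitiveservices/v1?language=th-TH")
headers = {
    "Ocp-Apim-Subscription-Key": "<your-speech-key>",  # placeholder
    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
}

with open("hello_thai.wav", "rb") as audio:
    response = requests.post(url, headers=headers, data=audio)

print(response.json().get("DisplayText"))  # the recognised Thai text
```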
New innovations in the Computer Vision service include better boxing and isolation to deal with complex images, iterations in the quality of Optical Character Recognition, and as mentioned earlier – the addition of Docker support, allowing you to deploy this innovation into Containers too.
In mid-December, Microsoft released Hummingbird, a new AI-augmented news platform. It helps tailor content to users based on a wide array of inputs and, most importantly and with ethics in mind, it provides news which you can trust.
So in summary, we must always remember that…
AI is nothing without data – but it must be the right data. The data must treat everyone fairly, inclusively, and ethically.
But that data isn’t always the obvious data. Operationally, we all gather many kinds of data; let’s see how we can use it to improve the lives of our customers.
We must always use AI to help empower people, and organisations, to be better. Always design to delight your users when you use data and AI.
But always, always use it responsibly.
With that, thank you for your time, and now please: go do great things.