SlideShare uma empresa Scribd logo
1 de 46
Reversing Data Formats:
What Data Can Reveal
by Anton Dorfman
ZeroNights 0x03, Moscow, 2013
About me
•
•
•
•
•

Fan of & Fun with Assembly language
Researcher
Scientist
Teach Reverse Engineering since 2001
Candidate of technical science
Observed topics
•
•
•
•
•
•

Samples
Basics
Thoughts
Fields
Related works
Applications
Samples
What is it?
What is it? Hint 1
What is it? Hint 2
What is it?
What is it?
What is it? Hint 1
What is it? Hint 2
Basics
Standard way
•
•
•
•

Hex editor
Researcher
Brain (equally important as the Hex editor)
Basic knowledge how data can be organized
(in brain)
• Analysis of the executable file that
manipulates with data format
3 ways of researching
• Dynamic analysis – use reverse engineering of
application that manipulate with data format
• Static analysis – try to extract header,
structures, fields and try to find relationship
between data
• Statistic analysis – when have some amount
of samples and use statistics of changes and
ranges of values
Breaking the rules
• There is the rule RTFM (Read The F**king
Manual)
• Nobody likes it
• I’m not exception
• First of all I start my research, and second – try
to find related works and generalize ideas
from them
Definitions
• Field – some value used to describe Data
Format
• Structure – way for organizing various fields
• Data – information for representing which
Data Format is developed
• Header – common structure before data, may
contain substructures
Thoughts
Main Idea
• Data format is developed by human (not pets or
aliens)
• Sometimes looks like that no human works on it…
• Format developers use common data
organization concepts and similar thoughts when
creating new data formats
• If we find regularities in data format organization
rules we can automate searching of them
Three types of data format
• File formats – multimedia formats, database
formats, internal formats for exchanging
between program components and etc.
• Protocols – network protocols, hardware
device interaction protocols, protocols of
interaction between driver and user space
application and etc.
• Structures in memory – OS structures,
application structures and etc.
Levels of abstraction
•
•
•
•
•
•
•

Bit Order
Byte Order
Fields Size
Field Basic Type
Field Type
Structure
Field Semantics
Tasks to solve
•
•
•
•
•
•
•

Extract header, separate it from data
Find field boundaries
Find structures and substructures
Find types of fields
Detect bit and byte ordering
Determine semantics of fields
Interpret goal of field – it’s semantics
Bit order
• There are most significant bit – MSB and least
significant bit - LSB

• MSB is a left-most bit – commonly used – value is
10010101b – 95h - 149
• LSB is a left-most bit - value is 10101001b – A9h 169
Byte order
• Little-endian – Intel byte order
• Big-endian – network byte order

• Middle-endian or mixed-endian – mixed byte order –
when length of value more then default processor
machine word
Fields
Types of fields
• Service fields – for describing Data Format
(size of structure and etc.)
• Common fields – “fields from life” (time, date
and etc.)
• Specific fields – we can find range for that
type (bit flags and etc.)
• Application specific – can be interpret only by
application, we should use dynamic analysis
Field Size and Levels of field
interpretation
• Commonly field is a byte sequence
• Sometimes field is a bit sequence
•
•
•
•

Fixed size in bytes (1, 2, 3, 4, …)
Fixed size in bits (1, 2, 3, 4, …)
Variable size in bytes
Variable size in bits
Basic field types
•
•
•
•
•
•

Hex value – by default
Decimal value
Character value (up to 4 symbols)
String (ASCII or Unicode)
Float value
Bits value
Types of field
•
•
•
•
•
•
•
•
•

Text
Offset
Size
Quantity
ID
Flag
Counter
DateTime
Protect
Text
• Usually fixed size, remaining space after text
filled with 0
• Variable sized text ended with 0, or there are
size of this text field
• Usually text string contain their meanings
Offset
• Offset to another field
• Offset to data
• Pointer to another structure (may be child or
parent)
• Pointer to another instance of such structure
(next or previous in linked list)
• Can be relative and absolute
Export in PE-EXE
Module ImageBase

IMAGE_NT_HEADERS
+ImageBase

Signature DWORD

IMAGE_DOS_HEADER

IMAGE_EXPORT_DIRECTORY
+ImageBase

...

...

FileHeader
IMAGE_FILE_HEADER

+01Ch

AddressOfFunctions

e_lfanew DWORD
+03Ch

OptionalHeader
IMAGE_OPTIONAL_HEADER

+020h

AddressOfNames

IMAGE_DIRECTORY_ENTRY
_EXPORT

+024h

AddressOfNameOrdinals

...

+078h

...

Names (Offsets)
...
(Elem size - dword)
Offset FuncName X
Offset FuncName X+1
Offset FuncName X+2
...

+ImageBase
+ImageBase

Name Ordinals
Index

=

Index

Function Names
+ImageBase

...
‘LoadLibraryA’,0
‘LoadLibraryExA’,0
‘LoadLibraryExW’,0
...

...

...
(Elem size - word)
Index FuncAddr X
Index FuncAddr X+1
Index FuncAddr X+2
...

Find

LoadLibraryExA

+ImageBase

Functions Addresses
...
(Elem size - dword)
FuncAddr X
FuncAddr X+1
FuncAddr X+2
...

LoadLibraryExA EP:

mov edi,edi
push ebp
...

+ImageBase
Size
•
•
•
•

Size of data
Size of structure
Size of substructure
Size of field

• Can be not only in bytes, but also in bits,
words (2 byte), double words, paragraphs (16
byte) and etc.
Quantity
• Quantity of substructures
• Quantity of elements
• IMAGE_NT_HEADERS.IMAGE_FILE_HEADER.
NumberOfSections
• IMAGE_EXPORT_DIRECTORY.
NumberOfFunctions
ID
• Identifier of data format, often called magic
number
• ID of structure
• ID of substructure
• Often ID is value consist of char symbols
• Often ID looks like “magic” - BE BA FE CA
Flag
• Bit Flags
• Enum values as a flags
• Special case – Bool value
Counter
• Increment (possibly decrement)
• Starts with 0 / begins with another value
• Changes by 1 or another value
DateTime
•
•
•
•

Storing format
Resolution
Moment of beginning
Range
Protect
• CRC value of data
• CRC value of substructure
• Various Hash functions and etc.
Related works
Projects, articles and authors
• Protocol Informatics Project – based on “Network Protocol Analysis
using Bioinformatics Algorithms” by Marshall A. Beddoe
• Discoverer - “Discoverer: Automatic Protocol Reverse Engineering
from Network Traces” by Weidong Cui, Jayanthkumar Kannan,
Helen J. Wang
• Laika – “Digging For Data Structures” by Anthony Cozzie, Frank
Stratton, Hui Xue, and Samuel T. King
• RolePlayer – “Protocol-Independent Adaptive Replay of Application
Dialog” by Weidong Cuiy, Vern Paxsonz, Nicholas C. Weaverz, Randy
H. Katzy
• Tupni – “Tupni: Automatic Reverse Engineering of Input Formats”
by Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, Luiz
Irun-Briz
Protocol Informatics Project
• Global and local sequence alignment Needleman Wunsch and Smith Waterman
algorithms
Discoverer
Laika
• Using Bayesian unsupervised learning
• For fixed size structures only
Tupni
• Based on taint tracking engine
Applications
•
•
•
•
•
•
•
•

RE any program
RE undocumented/proprietary file formats
RE undocumented/proprietary network protocols
RE undocumented structures in memory
Fuzzing
Examination of protocol implementation
Replay network interaction
Zero-day vulnerability signature generation
Thank you!
• Questions?

• Meet me at researcher@inbox.ru,
• dorfmananton@gmail.com

Mais conteúdo relacionado

Mais procurados

Information Retrieval
Information RetrievalInformation Retrieval
Information Retrievalrchbeir
 
Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval Abhay Ratnaparkhi
 
The vector space model
The vector space modelThe vector space model
The vector space modelpkgosh
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documentsKriti Khanna
 
Atlas.Ti Summary
Atlas.Ti SummaryAtlas.Ti Summary
Atlas.Ti SummaryPyi Soe
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extractionguest0edcaf
 
Data analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsData analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsAltuna Akalin
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean modelVaibhav Khanna
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibEl Habib NFAOUI
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introductionguest0edcaf
 
Classifying Text using CNN
Classifying Text using CNNClassifying Text using CNN
Classifying Text using CNNSomnath Banerjee
 
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...AINL Conferences
 
Arcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedArcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedarcomem
 
"Hands Off! Best Practices for Code Hand Offs"
"Hands Off!  Best Practices for Code Hand Offs""Hands Off!  Best Practices for Code Hand Offs"
"Hands Off! Best Practices for Code Hand Offs"Naomi Dushay
 

Mais procurados (20)

Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval Latest trends in AI and information Retrieval
Latest trends in AI and information Retrieval
 
The vector space model
The vector space modelThe vector space model
The vector space model
 
co:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidalco:op-READ-Convention Marburg - Enrique Vidal
co:op-READ-Convention Marburg - Enrique Vidal
 
score based ranking of documents
score based ranking of documentsscore based ranking of documents
score based ranking of documents
 
Text mining
Text miningText mining
Text mining
 
Atlas.Ti Summary
Atlas.Ti SummaryAtlas.Ti Summary
Atlas.Ti Summary
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Data analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsData analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomics
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Classifying Text using CNN
Classifying Text using CNNClassifying Text using CNN
Classifying Text using CNN
 
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...
Ainl 2013 toschev-talanov_практическое применение модели мышления и машинного...
 
Text categorization
Text categorizationText categorization
Text categorization
 
Arcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedArcomem training entities-and-events_advanced
Arcomem training entities-and-events_advanced
 
co:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahnco:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahn
 
Galvin Information Security and Library IT
Galvin Information Security and Library ITGalvin Information Security and Library IT
Galvin Information Security and Library IT
 
Ld4 l triannon
Ld4 l triannonLd4 l triannon
Ld4 l triannon
 
"Hands Off! Best Practices for Code Hand Offs"
"Hands Off!  Best Practices for Code Hand Offs""Hands Off!  Best Practices for Code Hand Offs"
"Hands Off! Best Practices for Code Hand Offs"
 

Semelhante a Anton Dorfman - Reversing data formats what data can reveal

CPP18 - String Parsing
CPP18 - String ParsingCPP18 - String Parsing
CPP18 - String ParsingMichael Heron
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic SearchRoi Blanco
 
12 ipt 0202 Organisation methods
12 ipt 0202   Organisation methods12 ipt 0202   Organisation methods
12 ipt 0202 Organisation methodsctedds
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
10-System-ModelingFL22-sketch-19122022-091234am.pptx
10-System-ModelingFL22-sketch-19122022-091234am.pptx10-System-ModelingFL22-sketch-19122022-091234am.pptx
10-System-ModelingFL22-sketch-19122022-091234am.pptxhuzaifaahmed79
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Heimo Hänninen
 
Architectural Styles and Case Studies, Software architecture ,unit–2
Architectural Styles and Case Studies, Software architecture ,unit–2Architectural Styles and Case Studies, Software architecture ,unit–2
Architectural Styles and Case Studies, Software architecture ,unit–2Sudarshan Dhondaley
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr..."PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...Stefan Adam
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 

Semelhante a Anton Dorfman - Reversing data formats what data can reveal (20)

CPP18 - String Parsing
CPP18 - String ParsingCPP18 - String Parsing
CPP18 - String Parsing
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic Search
 
12 ipt 0202 Organisation methods
12 ipt 0202   Organisation methods12 ipt 0202   Organisation methods
12 ipt 0202 Organisation methods
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
10-System-ModelingFL22-sketch-19122022-091234am.pptx
10-System-ModelingFL22-sketch-19122022-091234am.pptx10-System-ModelingFL22-sketch-19122022-091234am.pptx
10-System-ModelingFL22-sketch-19122022-091234am.pptx
 
Digital data
Digital dataDigital data
Digital data
 
Digital Types
Digital TypesDigital Types
Digital Types
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?
 
Architectural Styles and Case Studies, Software architecture ,unit–2
Architectural Styles and Case Studies, Software architecture ,unit–2Architectural Styles and Case Studies, Software architecture ,unit–2
Architectural Styles and Case Studies, Software architecture ,unit–2
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Data structure Unit-I Part A
Data structure Unit-I Part AData structure Unit-I Part A
Data structure Unit-I Part A
 
Text Mining
Text MiningText Mining
Text Mining
 
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr..."PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
"PageRank" - "The Anatomy of a Large-Scale Hypertextual Web Search Engine” pr...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
David buksbaum a-briefintroductiontocsharp
David buksbaum a-briefintroductiontocsharpDavid buksbaum a-briefintroductiontocsharp
David buksbaum a-briefintroductiontocsharp
 
Realizing Semantic Web - Light Weight semantics and beyond
Realizing Semantic Web - Light Weight semantics and beyondRealizing Semantic Web - Light Weight semantics and beyond
Realizing Semantic Web - Light Weight semantics and beyond
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
Avro intro
Avro introAvro intro
Avro intro
 

Mais de DefconRussia

[Defcon Russia #29] Борис Савков - Bare-metal programming на примере Raspber...
[Defcon Russia #29] Борис Савков -  Bare-metal programming на примере Raspber...[Defcon Russia #29] Борис Савков -  Bare-metal programming на примере Raspber...
[Defcon Russia #29] Борис Савков - Bare-metal programming на примере Raspber...DefconRussia
 
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...DefconRussia
 
[Defcon Russia #29] Алексей Тюрин - Spring autobinding
[Defcon Russia #29] Алексей Тюрин - Spring autobinding[Defcon Russia #29] Алексей Тюрин - Spring autobinding
[Defcon Russia #29] Алексей Тюрин - Spring autobindingDefconRussia
 
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/LinuxDefconRussia
 
Георгий Зайцев - Reversing golang
Георгий Зайцев - Reversing golangГеоргий Зайцев - Reversing golang
Георгий Зайцев - Reversing golangDefconRussia
 
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC [DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC DefconRussia
 
Cisco IOS shellcode: All-in-one
Cisco IOS shellcode: All-in-oneCisco IOS shellcode: All-in-one
Cisco IOS shellcode: All-in-oneDefconRussia
 
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...DefconRussia
 
HTTP HOST header attacks
HTTP HOST header attacksHTTP HOST header attacks
HTTP HOST header attacksDefconRussia
 
Attacks on tacacs - Алексей Тюрин
Attacks on tacacs - Алексей ТюринAttacks on tacacs - Алексей Тюрин
Attacks on tacacs - Алексей ТюринDefconRussia
 
Weakpass - defcon russia 23
Weakpass - defcon russia 23Weakpass - defcon russia 23
Weakpass - defcon russia 23DefconRussia
 
nosymbols - defcon russia 20
nosymbols - defcon russia 20nosymbols - defcon russia 20
nosymbols - defcon russia 20DefconRussia
 
static - defcon russia 20
static  - defcon russia 20static  - defcon russia 20
static - defcon russia 20DefconRussia
 
Zn task - defcon russia 20
Zn task  - defcon russia 20Zn task  - defcon russia 20
Zn task - defcon russia 20DefconRussia
 
Vm ware fuzzing - defcon russia 20
Vm ware fuzzing  - defcon russia 20Vm ware fuzzing  - defcon russia 20
Vm ware fuzzing - defcon russia 20DefconRussia
 
Nedospasov defcon russia 23
Nedospasov defcon russia 23Nedospasov defcon russia 23
Nedospasov defcon russia 23DefconRussia
 
Advanced cfg bypass on adobe flash player 18 defcon russia 23
Advanced cfg bypass on adobe flash player 18 defcon russia 23Advanced cfg bypass on adobe flash player 18 defcon russia 23
Advanced cfg bypass on adobe flash player 18 defcon russia 23DefconRussia
 
Miasm defcon russia 23
Miasm defcon russia 23Miasm defcon russia 23
Miasm defcon russia 23DefconRussia
 
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...DefconRussia
 
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условиях
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условияхSergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условиях
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условияхDefconRussia
 

Mais de DefconRussia (20)

[Defcon Russia #29] Борис Савков - Bare-metal programming на примере Raspber...
[Defcon Russia #29] Борис Савков -  Bare-metal programming на примере Raspber...[Defcon Russia #29] Борис Савков -  Bare-metal programming на примере Raspber...
[Defcon Russia #29] Борис Савков - Bare-metal programming на примере Raspber...
 
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...
[Defcon Russia #29] Александр Ермолов - Safeguarding rootkits: Intel Boot Gua...
 
[Defcon Russia #29] Алексей Тюрин - Spring autobinding
[Defcon Russia #29] Алексей Тюрин - Spring autobinding[Defcon Russia #29] Алексей Тюрин - Spring autobinding
[Defcon Russia #29] Алексей Тюрин - Spring autobinding
 
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux
[Defcon Russia #29] Михаил Клементьев - Обнаружение руткитов в GNU/Linux
 
Георгий Зайцев - Reversing golang
Георгий Зайцев - Reversing golangГеоргий Зайцев - Reversing golang
Георгий Зайцев - Reversing golang
 
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC [DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
[DCG 25] Александр Большев - Never Trust Your Inputs or How To Fool an ADC
 
Cisco IOS shellcode: All-in-one
Cisco IOS shellcode: All-in-oneCisco IOS shellcode: All-in-one
Cisco IOS shellcode: All-in-one
 
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...
Олег Купреев - Обзор и демонстрация нюансов и трюков из области беспроводных ...
 
HTTP HOST header attacks
HTTP HOST header attacksHTTP HOST header attacks
HTTP HOST header attacks
 
Attacks on tacacs - Алексей Тюрин
Attacks on tacacs - Алексей ТюринAttacks on tacacs - Алексей Тюрин
Attacks on tacacs - Алексей Тюрин
 
Weakpass - defcon russia 23
Weakpass - defcon russia 23Weakpass - defcon russia 23
Weakpass - defcon russia 23
 
nosymbols - defcon russia 20
nosymbols - defcon russia 20nosymbols - defcon russia 20
nosymbols - defcon russia 20
 
static - defcon russia 20
static  - defcon russia 20static  - defcon russia 20
static - defcon russia 20
 
Zn task - defcon russia 20
Zn task  - defcon russia 20Zn task  - defcon russia 20
Zn task - defcon russia 20
 
Vm ware fuzzing - defcon russia 20
Vm ware fuzzing  - defcon russia 20Vm ware fuzzing  - defcon russia 20
Vm ware fuzzing - defcon russia 20
 
Nedospasov defcon russia 23
Nedospasov defcon russia 23Nedospasov defcon russia 23
Nedospasov defcon russia 23
 
Advanced cfg bypass on adobe flash player 18 defcon russia 23
Advanced cfg bypass on adobe flash player 18 defcon russia 23Advanced cfg bypass on adobe flash player 18 defcon russia 23
Advanced cfg bypass on adobe flash player 18 defcon russia 23
 
Miasm defcon russia 23
Miasm defcon russia 23Miasm defcon russia 23
Miasm defcon russia 23
 
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...
Andrey Belenko, Alexey Troshichev - Внутреннее устройство и безопасность iClo...
 
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условиях
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условияхSergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условиях
Sergey Belov - Покажите нам Impact! Доказываем угрозу в сложных условиях
 

Último

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 

Último (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 

Anton Dorfman - Reversing data formats what data can reveal

  • 1. Reversing Data Formats: What Data Can Reveal by Anton Dorfman ZeroNights 0x03, Moscow, 2013
  • 2. About me • • • • • Fan of & Fun with Assembly language Researcher Scientist Teach Reverse Engineering since 2001 Candidate of technical science
  • 6. What is it? Hint 1
  • 7. What is it? Hint 2
  • 10. What is it? Hint 1
  • 11. What is it? Hint 2
  • 13. Standard way • • • • Hex editor Researcher Brain (equally important as the Hex editor) Basic knowledge how data can be organized (in brain) • Analysis of the executable file that manipulates with data format
  • 14. 3 ways of researching • Dynamic analysis – use reverse engineering of application that manipulate with data format • Static analysis – try to extract header, structures, fields and try to find relationship between data • Statistic analysis – when have some amount of samples and use statistics of changes and ranges of values
  • 15. Breaking the rules • There is the rule RTFM (Read The F**king Manual) • Nobody likes it • I’m not exception • First of all I start my research, and second – try to find related works and generalize ideas from them
  • 16. Definitions • Field – some value used to describe Data Format • Structure – way for organizing various fields • Data – information for representing which Data Format is developed • Header – common structure before data, may contain substructures
  • 18. Main Idea • Data format is developed by human (not pets or aliens) • Sometimes looks like that no human works on it… • Format developers use common data organization concepts and similar thoughts when creating new data formats • If we find regularities in data format organization rules we can automate searching of them
  • 19. Three types of data format • File formats – multimedia formats, database formats, internal formats for exchanging between program components and etc. • Protocols – network protocols, hardware device interaction protocols, protocols of interaction between driver and user space application and etc. • Structures in memory – OS structures, application structures and etc.
  • 20. Levels of abstraction • • • • • • • Bit Order Byte Order Fields Size Field Basic Type Field Type Structure Field Semantics
  • 21. Tasks to solve • • • • • • • Extract header, separate it from data Find field boundaries Find structures and substructures Find types of fields Detect bit and byte ordering Determine semantics of fields Interpret goal of field – it’s semantics
  • 22. Bit order • There are most significant bit – MSB and least significant bit - LSB • MSB is a left-most bit – commonly used – value is 10010101b – 95h - 149 • LSB is a left-most bit - value is 10101001b – A9h 169
  • 23. Byte order • Little-endian – Intel byte order • Big-endian – network byte order • Middle-endian or mixed-endian – mixed byte order – when length of value more then default processor machine word
  • 25. Types of fields • Service fields – for describing Data Format (size of structure and etc.) • Common fields – “fields from life” (time, date and etc.) • Specific fields – we can find range for that type (bit flags and etc.) • Application specific – can be interpret only by application, we should use dynamic analysis
  • 26. Field Size and Levels of field interpretation • Commonly field is a byte sequence • Sometimes field is a bit sequence • • • • Fixed size in bytes (1, 2, 3, 4, …) Fixed size in bits (1, 2, 3, 4, …) Variable size in bytes Variable size in bits
  • 27. Basic field types • • • • • • Hex value – by default Decimal value Character value (up to 4 symbols) String (ASCII or Unicode) Float value Bits value
  • 29. Text • Usually fixed size, remaining space after text filled with 0 • Variable sized text ended with 0, or there are size of this text field • Usually text string contain their meanings
  • 30. Offset • Offset to another field • Offset to data • Pointer to another structure (may be child or parent) • Pointer to another instance of such structure (next or previous in linked list) • Can be relative and absolute
  • 31. Export in PE-EXE Module ImageBase IMAGE_NT_HEADERS +ImageBase Signature DWORD IMAGE_DOS_HEADER IMAGE_EXPORT_DIRECTORY +ImageBase ... ... FileHeader IMAGE_FILE_HEADER +01Ch AddressOfFunctions e_lfanew DWORD +03Ch OptionalHeader IMAGE_OPTIONAL_HEADER +020h AddressOfNames IMAGE_DIRECTORY_ENTRY _EXPORT +024h AddressOfNameOrdinals ... +078h ... Names (Offsets) ... (Elem size - dword) Offset FuncName X Offset FuncName X+1 Offset FuncName X+2 ... +ImageBase +ImageBase Name Ordinals Index = Index Function Names +ImageBase ... ‘LoadLibraryA’,0 ‘LoadLibraryExA’,0 ‘LoadLibraryExW’,0 ... ... ... (Elem size - word) Index FuncAddr X Index FuncAddr X+1 Index FuncAddr X+2 ... Find LoadLibraryExA +ImageBase Functions Addresses ... (Elem size - dword) FuncAddr X FuncAddr X+1 FuncAddr X+2 ... LoadLibraryExA EP: mov edi,edi push ebp ... +ImageBase
  • 32. Size • • • • Size of data Size of structure Size of substructure Size of field • Can be not only in bytes, but also in bits, words (2 byte), double words, paragraphs (16 byte) and etc.
  • 33. Quantity • Quantity of substructures • Quantity of elements • IMAGE_NT_HEADERS.IMAGE_FILE_HEADER. NumberOfSections • IMAGE_EXPORT_DIRECTORY. NumberOfFunctions
  • 34. ID • Identifier of data format, often called magic number • ID of structure • ID of substructure • Often ID is value consist of char symbols • Often ID looks like “magic” - BE BA FE CA
  • 35. Flag • Bit Flags • Enum values as a flags • Special case – Bool value
  • 36. Counter • Increment (possibly decrement) • Starts with 0 / begins with another value • Changes by 1 or another value
  • 38. Protect • CRC value of data • CRC value of substructure • Various Hash functions and etc.
  • 40. Projects, articles and authors • Protocol Informatics Project – based on “Network Protocol Analysis using Bioinformatics Algorithms” by Marshall A. Beddoe • Discoverer - “Discoverer: Automatic Protocol Reverse Engineering from Network Traces” by Weidong Cui, Jayanthkumar Kannan, Helen J. Wang • Laika – “Digging For Data Structures” by Anthony Cozzie, Frank Stratton, Hui Xue, and Samuel T. King • RolePlayer – “Protocol-Independent Adaptive Replay of Application Dialog” by Weidong Cuiy, Vern Paxsonz, Nicholas C. Weaverz, Randy H. Katzy • Tupni – “Tupni: Automatic Reverse Engineering of Input Formats” by Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, Luiz Irun-Briz
  • 41. Protocol Informatics Project • Global and local sequence alignment Needleman Wunsch and Smith Waterman algorithms
  • 43. Laika • Using Bayesian unsupervised learning • For fixed size structures only
  • 44. Tupni • Based on taint tracking engine
  • 45. Applications • • • • • • • • RE any program RE undocumented/proprietary file formats RE undocumented/proprietary network protocols RE undocumented structures in memory Fuzzing Examination of protocol implementation Replay network interaction Zero-day vulnerability signature generation
  • 46. Thank you! • Questions? • Meet me at researcher@inbox.ru, • dorfmananton@gmail.com