TunSafe open source (Same as 1.3-rc3 version)

This commit is contained in:
Ludvig Strigeus 2018-08-08 13:12:38 +02:00
commit 64bb3cd6b3
198 changed files with 92490 additions and 0 deletions

18
.gitignore vendored Normal file
View file

@ -0,0 +1,18 @@
/Debug/
/Release/
/ipzip2/Debug/
/Build
/Win32/
/TunSafe.aps
/ipch
/*.sdf
/*vcxproj.user
/*.opensdf
/*.suo
/.vs/
/x64/
/Azire.conf
/*.psess
/*.vspx
/installer/*.zip
/config/

76
LICENSE.AGPL.TXT Normal file
View file

@ -0,0 +1,76 @@
AFFERO GENERAL PUBLIC LICENSE
Version 1, March 2002
Copyright © 2002 Affero Inc.
510 Third Street - Suite 225, San Francisco, CA 94107, USA
This license is a modified version of the GNU General Public License copyright (C) 1989, 1991 Free Software Foundation, Inc. made with their permission. Section 2(d) has been added to cover use of software over a computer network.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the Affero General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This Public License applies to most of Affero's software and to any other program whose authors commit to using it. (Some other Affero software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. This General Public License is designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this Affero General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
d) If the Program as you received it is intended to interact with users through a computer network and if, in the version you received, any user interacting with the Program was given the opportunity to request transmission to that user of the Program's complete source code, you must not remove that facility from your modified version of the Program or work based on the Program, and must offer an equivalent opportunity for all users interacting with your Program through a computer network to request immediate transmission by HTTP of the complete source code of your modified version or other derivative work.
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
9. Affero Inc. may publish revised and/or new versions of the Affero General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by Affero, Inc. If the Program does not specify a version number of this License, you may choose any version ever published by Affero, Inc.
You may also choose to redistribute modified versions of this program under any version of the Free Software Foundation's GNU General Public License version 3 or higher, so long as that version of the GNU GPL includes terms and conditions substantially equivalent to those of this license.
10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by Affero, Inc., write to us; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

11
README.md Normal file
View file

@ -0,0 +1,11 @@
# TunSafe
Source code of the TunSafe client.
This open sourced TunSafe code is AGPL-1.0 licensed. Do note that the repository contains BSD and OpenSSL licensed files, so if you want to release a version based off of this repository you need to take that into account.
To build on Windows, open TunSafe.sln and build, or run build.py.
To build on Linux, run build_linux.sh
To build on FreeBSD, run build_freebsd.sh

16
TunSafe.conf Normal file
View file

@ -0,0 +1,16 @@
[Interface]
PrivateKey = KMakx+0sYjWKnkY2pO8+CFZ0Sp+Gzzp/GfxwlR+WgXQ=
ListenPort = 51820
Address = 192.168.2.2/24
MTU = 1420
[Peer]
PublicKey = 2m1BdGW9AwwF5dqaGm0NgMggdDZDUPFAL4JxCySdgBw=
#AllowedIPs = 0.0.0.0/0, fc00::2/64
AllowedIPs = 192.168.2.0/24
Endpoint = 192.168.1.4:8040
#Endpoint = [fe80::6825:68f4:7c6f:42d4]:8040
PersistentKeepalive = 25

BIN
TunSafe.rc Normal file

Binary file not shown.

46
TunSafe.sln Normal file
View file

@ -0,0 +1,46 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.26403.7
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "TunSafe", "TunSafe.vcxproj", "{626FBC16-64C6-407D-BC2B-6C087794E0D0}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Win32 = Debug|Win32
Debug|x64 = Debug|x64
Release|Win32 = Release|Win32
Release|x64 = Release|x64
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Debug|Win32.ActiveCfg = Debug|Win32
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Debug|Win32.Build.0 = Debug|Win32
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Debug|x64.ActiveCfg = Debug|x64
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Debug|x64.Build.0 = Debug|x64
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Release|Win32.ActiveCfg = Release|Win32
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Release|Win32.Build.0 = Release|Win32
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Release|x64.ActiveCfg = Release|x64
{626FBC16-64C6-407D-BC2B-6C087794E0D0}.Release|x64.Build.0 = Release|x64
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
GlobalSection(Performance) = preSolution
HasPerformanceSessions = true
EndGlobalSection
EndGlobal

268
TunSafe.vcxproj Normal file
View file

@ -0,0 +1,268 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{626FBC16-64C6-407D-BC2B-6C087794E0D0}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>TunSafe</RootNamespace>
<WindowsTargetPlatformVersion>10.0.15063.0</WindowsTargetPlatformVersion>
<ProjectName>TunSafe</ProjectName>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
<Import Project="crypto\nasm.props" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
<TargetName>TunSafe</TargetName>
<OutDir>$(SolutionDir)$(Platform)\$(Configuration)\</OutDir>
<IntDir>$(Platform)\$(Configuration)\</IntDir>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(VC_ExecutablePath_x64);$(WindowsSDK_ExecutablePath);$(VS_ExecutablePath);$(MSBuild_ExecutablePath);$(FxCopDir);$(PATH);C:\Bin\Dev\nasm</ExecutablePath>
<TargetName>TunSafe</TargetName>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
<TargetName>TunSafe</TargetName>
<OutDir>$(SolutionDir)$(Platform)\$(Configuration)\</OutDir>
<IntDir>$(Platform)\$(Configuration)\</IntDir>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(VC_ExecutablePath_x64);$(WindowsSDK_ExecutablePath);$(VS_ExecutablePath);$(MSBuild_ExecutablePath);$(FxCopDir);$(PATH);C:\Bin\Dev\nasm</ExecutablePath>
<TargetName>TunSafe</TargetName>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>Use</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions);_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS</PreprocessorDefinitions>
<AdditionalIncludeDirectories>.</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies);ws2_32.lib;Iphlpapi.lib</AdditionalDependencies>
<UACExecutionLevel>RequireAdministrator</UACExecutionLevel>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>Use</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions);_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_WARNINGS=1</PreprocessorDefinitions>
<ForcedIncludeFiles>
</ForcedIncludeFiles>
<AdditionalIncludeDirectories>.</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies);ws2_32.lib;Iphlpapi.lib;Comctl32.lib</AdditionalDependencies>
<AdditionalManifestDependencies>
</AdditionalManifestDependencies>
<UACExecutionLevel>RequireAdministrator</UACExecutionLevel>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>Use</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions);_CRT_SECURE_NO_WARNINGS</PreprocessorDefinitions>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<AdditionalIncludeDirectories>.</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies);ws2_32.lib;Iphlpapi.lib</AdditionalDependencies>
<UACExecutionLevel>RequireAdministrator</UACExecutionLevel>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>Use</PrecompiledHeader>
<Optimization>MinSpace</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions);_CRT_SECURE_NO_WARNINGS=1</PreprocessorDefinitions>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<FavorSizeOrSpeed>Size</FavorSizeOrSpeed>
<ForcedIncludeFiles>
</ForcedIncludeFiles>
<InlineFunctionExpansion>AnySuitable</InlineFunctionExpansion>
<OmitFramePointers>true</OmitFramePointers>
<AdditionalIncludeDirectories>.</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Windows</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<AdditionalDependencies>kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies);ws2_32.lib;Iphlpapi.lib</AdditionalDependencies>
<UACExecutionLevel>RequireAdministrator</UACExecutionLevel>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<ClInclude Include="bit_ops.h" />
<ClInclude Include="tunsafe_config.h" />
<ClInclude Include="tunsafe_cpu.h" />
<ClInclude Include="crypto\aesgcm\aes.h" />
<ClInclude Include="crypto\blake2s.h" />
<ClInclude Include="crypto\chacha20poly1305.h" />
<ClInclude Include="crypto\siphash.h" />
<ClInclude Include="tunsafe_endian.h" />
<ClInclude Include="netapi.h" />
<ClInclude Include="network_win32_api.h" />
<ClInclude Include="network_win32_dnsblock.h" />
<ClInclude Include="resource.h" />
<ClInclude Include="stdafx.h" />
<ClInclude Include="tunsafe_types.h" />
<ClInclude Include="wireguard_config.h" />
<ClInclude Include="util.h" />
<ClInclude Include="network_win32.h" />
<ClInclude Include="wireguard.h" />
<ClInclude Include="wireguard_proto.h" />
</ItemGroup>
<ItemGroup>
<ClCompile Include="benchmark.cpp" />
<ClCompile Include="tunsafe_cpu.cpp" />
<ClCompile Include="crypto\aesgcm\aesgcm.cpp" />
<ClCompile Include="crypto\blake2s_sse.cpp" />
<ClCompile Include="crypto\siphash.cpp" />
<ClCompile Include="network_win32_dnsblock.cpp" />
<ClCompile Include="util.cpp" />
<ClCompile Include="network_win32.cpp" />
<ClCompile Include="wireguard.cpp" />
<ClCompile Include="crypto\blake2s.cpp">
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">NotUsing</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">NotUsing</PrecompiledHeader>
</ClCompile>
<ClCompile Include="crypto\chacha20poly1305.cpp">
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">NotUsing</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">NotUsing</PrecompiledHeader>
</ClCompile>
<ClCompile Include="crypto\curve25519-donna.cpp">
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">NotUsing</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">NotUsing</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">NotUsing</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">NotUsing</PrecompiledHeader>
</ClCompile>
<ClCompile Include="stdafx.cpp">
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">Create</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">Create</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">Create</PrecompiledHeader>
<PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release|x64'">Create</PrecompiledHeader>
</ClCompile>
<ClCompile Include="wireguard_config.cpp" />
<ClCompile Include="tunsafe_win32.cpp" />
<ClCompile Include="wireguard_proto.cpp" />
</ItemGroup>
<ItemGroup>
<ResourceCompile Include="TunSafe.rc" />
</ItemGroup>
<ItemGroup>
<Image Include="icons\green-bg-icon.ico" />
<Image Include="icons\green-icon.ico" />
<Image Include="icons\neutral-icon.ico" />
<Image Include="icons\red-icon.ico" />
</ItemGroup>
<ItemGroup>
<NASM Include="crypto\aesgcm\aesni_gcm_x64_nasm.asm">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
<NASM Include="crypto\aesgcm\aesni_x64_nasm.asm">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
<NASM Include="crypto\aesgcm\ghash_x64_nasm.asm">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
<NASM Include="crypto\chacha20_x64.asm">
<FileType>Document</FileType>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
<NASM Include="crypto\curve25519_x64_nasm.asm">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
<NASM Include="crypto\poly1305_x64_nasm.asm">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">true</ExcludedFromBuild>
</NASM>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
<Import Project="crypto\nasm.targets" />
</ImportGroup>
</Project>

154
TunSafe.vcxproj.filters Normal file
View file

@ -0,0 +1,154 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="Source Files">
<UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
<Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
</Filter>
<Filter Include="crypto">
<UniqueIdentifier>{cfa17b4c-1bee-434e-81b4-ba780c3f7e2d}</UniqueIdentifier>
</Filter>
<Filter Include="Source Files\Win32">
<UniqueIdentifier>{49ba9478-f871-449f-a410-b401e993893f}</UniqueIdentifier>
</Filter>
<Filter Include="crypto\aesgcm">
<UniqueIdentifier>{d31b1b9f-4a2e-42d4-a26c-7c3daa4ccbe3}</UniqueIdentifier>
</Filter>
</ItemGroup>
<ItemGroup>
<ClInclude Include="stdafx.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tunsafe_endian.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="resource.h" />
<ClInclude Include="wireguard.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="wireguard_proto.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="util.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="network_win32_dnsblock.h">
<Filter>Source Files\Win32</Filter>
</ClInclude>
<ClInclude Include="network_win32.h">
<Filter>Source Files\Win32</Filter>
</ClInclude>
<ClInclude Include="network_win32_api.h">
<Filter>Source Files\Win32</Filter>
</ClInclude>
<ClInclude Include="crypto\chacha20poly1305.h">
<Filter>crypto</Filter>
</ClInclude>
<ClInclude Include="crypto\blake2s.h">
<Filter>crypto</Filter>
</ClInclude>
<ClInclude Include="wireguard_config.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="netapi.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="crypto\siphash.h">
<Filter>crypto</Filter>
</ClInclude>
<ClInclude Include="tunsafe_types.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="crypto\aesgcm\aes.h">
<Filter>crypto\aesgcm</Filter>
</ClInclude>
<ClInclude Include="tunsafe_cpu.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="bit_ops.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tunsafe_config.h">
<Filter>Source Files</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<ClCompile Include="stdafx.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="wireguard.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="wireguard_proto.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="util.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="network_win32_dnsblock.cpp">
<Filter>Source Files\Win32</Filter>
</ClCompile>
<ClCompile Include="tunsafe_win32.cpp">
<Filter>Source Files\Win32</Filter>
</ClCompile>
<ClCompile Include="network_win32.cpp">
<Filter>Source Files\Win32</Filter>
</ClCompile>
<ClCompile Include="crypto\blake2s.cpp">
<Filter>crypto</Filter>
</ClCompile>
<ClCompile Include="crypto\blake2s_sse.cpp">
<Filter>crypto</Filter>
</ClCompile>
<ClCompile Include="crypto\chacha20poly1305.cpp">
<Filter>crypto</Filter>
</ClCompile>
<ClCompile Include="crypto\curve25519-donna.cpp">
<Filter>crypto</Filter>
</ClCompile>
<ClCompile Include="wireguard_config.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="crypto\siphash.cpp">
<Filter>crypto</Filter>
</ClCompile>
<ClCompile Include="crypto\aesgcm\aesgcm.cpp">
<Filter>crypto\aesgcm</Filter>
</ClCompile>
<ClCompile Include="benchmark.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tunsafe_cpu.cpp">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<ResourceCompile Include="TunSafe.rc" />
</ItemGroup>
<ItemGroup>
<Image Include="icons\neutral-icon.ico" />
<Image Include="icons\green-icon.ico" />
<Image Include="icons\red-icon.ico" />
<Image Include="icons\green-bg-icon.ico" />
</ItemGroup>
<ItemGroup>
<NASM Include="crypto\chacha20_x64.asm">
<Filter>crypto</Filter>
</NASM>
<NASM Include="crypto\curve25519_x64_nasm.asm">
<Filter>crypto</Filter>
</NASM>
<NASM Include="crypto\poly1305_x64_nasm.asm">
<Filter>crypto</Filter>
</NASM>
<NASM Include="crypto\aesgcm\aesni_gcm_x64_nasm.asm">
<Filter>crypto\aesgcm</Filter>
</NASM>
<NASM Include="crypto\aesgcm\aesni_x64_nasm.asm">
<Filter>crypto\aesgcm</Filter>
</NASM>
<NASM Include="crypto\aesgcm\ghash_x64_nasm.asm">
<Filter>crypto\aesgcm</Filter>
</NASM>
</ItemGroup>
</Project>

94
benchmark.cpp Normal file
View file

@ -0,0 +1,94 @@
// SPDX-License-Identifier: AGPL-1.0-only
// Copyright (C) 2018 Ludvig Strigeus <info@tunsafe.com>. All Rights Reserved.
#include "stdafx.h"
#include "tunsafe_types.h"
#include "crypto/chacha20poly1305.h"
#include "crypto/aesgcm/aes.h"
#include "tunsafe_cpu.h"
#include <functional>
#include <string.h>
#if defined(OS_FREEBSD) || defined(OS_LINUX)
#include <time.h>
#include <stdlib.h>
typedef uint64 LARGE_INTEGER;
void QueryPerformanceCounter(LARGE_INTEGER *x) {
struct timespec ts;
if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
fprintf(stderr, "clock_gettime failed\n");
exit(1);
}
*x = (uint64)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
void QueryPerformanceFrequency(LARGE_INTEGER *x) {
*x = 1000000000;
}
#elif defined(OS_MACOSX)
#include <mach/mach.h>
#include <mach/mach_time.h>
typedef uint64 LARGE_INTEGER;
void QueryPerformanceCounter(LARGE_INTEGER *x) {
*x = mach_absolute_time();
}
void QueryPerformanceFrequency(LARGE_INTEGER *x) {
mach_timebase_info_data_t timebase = { 0, 0 };
if (mach_timebase_info(&timebase) != 0)
abort();
printf("numer/denom: %d %d\n", timebase.numer, timebase.denom);
*x = timebase.denom * 1000000000;
}
#endif
int gcm_self_test();
void *fake_glb;
void Benchmark() {
int64 a, b, f, t1 = 0, t2 = 0;
#if WITH_AESGCM
gcm_self_test();
#endif // WITH_AESGCM
PrintCpuFeatures();
QueryPerformanceFrequency((LARGE_INTEGER*)&f);
uint8 dst[1500 + 16];
uint8 key[32] = {0, 1, 2, 3, 4, 5, 6};
uint8 mac[16];
fake_glb = dst;
auto RunOneBenchmark = [&](const char *name, const std::function<uint64(size_t)> &ff) {
uint64 bytes = 0;
QueryPerformanceCounter((LARGE_INTEGER*)&b);
size_t i;
for (i = 0; bytes < 1000000000; i++)
bytes += ff(i);
QueryPerformanceCounter((LARGE_INTEGER*)&a);
RINFO("%s: %f MB/s", name, (double)bytes * 0.000001 / (a - b) * f);
};
memset(dst, 0, 1500);
RunOneBenchmark("chacha20-encrypt", [&](size_t i) -> uint64 { chacha20poly1305_encrypt(dst, dst, 1460, NULL, 0, i, key); return 1460; });
RunOneBenchmark("chacha20-decrypt", [&](size_t i) -> uint64 { chacha20poly1305_decrypt_get_mac(dst, dst, 1460, NULL, 0, i, key, mac); return 1460; });
RunOneBenchmark("poly1305-only", [&](size_t i) -> uint64 { poly1305_get_mac(dst, 1460, NULL, 0, i, key, mac); return 1460; });
#if WITH_AESGCM
if (X86_PCAP_AES) {
AesGcm128StaticContext sctx;
CRYPTO_gcm128_init(&sctx, key, 128);
RunOneBenchmark("aes128-gcm-encrypt", [&](size_t i) -> uint64 { aesgcm_encrypt(dst, dst, 1460, NULL, 0, i, &sctx); return 1460; });
RunOneBenchmark("aes128-gcm-decrypt", [&](size_t i) -> uint64 { aesgcm_decrypt_get_mac(dst, dst, 1460, NULL, 0, i, &sctx, mac); return 1460; });
}
#endif // WITH_AESGCM
}

49
bit_ops.h Normal file
View file

@ -0,0 +1,49 @@
// SPDX-License-Identifier: AGPL-1.0-only
// Copyright (C) 2018 Ludvig Strigeus <info@tunsafe.com>. All Rights Reserved.
#pragma once
#include "tunsafe_types.h"
#include "tunsafe_endian.h"
#if !defined(ARCH_CPU_X86_64) && defined(COMPILER_MSVC)
static inline int _BitScanReverse64(unsigned long *index, uint64 x) {
if (_BitScanReverse(index, x >> 32)) {
(*index) += 32;
return true;
}
return _BitScanReverse(index, (uint32)x);
}
#endif
#if !defined(COMPILER_MSVC)
static inline int _BitScanReverse64(unsigned long *index, uint64 x) {
*index = 63 - __builtin_clzll(x);
return (x != 0);
}
static inline int _BitScanReverse(unsigned long *index, uint32 x) {
*index = 31 - __builtin_clz(x);
return (x != 0);
}
#endif
static inline int FindHighestSetBit32(uint32 x) {
unsigned long index;
return _BitScanReverse(&index, x) ? (int)(index + 1) : 0;
}
static inline int FindLastSetBit32(uint32 x) {
unsigned long index;
_BitScanReverse(&index, x);
return index;
}
static inline int FindHighestSetBit64(uint64 x) {
unsigned long index;
return _BitScanReverse64(&index, x) ? (int)(index + 1) : 0;
}
static inline int FindHighestSetBit128(uint64 hi, uint64 lo) {
return hi ? 64 + FindHighestSetBit64(hi) : FindHighestSetBit64(lo);
}

95
build.py Normal file
View file

@ -0,0 +1,95 @@
# SPDX-License-Identifier: AGPL-1.0-only
# Copyright (C) 2018 Ludvig Strigeus <info@tunsafe.com>. All Rights Reserved.
import os
import shutil
import win32crypt
import base64
import sys
import zipfile
import re
MSBUILD_PATH = r"C:\Dev\VS2017\MSBuild\15.0\Bin\MSBuild.exe"
NSIS_PATH = r'C:\Dev\NSIS\makeNSIS.EXE'
SIGNTOOL_PATH = r'c:\Program Files (x86)\Windows Kits\10\bin\10.0.15063.0\x86\signtool.exe'
SIGNTOOL_KEY_PATH = '' # put key here
SIGNTOOL_PASS = '' # put key pass here
def RmTree(path):
try:
print ('Deleting %s' % path)
shutil.rmtree(path)
except FileNotFoundError:
pass
def Run(s):
print ('Running %s' % s)
x = os.system(s)
if x:
raise Exception('Command failed (%d) : %s' % (x, s))
def CopyFile(src, dst):
shutil.copyfile(src, dst)
def SignExe(src):
print ('Signing %s' % src)
cmd = r'""c:\Program Files (x86)\Windows Kits\10\bin\10.0.15063.0\x86\signtool.exe" sign /f "%s" /p %s /t http://timestamp.verisign.com/scripts/timstamp.dll "%s"' % (SIGNTOOL_KEY_PATH, SIGNTOOL_PASS, src)
#cmd = r'""c:\Program Files (x86)\Windows Kits\10\bin\10.0.15063.0\x86\signtool.exe" sign %s ' % (SIGNTOOL_KEY_PATH, )
x = os.system(cmd)
if x:
raise Exception('Signing failed (%d) : %s' % (x, cmd))
def GetVersion():
for line in open(BASE + '/tunsafe_config.h', 'r'):
m = re.match('^#define TUNSAFE_VERSION_STRING "TunSafe (.*)"$', line)
if m:
return m.group(1)
raise Exception('Version not found')
#
#os.system(r'""')
command = sys.argv[1]
BASE = r'D:\Code\TunSafe'
if command == 'build_tap':
Run(r'%s /V4 installer\tap\tap-windows6.nsi' % NSIS_PATH)
SignExe(r'installer\tap\TunSafe-TAP-9.21.2.exe')
sys.exit(0)
if 1:
RmTree(BASE + r'\Win32\Release')
RmTree(BASE + r'\x64\Release')
Run('%s TunSafe.sln /t:Clean;Rebuild /p:Configuration=Release /p:Platform=x64' % MSBUILD_PATH)
Run('%s TunSafe.sln /t:Clean;Rebuild /p:Configuration=Release /p:Platform=Win32' % MSBUILD_PATH)
if 1:
CopyFile(BASE + r'\Win32\Release\TunSafe.exe',
BASE + r'\installer\x86\TunSafe.exe')
SignExe(BASE + r'\installer\x86\TunSafe.exe')
CopyFile(BASE + r'\x64\Release\TunSafe.exe',
BASE + r'\installer\x64\TunSafe.exe')
SignExe(BASE + r'\installer\x64\TunSafe.exe')
VERSION = GetVersion()
Run(r'%s /V4 -DPRODUCT_VERSION=%s installer\tunsafe.nsi ' % (NSIS_PATH, VERSION))
SignExe(BASE + r'\installer\TunSafe-%s.exe' % VERSION)
zipf = zipfile.ZipFile(BASE + '\installer\TunSafe-%s-x86.zip' % VERSION, 'w', zipfile.ZIP_DEFLATED)
zipf.write(BASE + r'\installer\x86\TunSafe.exe', 'TunSafe.exe')
zipf.write(BASE + r'\installer\License.txt', 'License.txt')
zipf.write(BASE + r'\installer\ChangeLog.txt', 'ChangeLog.txt')
zipf.write(BASE + r'\installer\TunSafe.conf', 'Config\\TunSafe.conf')
zipf.close()
zipf = zipfile.ZipFile(BASE + '\installer\TunSafe-%s-x64.zip' % VERSION, 'w', zipfile.ZIP_DEFLATED)
zipf.write(BASE + r'\installer\x64\TunSafe.exe', 'TunSafe.exe')
zipf.write(BASE + r'\installer\License.txt', 'License.txt')
zipf.write(BASE + r'\installer\ChangeLog.txt', 'ChangeLog.txt')
zipf.write(BASE + r'\installer\TunSafe.conf', 'Config\\TunSafe.conf')
zipf.close()

116
build_config.h Normal file
View file

@ -0,0 +1,116 @@
// File is taken from Chromium
#ifndef BUILD_BUILD_CONFIG_H_
#define BUILD_BUILD_CONFIG_H_
#if defined(__APPLE__)
#include <TargetConditionals.h>
#endif
// A set of macros to use for platform detection.
#if defined(__APPLE__)
#define OS_MACOSX 1
#if defined(TARGET_OS_IPHONE) && TARGET_OS_IPHONE
#define OS_IOS 1
#endif // defined(TARGET_OS_IPHONE) && TARGET_OS_IPHONE
#elif defined(ANDROID)
#define OS_ANDROID 1
#elif defined(__native_client__)
#define OS_NACL 1
#elif defined(__FLASHPLAYER)
#define OS_FLASHPLAYER 1
#elif defined(__linux__)
#define OS_LINUX 1
#elif defined(_WIN32)
#define OS_WIN 1
#elif defined(__FreeBSD__)
#define OS_FREEBSD 1
#elif defined(__OpenBSD__)
#define OS_OPENBSD 1
#elif defined(__sun)
#define OS_SOLARIS 1
#elif defined(EMSCRIPTEN)
#define OS_EMSCRIPTEN 1
#else
#error Please add support for your platform in build_config.h
#endif
// For access to standard BSD features, use OS_BSD instead of a
// more specific macro.
#if defined(OS_FREEBSD) || defined(OS_OPENBSD)
#define OS_BSD 1
#endif
// For access to standard POSIXish features, use OS_POSIX instead of a
// more specific macro.
#if defined(OS_MACOSX) || defined(OS_LINUX) || defined(OS_FREEBSD) || \
defined(OS_OPENBSD) || defined(OS_SOLARIS) || defined(OS_ANDROID) || \
defined(OS_NACL)
#define OS_POSIX 1
#endif
#if defined(OS_POSIX) && !defined(OS_MACOSX) && !defined(OS_ANDROID) && \
!defined(OS_NACL)
#define USE_X11 1 // Use X for graphics.
#endif
// Compiler detection.
#if defined(__GNUC__)
#define COMPILER_GCC 1
#if defined(__clang__)
#define COMPILER_CLANG 1
#endif
#elif defined(_MSC_VER)
#define COMPILER_MSVC 1
#elif defined(__TINYC__)
#define COMPILER_TCC 1
#else
#error Please add support for your compiler in build/build_config.h
#endif
// Processor architecture detection. For more info on what's defined, see:
// http://msdn.microsoft.com/en-us/library/b0084kay.aspx
// http://www.agner.org/optimize/calling_conventions.pdf
// or with gcc, run: "echo | gcc -E -dM -"
#if defined(_M_X64) || defined(__x86_64__)
#define ARCH_CPU_X86_FAMILY 1
#define ARCH_CPU_X86_64 1
#define ARCH_CPU_64_BITS 1
#define ARCH_CPU_LITTLE_ENDIAN 1
#define ARCH_CPU_ALLOW_UNALIGNED 1
#elif defined(_M_IX86) || defined(__i386__)
#define ARCH_CPU_X86_FAMILY 1
#define ARCH_CPU_X86 1
#define ARCH_CPU_32_BITS 1
#define ARCH_CPU_LITTLE_ENDIAN 1
#define ARCH_CPU_ALLOW_UNALIGNED 1
#define ARCH_CPU_NEED_64BIT_ALIGN 1
#elif defined(__ARMEL__) || defined(__arm__) && defined(__ARMCC_VERSION)
#define ARCH_CPU_ARM_FAMILY 1
#define ARCH_CPU_ARMEL 1
#define ARCH_CPU_32_BITS 1
#define ARCH_CPU_LITTLE_ENDIAN 1
#elif defined(__pnacl__)
#define ARCH_CPU_32_BITS 1
#elif defined(__MIPSEL__)
#define ARCH_CPU_MIPS_FAMILY 1
#define ARCH_CPU_MIPSEL 1
#define ARCH_CPU_32_BITS 1
#define ARCH_CPU_LITTLE_ENDIAN 1
#elif defined(EMSCRIPTEN)
#define ARCH_CPU_JS 1
#define ARCH_CPU_32_BITS 1
#define ARCH_CPU_LITTLE_ENDIAN 1
#elif defined(__FLASHPLAYER)
#define ARCH_CPU_FLASHPLAYER 1
#define ARCH_CPU_32_BITS 1
#else
#error Please add support for your architecture in build_config.h
#endif
#if defined(ARCH_CPU_LITTLE_ENDIAN) && defined(ARCH_CPU_BIG_ENDIAN) || !defined(ARCH_CPU_LITTLE_ENDIAN) && !defined(ARCH_CPU_BIG_ENDIAN)
#error Please add support for your endianness in build_config.h
#endif
#endif // BUILD_BUILD_CONFIG_H_

2
build_freebsd.sh Normal file
View file

@ -0,0 +1,2 @@
g++7 -I . -O2 -static -mssse3 -o tunsafe benchmark.cpp tunsafe_cpu.cpp wireguard_config.cpp wireguard.cpp wireguard_proto.cpp util.cpp network_bsd.cpp crypto/blake2s.cpp crypto/blake2s_sse.cpp crypto/chacha20poly1305.cpp crypto/curve25519-donna.cpp crypto/siphash.cpp crypto/chacha20_x64_gas.s crypto/poly1305_x64_gas.s ipzip2/ipzip2.cpp -lrt

9
build_linux.sh Normal file
View file

@ -0,0 +1,9 @@
#!/bin/sh
clang++-6.0 -c -march=skylake-avx512 crypto/poly1305_x64_gas.s crypto/chacha20_x64_gas.s
clang++-6.0 -I . -O3 -mssse3 -pthread -lrt -o tunsafe util.cpp wireguard_config.cpp wireguard.cpp \
wireguard_proto.cpp network_bsd_mt.cpp tunsafe_cpu.cpp benchmark.cpp crypto/blake2s.cpp crypto/blake2s_sse.cpp crypto/chacha20poly1305.cpp \
crypto/curve25519-donna.cpp crypto/siphash.cpp chacha20_x64_gas.o crypto/aesgcm/aesni_gcm_x64_gas.s \
crypto/aesgcm/aesni_x64_gas.s crypto/aesgcm/aesgcm.cpp poly1305_x64_gas.o ipzip2/ipzip2.cpp \
crypto/aesgcm/ghash_x64_gas.s

17
build_osx.sh Normal file
View file

@ -0,0 +1,17 @@
set -e
clang++ -c -mavx512f -mavx512vl crypto/poly1305_x64_gas_macosx.s crypto/chacha20_x64_gas_macosx.s
clang++ -g -O3 -I . -std=c++11 -DNDEBUG=1 -fno-exceptions -fno-rtti -ffunction-sections -o tunsafe \
wireguard_config.cpp wireguard.cpp wireguard_proto.cpp util.cpp network_bsd_mt.cpp benchmark.cpp tunsafe_cpu.cpp \
crypto/blake2s.cpp crypto/blake2s_sse.cpp crypto/chacha20poly1305.cpp crypto/curve25519-donna.cpp \
crypto/siphash.cpp crypto/aesgcm/aesgcm.cpp ipzip2/ipzip2.cpp \
crypto/aesgcm/aesni_gcm_x64_gas_macosx.s crypto/aesgcm/aesni_x64_gas_macosx.s crypto/aesgcm/ghash_x64_gas_macosx.s \
chacha20_x64_gas_macosx.o poly1305_x64_gas_macosx.o
cp tunsafe tunsafe.unstripped
strip tunsafe
rm -f tunsafe_osx.zip
zip tunsafe_osx.zip tunsafe readme_osx.txt

1
crypto/.gitignore vendored Normal file
View file

@ -0,0 +1 @@
/old/

84
crypto/aesgcm/aes.h Normal file
View file

@ -0,0 +1,84 @@
/**
* Downloaded from
*
* http://www.esat.kuleuven.ac.be/~rijmen/rijndael/rijndael-fst-3.0.zip
*
* rijndael-alg-fst.h
*
* @version 3.0 (December 2000)
*
* Optimised ANSI C code for the Rijndael cipher (now AES)
*
* @author Vincent Rijmen <vincent.rijmen@esat.kuleuven.ac.be>
* @author Antoon Bosselaers <antoon.bosselaers@esat.kuleuven.ac.be>
* @author Paulo Barreto <paulo.barreto@terra.com.br>
*
* This code is hereby placed in the public domain.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHORS ''AS IS'' AND ANY EXPRESS
* OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
* BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
* OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
* EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __RIJNDAEL_ALG_FST_H
#define __RIJNDAEL_ALG_FST_H
#include "tunsafe_types.h"
#define AESGCM_MAXNR 14
struct AesContext {
uint32 rk[(AESGCM_MAXNR + 1) * 4];
int rounds;
};
typedef struct { uint64 hi, lo; } aesgcm_u128;
struct AesGcm128StaticContext {
void(*gmult)(uint64 Xi[2], const aesgcm_u128 Htable[16]);
void(*ghash)(uint64 Xi[2], const aesgcm_u128 Htable[16], const uint8 *inp, size_t len);
bool use_aesni_gcm_crypt;
// Don't move H and Htable cause the asm code depends on them
union { uint64 u[2]; uint32 d[4]; uint8 c[16]; size_t t[16 / sizeof(size_t)]; } H;
aesgcm_u128 Htable[16];
AesContext aes;
};
struct AesGcm128TempContext {
AesGcm128StaticContext *sctx;
union { uint64 u[2]; uint32 d[4]; uint8 c[16]; size_t t[16/sizeof(size_t)]; } EKi,EK0,len, Yi, Xi;
unsigned int mres, ares;
};
void CRYPTO_gcm128_init(AesGcm128StaticContext *ctx, const uint8 *key, int key_size);
void CRYPTO_gcm128_setiv(AesGcm128TempContext *ctx, AesGcm128StaticContext *sctx, const unsigned char *iv,size_t len);
void CRYPTO_gcm128_aad(AesGcm128TempContext *ctx,const uint8 *aad,size_t len);
void CRYPTO_gcm128_encrypt_ctr32(AesGcm128TempContext *ctx, const uint8 *in, uint8 *out, size_t len);
void CRYPTO_gcm128_decrypt_ctr32(AesGcm128TempContext *ctx, const uint8 *in, uint8 *out, size_t len);
void CRYPTO_gcm128_finish(AesGcm128TempContext *ctx, unsigned char *tag, size_t len);
void aesgcm_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, AesGcm128StaticContext *sctx);
void aesgcm_decrypt_get_mac(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, AesGcm128StaticContext *sctx,
uint8 mac[16]);
#if defined(ARCH_CPU_X86_64)
#define WITH_AESGCM 0
#endif
#endif /* __RIJNDAEL_ALG_FST_H */

882
crypto/aesgcm/aesgcm.cpp Normal file
View file

@ -0,0 +1,882 @@
#include "stdafx.h"
#include "tunsafe_types.h"
#include "tunsafe_endian.h"
#include "tunsafe_cpu.h"
#include "crypto/aesgcm/aes.h"
#include <assert.h>
#include <string.h>
#include <stdio.h>
//#include <Windows.h>
#include "crypto/chacha20poly1305.h"
#define AESNIGCM_ASM 1
#define AESGCM_ASM 1
#define AESNI_GCM 1
// We only implement AES stuff on X86-64
#if WITH_AESGCM
extern "C" {
void gcm_init_clmul(aesgcm_u128 Htable[16],const uint64 Xi[2]);
void gcm_gmult_clmul(uint64 Xi[2],const aesgcm_u128 Htable[16]);
void gcm_ghash_clmul(uint64 Xi[2],const aesgcm_u128 Htable[16],const uint8 *inp,size_t len);
void gcm_init_avx(aesgcm_u128 Htable[16],const uint64 Xi[2]);
void gcm_gmult_avx(uint64 Xi[2],const aesgcm_u128 Htable[16]);
void gcm_ghash_avx(uint64 Xi[2],const aesgcm_u128 Htable[16],const uint8 *inp,size_t len);
void gcm_gmult_4bit(uint64 Xi[2], const aesgcm_u128 Htable[16]);
void gcm_ghash_4bit(uint64 Xi[2],const aesgcm_u128 Htable[16], const uint8 *inp,size_t len);
// ivec points to Yi followed by Xi
// h_and_htable points at h and htable from the static context
size_t aesni_gcm_encrypt(const uint8 *in,uint8 *out,size_t len,const void *key,uint8 ivec_and_xi[16],uint64 *h_and_htable);
size_t aesni_gcm_decrypt(const uint8 *in,uint8 *out,size_t len,const void *key,uint8 ivec_and_xi[16],uint64 *h_and_htable);
void aesni_ctr32_encrypt_blocks(const void *in, void *out, size_t blocks, const AesContext *key, const uint8 *ivec);
void aesni_encrypt(const void *inp, void *out, const AesContext *key);
void aesni_decrypt(const void *inp, void *out, const AesContext *key);
int aesni_set_encrypt_key(const unsigned char *inp, int bits, AesContext *key);
int aesni_set_decrypt_key(const unsigned char *inp, int bits, AesContext *key);
};
#define GCM_MUL(ctx,Xi) (*gcm_gmult_p)(ctx->Xi.u,sctx->Htable)
#define GHASH(ctx,in,len) (*gcm_ghash_p)(ctx->Xi.u,sctx->Htable,in,len)
#define GHASH_CHUNK (3*1024)
void CRYPTO_gcm128_aad(AesGcm128TempContext *ctx,const uint8 *aad,size_t len) {
size_t i;
unsigned int n;
AesGcm128StaticContext *sctx = ctx->sctx;
uint64 alen = ctx->len.u[0];
void (*gcm_gmult_p)(uint64 Xi[2],const aesgcm_u128 Htable[16]) = sctx->gmult;
void (*gcm_ghash_p)(uint64 Xi[2],const aesgcm_u128 Htable[16], const uint8 *inp,size_t len) = sctx->ghash;
assert(!ctx->len.u[1]);
// if () return -2;
alen += len;
// if (alen>(uint64(1)<<61) || (sizeof(len)==8 && alen<len))
// return -1;
ctx->len.u[0] = alen;
n = ctx->ares;
if (n) {
while (n && len) {
ctx->Xi.c[n] ^= *(aad++);
--len;
n = (n+1)%16;
}
if (n==0) GCM_MUL(ctx,Xi);
else {
ctx->ares = n;
return;
}
}
#ifdef GHASH
if ((i = (len&(size_t)-16))) {
GHASH(ctx,aad,i);
aad += i;
len -= i;
}
#else
while (len>=16) {
for (i=0; i<16; ++i) ctx->Xi.c[i] ^= aad[i];
GCM_MUL(ctx,Xi);
aad += 16;
len -= 16;
}
#endif
if (len) {
n = (unsigned int)len;
for (i=0; i<len; ++i) ctx->Xi.c[i] ^= aad[i];
}
ctx->ares = n;
}
void CRYPTO_gcm128_encrypt_ctr32(AesGcm128TempContext *ctx, const uint8 *in, uint8 *out, size_t len) {
unsigned int n, ctr;
size_t i;
AesGcm128StaticContext *sctx = ctx->sctx;
uint64 mlen = ctx->len.u[1];
void (*gcm_gmult_p)(uint64 Xi[2],const aesgcm_u128 Htable[16]) = sctx->gmult;
void (*gcm_ghash_p)(uint64 Xi[2],const aesgcm_u128 Htable[16], const uint8 *inp,size_t len) = sctx->ghash;
mlen += len;
// if (mlen>((uint64(1)<<36)-32) || (sizeof(len)==8 && mlen<len))
// return -1;
ctx->len.u[1] = mlen;
if (ctx->ares) {
/* First call to encrypt finalizes GHASH(AAD) */
GCM_MUL(ctx,Xi);
ctx->ares = 0;
}
n = ctx->mres;
if (n) {
while (n && len) {
ctx->Xi.c[n] ^= *(out++) = *(in++)^ctx->EKi.c[n];
--len;
n = (n+1)%16;
}
if (n==0) GCM_MUL(ctx,Xi);
else {
ctx->mres = n;
return;
}
}
#if defined(AESNI_GCM)
if (sctx->use_aesni_gcm_crypt && len >= 0x120) {
// |aesni_gcm_encrypt| may not process all the input given to it. It may
// not process *any* of its input if it is deemed too small.
size_t bulk = aesni_gcm_encrypt(in, out, len, &sctx->aes, ctx->Yi.c, sctx->H.u);
in += bulk;
out += bulk;
len -= bulk;
}
#endif
ctr = ReadBE32(ctx->Yi.c + 12);
#if defined(STRICT_ALIGNMENT)
if (((size_t)in | (size_t)out) % sizeof(size_t) != 0) {
for (i = 0; i<len; ++i) {
if (n == 0) {
aesni_encrypt(ctx->Yi.c, ctx->EKi.c, &sctx->aes);
++ctr;
WriteBE32(ctx->Yi.c + 12, ctr);
}
ctx->Xi.c[n] ^= out[i] = in[i] ^ ctx->EKi.c[n];
n = (n + 1) % 16;
if (n == 0)
GCM_MUL(ctx, Xi);
}
ctx->mres = n;
return;
}
#endif
while (len>=GHASH_CHUNK) {
aesni_ctr32_encrypt_blocks(in, out, GHASH_CHUNK / 16, &sctx->aes, ctx->Yi.c);
GHASH(ctx, out, GHASH_CHUNK);
ctr += GHASH_CHUNK / 16;
WriteBE32(ctx->Yi.c + 12, ctr);
in += GHASH_CHUNK;
out += GHASH_CHUNK;
len -= GHASH_CHUNK;
}
if ((i = (len&(size_t)-16))) {
aesni_ctr32_encrypt_blocks(in, out, i / 16, &sctx->aes, ctx->Yi.c);
GHASH(ctx, out, i);
ctr += (uint32)(i / 16);
WriteBE32(ctx->Yi.c + 12, ctr);
out += i;
in += i;
len -= i;
}
if (len) {
aesni_encrypt(ctx->Yi.c, ctx->EKi.c, &sctx->aes);
++ctr;
WriteBE32(ctx->Yi.c+12,ctr);
while (len--) {
ctx->Xi.c[n] ^= out[n] = in[n] ^ ctx->EKi.c[n];
++n;
}
}
ctx->mres = n;
}
void CRYPTO_gcm128_decrypt_ctr32(AesGcm128TempContext *ctx, const uint8 *in, uint8 *out, size_t len) {
unsigned int n, ctr;
size_t i;
uint64 mlen = ctx->len.u[1];
AesGcm128StaticContext *sctx = ctx->sctx;
void (*gcm_gmult_p)(uint64 Xi[2],const aesgcm_u128 Htable[16]) = sctx->gmult;
void (*gcm_ghash_p)(uint64 Xi[2],const aesgcm_u128 Htable[16], const uint8 *inp,size_t len) = sctx->ghash;
mlen += len;
// if (mlen>((uint64(1)<<36)-32) || (sizeof(len)==8 && mlen<len))
// return -1;
ctx->len.u[1] = mlen;
if (ctx->ares) {
/* First call to decrypt finalizes GHASH(AAD) */
GCM_MUL(ctx,Xi);
ctx->ares = 0;
}
n = ctx->mres;
if (n) {
while (n && len) {
uint8 c = *(in++);
*(out++) = c^ctx->EKi.c[n];
ctx->Xi.c[n] ^= c;
--len;
n = (n+1)%16;
}
if (n==0) GCM_MUL (ctx,Xi);
else {
ctx->mres = n;
return;
}
}
#if defined(AESNI_GCM)
if (sctx->use_aesni_gcm_crypt) {
// |aesni_gcm_decrypt| may not process all the input given to it. It may
// not process *any* of its input if it is deemed too small.
size_t bulk = aesni_gcm_decrypt(in, out, len, &sctx->aes, ctx->Yi.c, sctx->H.u);
in += bulk;
out += bulk;
len -= bulk;
}
#endif
ctr = ReadBE32(ctx->Yi.c + 12);
#if defined(STRICT_ALIGNMENT)
if (((size_t)in|(size_t)out)%sizeof(size_t) != 0) {
for (i=0;i<len;++i) {
uint8 c;
if (n==0) {
aesni_encrypt(ctx->Yi.c, ctx->EKi.c, key);
++ctr;
WriteBE32(ctx->Yi.c+12,ctr);
}
c = in[i];
out[i] = c^ctx->EKi.c[n];
ctx->Xi.c[n] ^= c;
n = (n+1)%16;
if (n==0)
GCM_MUL(ctx,Xi);
}
ctx->mres = n;
return;
}
#endif
while (len >= GHASH_CHUNK) {
GHASH(ctx, in, GHASH_CHUNK);
aesni_ctr32_encrypt_blocks(in, out, GHASH_CHUNK / 16, &sctx->aes, ctx->Yi.c);
ctr += GHASH_CHUNK / 16;
WriteBE32(ctx->Yi.c + 12, ctr);
in += GHASH_CHUNK;
out += GHASH_CHUNK;
len -= GHASH_CHUNK;
}
if ((i = (len&(size_t)-16))) {
GHASH(ctx, in, i);
aesni_ctr32_encrypt_blocks(in, out, i / 16, &sctx->aes, ctx->Yi.c);
ctr += (uint32)(i / 16);
WriteBE32(ctx->Yi.c + 12, ctr);
out += i;
in += i;
len -= i;
}
if (len) {
aesni_encrypt(ctx->Yi.c, ctx->EKi.c, &sctx->aes);
++ctr;
WriteBE32(ctx->Yi.c+12,ctr);
while (len--) {
uint8 c = in[n];
ctx->Xi.c[n] ^= c;
out[n] = c^ctx->EKi.c[n];
++n;
}
}
ctx->mres = n;
}
void CRYPTO_gcm128_finish(AesGcm128TempContext *ctx,uint8 *tag, size_t len) {
uint64 alen = ctx->len.u[0]<<3;
uint64 clen = ctx->len.u[1]<<3;
AesGcm128StaticContext *sctx = ctx->sctx;
void (*gcm_gmult_p)(uint64 Xi[2],const aesgcm_u128 Htable[16]) = sctx->gmult;
if (ctx->mres || ctx->ares)
GCM_MUL(ctx,Xi);
alen = ToBE64(alen);
clen = ToBE64(clen);
ctx->Xi.u[0] ^= alen;
ctx->Xi.u[1] ^= clen;
GCM_MUL(ctx,Xi);
ctx->Xi.u[0] ^= ctx->EK0.u[0];
ctx->Xi.u[1] ^= ctx->EK0.u[1];
memcpy(tag, ctx->Xi.c,len);
}
#define REDUCE1BIT(V) do { \
if (sizeof(size_t)==8) { \
uint64 T = 0xe100000000000000ull & (0-(V.lo&1)); \
V.lo = (V.hi<<63)|(V.lo>>1); \
V.hi = (V.hi>>1 )^T; \
} else { \
uint32 T = 0xe1000000U & (0-(uint32)(V.lo&1)); \
V.lo = (V.hi<<63)|(V.lo>>1); \
V.hi = (V.hi>>1 )^((uint64)T<<32); \
} \
} while(0)
static void gcm_init_4bit(aesgcm_u128 Htable[16], uint64 H[2]) {
aesgcm_u128 V;
Htable[0].hi = 0;
Htable[0].lo = 0;
V.hi = H[0];
V.lo = H[1];
Htable[8] = V;
REDUCE1BIT(V);
Htable[4] = V;
REDUCE1BIT(V);
Htable[2] = V;
REDUCE1BIT(V);
Htable[1] = V;
Htable[3].hi = V.hi^Htable[2].hi, Htable[3].lo = V.lo^Htable[2].lo;
V=Htable[4];
Htable[5].hi = V.hi^Htable[1].hi, Htable[5].lo = V.lo^Htable[1].lo;
Htable[6].hi = V.hi^Htable[2].hi, Htable[6].lo = V.lo^Htable[2].lo;
Htable[7].hi = V.hi^Htable[3].hi, Htable[7].lo = V.lo^Htable[3].lo;
V=Htable[8];
Htable[9].hi = V.hi^Htable[1].hi, Htable[9].lo = V.lo^Htable[1].lo;
Htable[10].hi = V.hi^Htable[2].hi, Htable[10].lo = V.lo^Htable[2].lo;
Htable[11].hi = V.hi^Htable[3].hi, Htable[11].lo = V.lo^Htable[3].lo;
Htable[12].hi = V.hi^Htable[4].hi, Htable[12].lo = V.lo^Htable[4].lo;
Htable[13].hi = V.hi^Htable[5].hi, Htable[13].lo = V.lo^Htable[5].lo;
Htable[14].hi = V.hi^Htable[6].hi, Htable[14].lo = V.lo^Htable[6].lo;
Htable[15].hi = V.hi^Htable[7].hi, Htable[15].lo = V.lo^Htable[7].lo;
}
#if !AESGCM_ASM
#define PACK(s) ((size_t)(s)<<(sizeof(size_t)*8-16))
static const size_t rem_4bit[16] = {
PACK(0x0000), PACK(0x1C20), PACK(0x3840), PACK(0x2460),
PACK(0x7080), PACK(0x6CA0), PACK(0x48C0), PACK(0x54E0),
PACK(0xE100), PACK(0xFD20), PACK(0xD940), PACK(0xC560),
PACK(0x9180), PACK(0x8DA0), PACK(0xA9C0), PACK(0xB5E0)};
void gcm_gmult_4bit(uint64 Xi[2], const aesgcm_u128 Htable[16]) {
aesgcm_u128 Z;
int cnt = 15;
size_t rem, nlo, nhi;
const union { long one; char little; } is_endian = {1};
nlo = ((const uint8 *)Xi)[15];
nhi = nlo>>4;
nlo &= 0xf;
Z.hi = Htable[nlo].hi;
Z.lo = Htable[nlo].lo;
while (1) {
rem = (size_t)Z.lo&0xf;
Z.lo = (Z.hi<<60)|(Z.lo>>4);
Z.hi = (Z.hi>>4);
if (sizeof(size_t)==8)
Z.hi ^= rem_4bit[rem];
else
Z.hi ^= (uint64)rem_4bit[rem]<<32;
Z.hi ^= Htable[nhi].hi;
Z.lo ^= Htable[nhi].lo;
if (--cnt<0) break;
nlo = ((const uint8 *)Xi)[cnt];
nhi = nlo>>4;
nlo &= 0xf;
rem = (size_t)Z.lo&0xf;
Z.lo = (Z.hi<<60)|(Z.lo>>4);
Z.hi = (Z.hi>>4);
if (sizeof(size_t)==8)
Z.hi ^= rem_4bit[rem];
else
Z.hi ^= (uint64)rem_4bit[rem]<<32;
Z.hi ^= Htable[nlo].hi;
Z.lo ^= Htable[nlo].lo;
}
Xi[0] = ToBE64(Z.hi);
Xi[1] = ToBE64(Z.lo);
}
void gcm_ghash_4bit(uint64 Xi[2],const aesgcm_u128 Htable[16], const uint8 *inp,size_t len) {
aesgcm_u128 Z;
int cnt;
size_t rem, nlo, nhi;
do {
cnt = 15;
nlo = ((const uint8 *)Xi)[15];
nlo ^= inp[15];
nhi = nlo>>4;
nlo &= 0xf;
Z.hi = Htable[nlo].hi;
Z.lo = Htable[nlo].lo;
while (1) {
rem = (size_t)Z.lo&0xf;
Z.lo = (Z.hi<<60)|(Z.lo>>4);
Z.hi = (Z.hi>>4);
if (sizeof(size_t)==8)
Z.hi ^= rem_4bit[rem];
else
Z.hi ^= (uint64)rem_4bit[rem]<<32;
Z.hi ^= Htable[nhi].hi;
Z.lo ^= Htable[nhi].lo;
if (--cnt<0) break;
nlo = ((const uint8 *)Xi)[cnt];
nlo ^= inp[cnt];
nhi = nlo>>4;
nlo &= 0xf;
rem = (size_t)Z.lo&0xf;
Z.lo = (Z.hi<<60)|(Z.lo>>4);
Z.hi = (Z.hi>>4);
if (sizeof(size_t)==8)
Z.hi ^= rem_4bit[rem];
else
Z.hi ^= (uint64)rem_4bit[rem]<<32;
Z.hi ^= Htable[nlo].hi;
Z.lo ^= Htable[nlo].lo;
}
Xi[0] = ToBE64(Z.hi);
Xi[1] = ToBE64(Z.lo);
} while (inp+=16, len-=16);
}
#endif
void CRYPTO_gcm128_init(AesGcm128StaticContext *ctx, const uint8 *key, int key_size) {
memset(ctx,0,sizeof(*ctx));
ctx->use_aesni_gcm_crypt = X86_PCAP_MOVBE;
aesni_set_encrypt_key(key, key_size, &ctx->aes);
aesni_encrypt(ctx->H.c,ctx->H.c, &ctx->aes);
ctx->H.u[0] = ToBE64(ctx->H.u[0]);
ctx->H.u[1] = ToBE64(ctx->H.u[1]);
if (X86_PCAP_AVX) {
gcm_init_avx(ctx->Htable,ctx->H.u);
ctx->gmult = gcm_gmult_avx;
ctx->ghash = gcm_ghash_avx;
} else if (X86_PCAP_PCLMULQDQ) {
gcm_init_clmul(ctx->Htable,ctx->H.u);
ctx->gmult = gcm_gmult_clmul;
ctx->ghash = gcm_ghash_clmul;
} else {
gcm_init_4bit(ctx->Htable, ctx->H.u);
ctx->gmult = gcm_gmult_4bit;
ctx->ghash = gcm_ghash_4bit;
}
}
void CRYPTO_gcm128_setiv(AesGcm128TempContext *ctx, AesGcm128StaticContext *sctx, const unsigned char *iv, size_t len) {
unsigned int ctr;
void (*gcm_gmult_p)(uint64 Xi[2],const aesgcm_u128 Htable[16]) = sctx->gmult;
ctx->sctx = sctx;
ctx->Yi.u[0] = 0;
ctx->Yi.u[1] = 0;
ctx->Xi.u[0] = 0;
ctx->Xi.u[1] = 0;
ctx->len.u[0] = 0; /* AAD length */
ctx->len.u[1] = 0; /* message length */
ctx->ares = 0;
ctx->mres = 0;
if (len==12) {
memcpy(ctx->Yi.c,iv,12);
ctx->Yi.c[15]=1;
ctr=1;
} else {
size_t i;
uint64 len0 = len;
while (len>=16) {
for (i=0; i<16; ++i) ctx->Yi.c[i] ^= iv[i];
GCM_MUL(ctx,Yi);
iv += 16;
len -= 16;
}
if (len) {
for (i=0; i<len; ++i) ctx->Yi.c[i] ^= iv[i];
GCM_MUL(ctx,Yi);
}
len0 <<= 3;
ctx->Yi.u[1] ^= ToBE64(len0);
GCM_MUL(ctx,Yi);
ctr = ToBE32(ctx->Yi.d[3]);
}
aesni_encrypt(ctx->Yi.c, ctx->EK0.c, &sctx->aes);
++ctr;
ctx->Yi.d[3] = ToBE32(ctr);
}
union AesGcmIV {
uint32 nonce[3];
uint8 nonceb[12];
};
void aesgcm_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, AesGcm128StaticContext *sctx) {
AesGcm128TempContext ctx;
AesGcmIV iv;
WriteLE64(iv.nonce, nonce);
iv.nonce[2] = 0;
CRYPTO_gcm128_setiv(&ctx, sctx, iv.nonceb, sizeof(iv));
CRYPTO_gcm128_aad(&ctx, ad, ad_len);
CRYPTO_gcm128_encrypt_ctr32(&ctx, src, dst, src_len);
CRYPTO_gcm128_finish(&ctx, dst + src_len, 16);
}
void aesgcm_decrypt_get_mac(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, AesGcm128StaticContext *sctx,
uint8 mac[16]) {
AesGcm128TempContext ctx;
AesGcmIV iv;
WriteLE64(iv.nonce, nonce);
iv.nonce[2] = 0;
CRYPTO_gcm128_setiv(&ctx, sctx, iv.nonceb, sizeof(iv));
CRYPTO_gcm128_aad(&ctx, ad, ad_len);
CRYPTO_gcm128_decrypt_ctr32(&ctx, src, dst, src_len);
CRYPTO_gcm128_finish(&ctx, mac, 16);
}
#if 1
/*
* GCM test vectors from:
*
* http://csrc.nist.gov/groups/STM/cavp/documents/mac/gcmtestvectors.zip
*/
#define MAX_TESTS 6
static int key_index[MAX_TESTS] =
{ 0, 0, 1, 1, 1, 1 };
static uint8 key[MAX_TESTS][32] =
{
{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
{ 0xfe, 0xff, 0xe9, 0x92, 0x86, 0x65, 0x73, 0x1c,
0x6d, 0x6a, 0x8f, 0x94, 0x67, 0x30, 0x83, 0x08,
0xfe, 0xff, 0xe9, 0x92, 0x86, 0x65, 0x73, 0x1c,
0x6d, 0x6a, 0x8f, 0x94, 0x67, 0x30, 0x83, 0x08 },
};
static size_t iv_len[MAX_TESTS] =
{ 12, 12, 12, 12, 8, 60 };
static int iv_index[MAX_TESTS] =
{ 0, 0, 1, 1, 1, 2 };
static uint8 iv[MAX_TESTS][64] =
{
{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00 },
{ 0xca, 0xfe, 0xba, 0xbe, 0xfa, 0xce, 0xdb, 0xad,
0xde, 0xca, 0xf8, 0x88 },
{ 0x93, 0x13, 0x22, 0x5d, 0xf8, 0x84, 0x06, 0xe5,
0x55, 0x90, 0x9c, 0x5a, 0xff, 0x52, 0x69, 0xaa,
0x6a, 0x7a, 0x95, 0x38, 0x53, 0x4f, 0x7d, 0xa1,
0xe4, 0xc3, 0x03, 0xd2, 0xa3, 0x18, 0xa7, 0x28,
0xc3, 0xc0, 0xc9, 0x51, 0x56, 0x80, 0x95, 0x39,
0xfc, 0xf0, 0xe2, 0x42, 0x9a, 0x6b, 0x52, 0x54,
0x16, 0xae, 0xdb, 0xf5, 0xa0, 0xde, 0x6a, 0x57,
0xa6, 0x37, 0xb3, 0x9b },
};
static size_t add_len[MAX_TESTS] =
{ 0, 0, 0, 20, 20, 20 };
int add_index[MAX_TESTS] =
{ 0, 0, 0, 1, 1, 1 };
static uint8 additional[MAX_TESTS][64] =
{
{ 0x00 },
{ 0xfe, 0xed, 0xfa, 0xce, 0xde, 0xad, 0xbe, 0xef,
0xfe, 0xed, 0xfa, 0xce, 0xde, 0xad, 0xbe, 0xef,
0xab, 0xad, 0xda, 0xd2 },
};
static size_t pt_len[MAX_TESTS] =
{ 0, 16, 64, 60, 60, 60 };
static int pt_index[MAX_TESTS] =
{ 0, 0, 1, 1, 1, 1 };
static uint8 pt[MAX_TESTS][64] =
{
{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
{ 0xd9, 0x31, 0x32, 0x25, 0xf8, 0x84, 0x06, 0xe5,
0xa5, 0x59, 0x09, 0xc5, 0xaf, 0xf5, 0x26, 0x9a,
0x86, 0xa7, 0xa9, 0x53, 0x15, 0x34, 0xf7, 0xda,
0x2e, 0x4c, 0x30, 0x3d, 0x8a, 0x31, 0x8a, 0x72,
0x1c, 0x3c, 0x0c, 0x95, 0x95, 0x68, 0x09, 0x53,
0x2f, 0xcf, 0x0e, 0x24, 0x49, 0xa6, 0xb5, 0x25,
0xb1, 0x6a, 0xed, 0xf5, 0xaa, 0x0d, 0xe6, 0x57,
0xba, 0x63, 0x7b, 0x39, 0x1a, 0xaf, 0xd2, 0x55 },
};
static uint8 ct[MAX_TESTS * 3][64] =
{
{ 0x00 },
{ 0x03, 0x88, 0xda, 0xce, 0x60, 0xb6, 0xa3, 0x92,
0xf3, 0x28, 0xc2, 0xb9, 0x71, 0xb2, 0xfe, 0x78 },
{ 0x42, 0x83, 0x1e, 0xc2, 0x21, 0x77, 0x74, 0x24,
0x4b, 0x72, 0x21, 0xb7, 0x84, 0xd0, 0xd4, 0x9c,
0xe3, 0xaa, 0x21, 0x2f, 0x2c, 0x02, 0xa4, 0xe0,
0x35, 0xc1, 0x7e, 0x23, 0x29, 0xac, 0xa1, 0x2e,
0x21, 0xd5, 0x14, 0xb2, 0x54, 0x66, 0x93, 0x1c,
0x7d, 0x8f, 0x6a, 0x5a, 0xac, 0x84, 0xaa, 0x05,
0x1b, 0xa3, 0x0b, 0x39, 0x6a, 0x0a, 0xac, 0x97,
0x3d, 0x58, 0xe0, 0x91, 0x47, 0x3f, 0x59, 0x85 },
{ 0x42, 0x83, 0x1e, 0xc2, 0x21, 0x77, 0x74, 0x24,
0x4b, 0x72, 0x21, 0xb7, 0x84, 0xd0, 0xd4, 0x9c,
0xe3, 0xaa, 0x21, 0x2f, 0x2c, 0x02, 0xa4, 0xe0,
0x35, 0xc1, 0x7e, 0x23, 0x29, 0xac, 0xa1, 0x2e,
0x21, 0xd5, 0x14, 0xb2, 0x54, 0x66, 0x93, 0x1c,
0x7d, 0x8f, 0x6a, 0x5a, 0xac, 0x84, 0xaa, 0x05,
0x1b, 0xa3, 0x0b, 0x39, 0x6a, 0x0a, 0xac, 0x97,
0x3d, 0x58, 0xe0, 0x91 },
{ 0x61, 0x35, 0x3b, 0x4c, 0x28, 0x06, 0x93, 0x4a,
0x77, 0x7f, 0xf5, 0x1f, 0xa2, 0x2a, 0x47, 0x55,
0x69, 0x9b, 0x2a, 0x71, 0x4f, 0xcd, 0xc6, 0xf8,
0x37, 0x66, 0xe5, 0xf9, 0x7b, 0x6c, 0x74, 0x23,
0x73, 0x80, 0x69, 0x00, 0xe4, 0x9f, 0x24, 0xb2,
0x2b, 0x09, 0x75, 0x44, 0xd4, 0x89, 0x6b, 0x42,
0x49, 0x89, 0xb5, 0xe1, 0xeb, 0xac, 0x0f, 0x07,
0xc2, 0x3f, 0x45, 0x98 },
{ 0x8c, 0xe2, 0x49, 0x98, 0x62, 0x56, 0x15, 0xb6,
0x03, 0xa0, 0x33, 0xac, 0xa1, 0x3f, 0xb8, 0x94,
0xbe, 0x91, 0x12, 0xa5, 0xc3, 0xa2, 0x11, 0xa8,
0xba, 0x26, 0x2a, 0x3c, 0xca, 0x7e, 0x2c, 0xa7,
0x01, 0xe4, 0xa9, 0xa4, 0xfb, 0xa4, 0x3c, 0x90,
0xcc, 0xdc, 0xb2, 0x81, 0xd4, 0x8c, 0x7c, 0x6f,
0xd6, 0x28, 0x75, 0xd2, 0xac, 0xa4, 0x17, 0x03,
0x4c, 0x34, 0xae, 0xe5 },
{ 0x00 },
{ 0x98, 0xe7, 0x24, 0x7c, 0x07, 0xf0, 0xfe, 0x41,
0x1c, 0x26, 0x7e, 0x43, 0x84, 0xb0, 0xf6, 0x00 },
{ 0x39, 0x80, 0xca, 0x0b, 0x3c, 0x00, 0xe8, 0x41,
0xeb, 0x06, 0xfa, 0xc4, 0x87, 0x2a, 0x27, 0x57,
0x85, 0x9e, 0x1c, 0xea, 0xa6, 0xef, 0xd9, 0x84,
0x62, 0x85, 0x93, 0xb4, 0x0c, 0xa1, 0xe1, 0x9c,
0x7d, 0x77, 0x3d, 0x00, 0xc1, 0x44, 0xc5, 0x25,
0xac, 0x61, 0x9d, 0x18, 0xc8, 0x4a, 0x3f, 0x47,
0x18, 0xe2, 0x44, 0x8b, 0x2f, 0xe3, 0x24, 0xd9,
0xcc, 0xda, 0x27, 0x10, 0xac, 0xad, 0xe2, 0x56 },
{ 0x39, 0x80, 0xca, 0x0b, 0x3c, 0x00, 0xe8, 0x41,
0xeb, 0x06, 0xfa, 0xc4, 0x87, 0x2a, 0x27, 0x57,
0x85, 0x9e, 0x1c, 0xea, 0xa6, 0xef, 0xd9, 0x84,
0x62, 0x85, 0x93, 0xb4, 0x0c, 0xa1, 0xe1, 0x9c,
0x7d, 0x77, 0x3d, 0x00, 0xc1, 0x44, 0xc5, 0x25,
0xac, 0x61, 0x9d, 0x18, 0xc8, 0x4a, 0x3f, 0x47,
0x18, 0xe2, 0x44, 0x8b, 0x2f, 0xe3, 0x24, 0xd9,
0xcc, 0xda, 0x27, 0x10 },
{ 0x0f, 0x10, 0xf5, 0x99, 0xae, 0x14, 0xa1, 0x54,
0xed, 0x24, 0xb3, 0x6e, 0x25, 0x32, 0x4d, 0xb8,
0xc5, 0x66, 0x63, 0x2e, 0xf2, 0xbb, 0xb3, 0x4f,
0x83, 0x47, 0x28, 0x0f, 0xc4, 0x50, 0x70, 0x57,
0xfd, 0xdc, 0x29, 0xdf, 0x9a, 0x47, 0x1f, 0x75,
0xc6, 0x65, 0x41, 0xd4, 0xd4, 0xda, 0xd1, 0xc9,
0xe9, 0x3a, 0x19, 0xa5, 0x8e, 0x8b, 0x47, 0x3f,
0xa0, 0xf0, 0x62, 0xf7 },
{ 0xd2, 0x7e, 0x88, 0x68, 0x1c, 0xe3, 0x24, 0x3c,
0x48, 0x30, 0x16, 0x5a, 0x8f, 0xdc, 0xf9, 0xff,
0x1d, 0xe9, 0xa1, 0xd8, 0xe6, 0xb4, 0x47, 0xef,
0x6e, 0xf7, 0xb7, 0x98, 0x28, 0x66, 0x6e, 0x45,
0x81, 0xe7, 0x90, 0x12, 0xaf, 0x34, 0xdd, 0xd9,
0xe2, 0xf0, 0x37, 0x58, 0x9b, 0x29, 0x2d, 0xb3,
0xe6, 0x7c, 0x03, 0x67, 0x45, 0xfa, 0x22, 0xe7,
0xe9, 0xb7, 0x37, 0x3b },
{ 0x00 },
{ 0xce, 0xa7, 0x40, 0x3d, 0x4d, 0x60, 0x6b, 0x6e,
0x07, 0x4e, 0xc5, 0xd3, 0xba, 0xf3, 0x9d, 0x18 },
{ 0x52, 0x2d, 0xc1, 0xf0, 0x99, 0x56, 0x7d, 0x07,
0xf4, 0x7f, 0x37, 0xa3, 0x2a, 0x84, 0x42, 0x7d,
0x64, 0x3a, 0x8c, 0xdc, 0xbf, 0xe5, 0xc0, 0xc9,
0x75, 0x98, 0xa2, 0xbd, 0x25, 0x55, 0xd1, 0xaa,
0x8c, 0xb0, 0x8e, 0x48, 0x59, 0x0d, 0xbb, 0x3d,
0xa7, 0xb0, 0x8b, 0x10, 0x56, 0x82, 0x88, 0x38,
0xc5, 0xf6, 0x1e, 0x63, 0x93, 0xba, 0x7a, 0x0a,
0xbc, 0xc9, 0xf6, 0x62, 0x89, 0x80, 0x15, 0xad },
{ 0x52, 0x2d, 0xc1, 0xf0, 0x99, 0x56, 0x7d, 0x07,
0xf4, 0x7f, 0x37, 0xa3, 0x2a, 0x84, 0x42, 0x7d,
0x64, 0x3a, 0x8c, 0xdc, 0xbf, 0xe5, 0xc0, 0xc9,
0x75, 0x98, 0xa2, 0xbd, 0x25, 0x55, 0xd1, 0xaa,
0x8c, 0xb0, 0x8e, 0x48, 0x59, 0x0d, 0xbb, 0x3d,
0xa7, 0xb0, 0x8b, 0x10, 0x56, 0x82, 0x88, 0x38,
0xc5, 0xf6, 0x1e, 0x63, 0x93, 0xba, 0x7a, 0x0a,
0xbc, 0xc9, 0xf6, 0x62 },
{ 0xc3, 0x76, 0x2d, 0xf1, 0xca, 0x78, 0x7d, 0x32,
0xae, 0x47, 0xc1, 0x3b, 0xf1, 0x98, 0x44, 0xcb,
0xaf, 0x1a, 0xe1, 0x4d, 0x0b, 0x97, 0x6a, 0xfa,
0xc5, 0x2f, 0xf7, 0xd7, 0x9b, 0xba, 0x9d, 0xe0,
0xfe, 0xb5, 0x82, 0xd3, 0x39, 0x34, 0xa4, 0xf0,
0x95, 0x4c, 0xc2, 0x36, 0x3b, 0xc7, 0x3f, 0x78,
0x62, 0xac, 0x43, 0x0e, 0x64, 0xab, 0xe4, 0x99,
0xf4, 0x7c, 0x9b, 0x1f },
{ 0x5a, 0x8d, 0xef, 0x2f, 0x0c, 0x9e, 0x53, 0xf1,
0xf7, 0x5d, 0x78, 0x53, 0x65, 0x9e, 0x2a, 0x20,
0xee, 0xb2, 0xb2, 0x2a, 0xaf, 0xde, 0x64, 0x19,
0xa0, 0x58, 0xab, 0x4f, 0x6f, 0x74, 0x6b, 0xf4,
0x0f, 0xc0, 0xc3, 0xb7, 0x80, 0xf2, 0x44, 0x45,
0x2d, 0xa3, 0xeb, 0xf1, 0xc5, 0xd8, 0x2c, 0xde,
0xa2, 0x41, 0x89, 0x97, 0x20, 0x0e, 0xf8, 0x2e,
0x44, 0xae, 0x7e, 0x3f },
};
static uint8 tag[MAX_TESTS * 3][16] =
{
{ 0x58, 0xe2, 0xfc, 0xce, 0xfa, 0x7e, 0x30, 0x61,
0x36, 0x7f, 0x1d, 0x57, 0xa4, 0xe7, 0x45, 0x5a },
{ 0xab, 0x6e, 0x47, 0xd4, 0x2c, 0xec, 0x13, 0xbd,
0xf5, 0x3a, 0x67, 0xb2, 0x12, 0x57, 0xbd, 0xdf },
{ 0x4d, 0x5c, 0x2a, 0xf3, 0x27, 0xcd, 0x64, 0xa6,
0x2c, 0xf3, 0x5a, 0xbd, 0x2b, 0xa6, 0xfa, 0xb4 },
{ 0x5b, 0xc9, 0x4f, 0xbc, 0x32, 0x21, 0xa5, 0xdb,
0x94, 0xfa, 0xe9, 0x5a, 0xe7, 0x12, 0x1a, 0x47 },
{ 0x36, 0x12, 0xd2, 0xe7, 0x9e, 0x3b, 0x07, 0x85,
0x56, 0x1b, 0xe1, 0x4a, 0xac, 0xa2, 0xfc, 0xcb },
{ 0x61, 0x9c, 0xc5, 0xae, 0xff, 0xfe, 0x0b, 0xfa,
0x46, 0x2a, 0xf4, 0x3c, 0x16, 0x99, 0xd0, 0x50 },
{ 0xcd, 0x33, 0xb2, 0x8a, 0xc7, 0x73, 0xf7, 0x4b,
0xa0, 0x0e, 0xd1, 0xf3, 0x12, 0x57, 0x24, 0x35 },
{ 0x2f, 0xf5, 0x8d, 0x80, 0x03, 0x39, 0x27, 0xab,
0x8e, 0xf4, 0xd4, 0x58, 0x75, 0x14, 0xf0, 0xfb },
{ 0x99, 0x24, 0xa7, 0xc8, 0x58, 0x73, 0x36, 0xbf,
0xb1, 0x18, 0x02, 0x4d, 0xb8, 0x67, 0x4a, 0x14 },
{ 0x25, 0x19, 0x49, 0x8e, 0x80, 0xf1, 0x47, 0x8f,
0x37, 0xba, 0x55, 0xbd, 0x6d, 0x27, 0x61, 0x8c },
{ 0x65, 0xdc, 0xc5, 0x7f, 0xcf, 0x62, 0x3a, 0x24,
0x09, 0x4f, 0xcc, 0xa4, 0x0d, 0x35, 0x33, 0xf8 },
{ 0xdc, 0xf5, 0x66, 0xff, 0x29, 0x1c, 0x25, 0xbb,
0xb8, 0x56, 0x8f, 0xc3, 0xd3, 0x76, 0xa6, 0xd9 },
{ 0x53, 0x0f, 0x8a, 0xfb, 0xc7, 0x45, 0x36, 0xb9,
0xa9, 0x63, 0xb4, 0xf1, 0xc4, 0xcb, 0x73, 0x8b },
{ 0xd0, 0xd1, 0xc8, 0xa7, 0x99, 0x99, 0x6b, 0xf0,
0x26, 0x5b, 0x98, 0xb5, 0xd4, 0x8a, 0xb9, 0x19 },
{ 0xb0, 0x94, 0xda, 0xc5, 0xd9, 0x34, 0x71, 0xbd,
0xec, 0x1a, 0x50, 0x22, 0x70, 0xe3, 0xcc, 0x6c },
{ 0x76, 0xfc, 0x6e, 0xce, 0x0f, 0x4e, 0x17, 0x68,
0xcd, 0xdf, 0x88, 0x53, 0xbb, 0x2d, 0x55, 0x1b },
{ 0x3a, 0x33, 0x7d, 0xbf, 0x46, 0xa7, 0x92, 0xc4,
0x5e, 0x45, 0x49, 0x13, 0xfe, 0x2e, 0xa8, 0xf2 },
{ 0xa4, 0x4a, 0x82, 0x66, 0xee, 0x1c, 0x8e, 0xb0,
0xc8, 0xb5, 0xd4, 0xcf, 0x5a, 0xe9, 0xf1, 0x9a },
};
int gcm_self_test()
{
uint8 buf[64];
uint8 tag_buf[16];
int i, j;
AesGcm128TempContext ctx;
AesGcm128StaticContext sctx;
{
AesContext aes;
uint8 key[16] = {43,126,21,22,40,174,210,166,171,247,21,136,9,207,79,60};
uint8 in[16] = {107,193,190,226,46,64,159,150,233,61,126,17,115,147,23,42};
uint8 out[16] = {58,215,123,180,13,122,54,96,168,158,202,243,36,102,239,151}, t[16];
aesni_set_encrypt_key(key, 128, &aes);
aesni_encrypt(in, t, &aes);
if (memcmp(t, out,16)) { printf("AES test fail!\n"); return 1; }
aesni_set_decrypt_key(key, 128, &aes);
aesni_decrypt(out, t, &aes);
if (memcmp(t, in,16)) { printf("AES test fail!\n"); return 1; }
}
uint8 correct[] = { 62,85,184,249,224,220,4,77,201,216,202,172,121,7,25,200, };
if (0) {
uint8 buf[512 + 16];
for (size_t i = 0; i < 512; i++)
buf[i] = (uint8)(i >> 4);// 0x11;
uint8 buf2[512 + 16];
for (size_t i = 0; i < 512; i++)
buf2[i] = buf[i];
size_t pp = 0x60;
CRYPTO_gcm128_init(&sctx, key[0], 128);
sctx.use_aesni_gcm_crypt = 1;
aesgcm_decrypt_get_mac(buf, buf, pp, NULL, 0, 1, &sctx, buf + pp);
sctx.use_aesni_gcm_crypt = 0;
aesgcm_decrypt_get_mac(buf2, buf2, pp, NULL, 0, 1, &sctx, buf2 + pp);
//aesgcm_encrypt(buf, buf, 0x120 + 32, NULL, 0, 1, &sctx);
for (size_t i = 0; i < 16; i++)
printf("%d,", buf[pp + i]);
printf("\n");
for (size_t i = 0; i < 16; i++)
printf("%d,", buf2[pp + i]);
printf("\n");
if (memcmp(buf2 + pp, buf + pp, 16) == 0)
printf("CORRECT!!\n");
else
printf("******** FAIL ************\n");
// for(size_t i = 0; i < 16; i++)
// printf("%d,", buf[pp +i]);
printf("\n");
}
return 0;
for( j = 0; j < 3; j++ ) {
int key_len = 128 + 64 * j;
for( i = 0; i < MAX_TESTS; i++ ) {
CRYPTO_gcm128_init(&sctx, key[key_index[i]], key_len);
CRYPTO_gcm128_setiv(&ctx, &sctx, iv[iv_index[i]], iv_len[i]);
CRYPTO_gcm128_aad(&ctx, additional[add_index[i]], add_len[i]);
CRYPTO_gcm128_encrypt_ctr32(&ctx, pt[pt_index[i]], buf, pt_len[i]);
CRYPTO_gcm128_finish(&ctx, tag_buf, 16);
if(memcmp( buf, ct[j * 6 + i], pt_len[i] ) != 0 ||
memcmp( tag_buf, tag[j * 6 + i], 16 ) != 0 ) {
printf( "AES-GCM-%3d #%d (%s): failed\n", key_len, i, "enc" );
return( 1 );
}
CRYPTO_gcm128_init(&sctx, key[key_index[i]], key_len);
CRYPTO_gcm128_setiv(&ctx, &sctx, iv[iv_index[i]], iv_len[i]);
CRYPTO_gcm128_aad(&ctx, additional[add_index[i]], add_len[i]);
CRYPTO_gcm128_decrypt_ctr32(&ctx, ct[j * 6 + i], buf, pt_len[i]);
CRYPTO_gcm128_finish(&ctx, tag_buf, 16);
if(memcmp( buf, pt[pt_index[i]], pt_len[i] ) != 0 ||
memcmp( tag_buf, tag[j * 6 + i], 16 ) != 0 ) {
printf( "AES-GCM-%3d #%d (%s): failed\n", key_len, i, "dec" );
return( 1 );
}
}
}
return( 0 );
}
//int main() {
// gcm_self_test();
//}
#endif
#endif // #if WITH_AESGCM

File diff suppressed because it is too large Load diff

2544
crypto/aesgcm/aesni-x86.pl Normal file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,831 @@
.text
.type _aesni_ctr32_ghash_6x,@function
.align 32
_aesni_ctr32_ghash_6x:
.cfi_startproc
vmovdqu 32(%r11),%xmm2
subq $6,%rdx
vpxor %xmm4,%xmm4,%xmm4
vmovdqu 0-128(%rcx),%xmm15
vpaddb %xmm2,%xmm1,%xmm10
vpaddb %xmm2,%xmm10,%xmm11
vpaddb %xmm2,%xmm11,%xmm12
vpaddb %xmm2,%xmm12,%xmm13
vpaddb %xmm2,%xmm13,%xmm14
vpxor %xmm15,%xmm1,%xmm9
vmovdqu %xmm4,16+8(%rsp)
jmp .Loop6x
.align 32
.Loop6x:
addl $100663296,%ebx
jc .Lhandle_ctr32
vmovdqu 0-32(%r9),%xmm3
vpaddb %xmm2,%xmm14,%xmm1
vpxor %xmm15,%xmm10,%xmm10
vpxor %xmm15,%xmm11,%xmm11
.Lresume_ctr32:
vmovdqu %xmm1,(%r8)
vpclmulqdq $0x10,%xmm3,%xmm7,%xmm5
vpxor %xmm15,%xmm12,%xmm12
vmovups 16-128(%rcx),%xmm2
vpclmulqdq $0x01,%xmm3,%xmm7,%xmm6
xorq %r12,%r12
cmpq %r14,%r15
vaesenc %xmm2,%xmm9,%xmm9
vmovdqu 48+8(%rsp),%xmm0
vpxor %xmm15,%xmm13,%xmm13
vpclmulqdq $0x00,%xmm3,%xmm7,%xmm1
vaesenc %xmm2,%xmm10,%xmm10
vpxor %xmm15,%xmm14,%xmm14
setnc %r12b
vpclmulqdq $0x11,%xmm3,%xmm7,%xmm7
vaesenc %xmm2,%xmm11,%xmm11
vmovdqu 16-32(%r9),%xmm3
negq %r12
vaesenc %xmm2,%xmm12,%xmm12
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm3,%xmm0,%xmm5
vpxor %xmm4,%xmm8,%xmm8
vaesenc %xmm2,%xmm13,%xmm13
vpxor %xmm5,%xmm1,%xmm4
andq $0x60,%r12
vmovups 32-128(%rcx),%xmm15
vpclmulqdq $0x10,%xmm3,%xmm0,%xmm1
vaesenc %xmm2,%xmm14,%xmm14
vpclmulqdq $0x01,%xmm3,%xmm0,%xmm2
leaq (%r14,%r12,1),%r14
vaesenc %xmm15,%xmm9,%xmm9
vpxor 16+8(%rsp),%xmm8,%xmm8
vpclmulqdq $0x11,%xmm3,%xmm0,%xmm3
vmovdqu 64+8(%rsp),%xmm0
vaesenc %xmm15,%xmm10,%xmm10
movbeq 88(%r14),%r13
vaesenc %xmm15,%xmm11,%xmm11
movbeq 80(%r14),%r12
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,32+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,40+8(%rsp)
vmovdqu 48-32(%r9),%xmm5
vaesenc %xmm15,%xmm14,%xmm14
vmovups 48-128(%rcx),%xmm15
vpxor %xmm1,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm5,%xmm0,%xmm1
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm2,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm5,%xmm0,%xmm2
vaesenc %xmm15,%xmm10,%xmm10
vpxor %xmm3,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm5,%xmm0,%xmm3
vaesenc %xmm15,%xmm11,%xmm11
vpclmulqdq $0x11,%xmm5,%xmm0,%xmm5
vmovdqu 80+8(%rsp),%xmm0
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vpxor %xmm1,%xmm4,%xmm4
vmovdqu 64-32(%r9),%xmm1
vaesenc %xmm15,%xmm14,%xmm14
vmovups 64-128(%rcx),%xmm15
vpxor %xmm2,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm1,%xmm0,%xmm2
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm3,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm1,%xmm0,%xmm3
vaesenc %xmm15,%xmm10,%xmm10
movbeq 72(%r14),%r13
vpxor %xmm5,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm1,%xmm0,%xmm5
vaesenc %xmm15,%xmm11,%xmm11
movbeq 64(%r14),%r12
vpclmulqdq $0x11,%xmm1,%xmm0,%xmm1
vmovdqu 96+8(%rsp),%xmm0
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,48+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,56+8(%rsp)
vpxor %xmm2,%xmm4,%xmm4
vmovdqu 96-32(%r9),%xmm2
vaesenc %xmm15,%xmm14,%xmm14
vmovups 80-128(%rcx),%xmm15
vpxor %xmm3,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm2,%xmm0,%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm2,%xmm0,%xmm5
vaesenc %xmm15,%xmm10,%xmm10
movbeq 56(%r14),%r13
vpxor %xmm1,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm2,%xmm0,%xmm1
vpxor 112+8(%rsp),%xmm8,%xmm8
vaesenc %xmm15,%xmm11,%xmm11
movbeq 48(%r14),%r12
vpclmulqdq $0x11,%xmm2,%xmm0,%xmm2
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,64+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,72+8(%rsp)
vpxor %xmm3,%xmm4,%xmm4
vmovdqu 112-32(%r9),%xmm3
vaesenc %xmm15,%xmm14,%xmm14
vmovups 96-128(%rcx),%xmm15
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm5
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm1,%xmm6,%xmm6
vpclmulqdq $0x01,%xmm3,%xmm8,%xmm1
vaesenc %xmm15,%xmm10,%xmm10
movbeq 40(%r14),%r13
vpxor %xmm2,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm3,%xmm8,%xmm2
vaesenc %xmm15,%xmm11,%xmm11
movbeq 32(%r14),%r12
vpclmulqdq $0x11,%xmm3,%xmm8,%xmm8
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,80+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,88+8(%rsp)
vpxor %xmm5,%xmm6,%xmm6
vaesenc %xmm15,%xmm14,%xmm14
vpxor %xmm1,%xmm6,%xmm6
vmovups 112-128(%rcx),%xmm15
vpslldq $8,%xmm6,%xmm5
vpxor %xmm2,%xmm4,%xmm4
vmovdqu 16(%r11),%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm8,%xmm7,%xmm7
vaesenc %xmm15,%xmm10,%xmm10
vpxor %xmm5,%xmm4,%xmm4
movbeq 24(%r14),%r13
vaesenc %xmm15,%xmm11,%xmm11
movbeq 16(%r14),%r12
vpalignr $8,%xmm4,%xmm4,%xmm0
vpclmulqdq $0x10,%xmm3,%xmm4,%xmm4
movq %r13,96+8(%rsp)
vaesenc %xmm15,%xmm12,%xmm12
movq %r12,104+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
vmovups 128-128(%rcx),%xmm1
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vmovups 144-128(%rcx),%xmm15
vaesenc %xmm1,%xmm10,%xmm10
vpsrldq $8,%xmm6,%xmm6
vaesenc %xmm1,%xmm11,%xmm11
vpxor %xmm6,%xmm7,%xmm7
vaesenc %xmm1,%xmm12,%xmm12
vpxor %xmm0,%xmm4,%xmm4
movbeq 8(%r14),%r13
vaesenc %xmm1,%xmm13,%xmm13
movbeq 0(%r14),%r12
vaesenc %xmm1,%xmm14,%xmm14
vmovups 160-128(%rcx),%xmm1
cmpl $11,%ebp
jb .Lenc_tail
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vaesenc %xmm1,%xmm10,%xmm10
vaesenc %xmm1,%xmm11,%xmm11
vaesenc %xmm1,%xmm12,%xmm12
vaesenc %xmm1,%xmm13,%xmm13
vmovups 176-128(%rcx),%xmm15
vaesenc %xmm1,%xmm14,%xmm14
vmovups 192-128(%rcx),%xmm1
je .Lenc_tail
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vaesenc %xmm1,%xmm10,%xmm10
vaesenc %xmm1,%xmm11,%xmm11
vaesenc %xmm1,%xmm12,%xmm12
vaesenc %xmm1,%xmm13,%xmm13
vmovups 208-128(%rcx),%xmm15
vaesenc %xmm1,%xmm14,%xmm14
vmovups 224-128(%rcx),%xmm1
jmp .Lenc_tail
.align 32
.Lhandle_ctr32:
vmovdqu (%r11),%xmm0
vpshufb %xmm0,%xmm1,%xmm6
vmovdqu 48(%r11),%xmm5
vpaddd 64(%r11),%xmm6,%xmm10
vpaddd %xmm5,%xmm6,%xmm11
vmovdqu 0-32(%r9),%xmm3
vpaddd %xmm5,%xmm10,%xmm12
vpshufb %xmm0,%xmm10,%xmm10
vpaddd %xmm5,%xmm11,%xmm13
vpshufb %xmm0,%xmm11,%xmm11
vpxor %xmm15,%xmm10,%xmm10
vpaddd %xmm5,%xmm12,%xmm14
vpshufb %xmm0,%xmm12,%xmm12
vpxor %xmm15,%xmm11,%xmm11
vpaddd %xmm5,%xmm13,%xmm1
vpshufb %xmm0,%xmm13,%xmm13
vpshufb %xmm0,%xmm14,%xmm14
vpshufb %xmm0,%xmm1,%xmm1
jmp .Lresume_ctr32
.align 32
.Lenc_tail:
vaesenc %xmm15,%xmm9,%xmm9
vmovdqu %xmm7,16+8(%rsp)
vpalignr $8,%xmm4,%xmm4,%xmm8
vaesenc %xmm15,%xmm10,%xmm10
vpclmulqdq $0x10,%xmm3,%xmm4,%xmm4
vpxor 0(%rdi),%xmm1,%xmm2
vaesenc %xmm15,%xmm11,%xmm11
vpxor 16(%rdi),%xmm1,%xmm0
vaesenc %xmm15,%xmm12,%xmm12
vpxor 32(%rdi),%xmm1,%xmm5
vaesenc %xmm15,%xmm13,%xmm13
vpxor 48(%rdi),%xmm1,%xmm6
vaesenc %xmm15,%xmm14,%xmm14
vpxor 64(%rdi),%xmm1,%xmm7
vpxor 80(%rdi),%xmm1,%xmm3
vmovdqu (%r8),%xmm1
vaesenclast %xmm2,%xmm9,%xmm9
vmovdqu 32(%r11),%xmm2
vaesenclast %xmm0,%xmm10,%xmm10
vpaddb %xmm2,%xmm1,%xmm0
movq %r13,112+8(%rsp)
leaq 96(%rdi),%rdi
vaesenclast %xmm5,%xmm11,%xmm11
vpaddb %xmm2,%xmm0,%xmm5
movq %r12,120+8(%rsp)
leaq 96(%rsi),%rsi
vmovdqu 0-128(%rcx),%xmm15
vaesenclast %xmm6,%xmm12,%xmm12
vpaddb %xmm2,%xmm5,%xmm6
vaesenclast %xmm7,%xmm13,%xmm13
vpaddb %xmm2,%xmm6,%xmm7
vaesenclast %xmm3,%xmm14,%xmm14
vpaddb %xmm2,%xmm7,%xmm3
addq $0x60,%r10
subq $0x6,%rdx
jc .L6x_done
vmovups %xmm9,-96(%rsi)
vpxor %xmm15,%xmm1,%xmm9
vmovups %xmm10,-80(%rsi)
vmovdqa %xmm0,%xmm10
vmovups %xmm11,-64(%rsi)
vmovdqa %xmm5,%xmm11
vmovups %xmm12,-48(%rsi)
vmovdqa %xmm6,%xmm12
vmovups %xmm13,-32(%rsi)
vmovdqa %xmm7,%xmm13
vmovups %xmm14,-16(%rsi)
vmovdqa %xmm3,%xmm14
vmovdqu 32+8(%rsp),%xmm7
jmp .Loop6x
.L6x_done:
vpxor 16+8(%rsp),%xmm8,%xmm8
vpxor %xmm4,%xmm8,%xmm8
ret
.cfi_endproc
.size _aesni_ctr32_ghash_6x,.-_aesni_ctr32_ghash_6x
.globl aesni_gcm_decrypt
.type aesni_gcm_decrypt,@function
.align 32
aesni_gcm_decrypt:
.cfi_startproc
xorq %r10,%r10
cmpq $0x60,%rdx
jb .Lgcm_dec_abort
leaq (%rsp),%rax
.cfi_def_cfa_register %rax
pushq %rbx
.cfi_offset %rbx,-16
pushq %rbp
.cfi_offset %rbp,-24
pushq %r12
.cfi_offset %r12,-32
pushq %r13
.cfi_offset %r13,-40
pushq %r14
.cfi_offset %r14,-48
pushq %r15
.cfi_offset %r15,-56
vzeroupper
vmovdqu (%r8),%xmm1
addq $-128,%rsp
movl 12(%r8),%ebx
leaq .Lbswap_mask(%rip),%r11
leaq -128(%rcx),%r14
movq $0xf80,%r15
vmovdqu 16(%r8),%xmm8
andq $-128,%rsp
vmovdqu (%r11),%xmm0
leaq 128(%rcx),%rcx
leaq 16+32(%r9),%r9
movl 240-128(%rcx),%ebp
vpshufb %xmm0,%xmm8,%xmm8
andq %r15,%r14
andq %rsp,%r15
subq %r14,%r15
jc .Ldec_no_key_aliasing
cmpq $768,%r15
jnc .Ldec_no_key_aliasing
subq %r15,%rsp
.Ldec_no_key_aliasing:
vmovdqu 80(%rdi),%xmm7
leaq (%rdi),%r14
vmovdqu 64(%rdi),%xmm4
leaq -192(%rdi,%rdx,1),%r15
vmovdqu 48(%rdi),%xmm5
shrq $4,%rdx
xorq %r10,%r10
vmovdqu 32(%rdi),%xmm6
vpshufb %xmm0,%xmm7,%xmm7
vmovdqu 16(%rdi),%xmm2
vpshufb %xmm0,%xmm4,%xmm4
vmovdqu (%rdi),%xmm3
vpshufb %xmm0,%xmm5,%xmm5
vmovdqu %xmm4,48(%rsp)
vpshufb %xmm0,%xmm6,%xmm6
vmovdqu %xmm5,64(%rsp)
vpshufb %xmm0,%xmm2,%xmm2
vmovdqu %xmm6,80(%rsp)
vpshufb %xmm0,%xmm3,%xmm3
vmovdqu %xmm2,96(%rsp)
vmovdqu %xmm3,112(%rsp)
call _aesni_ctr32_ghash_6x
vmovups %xmm9,-96(%rsi)
vmovups %xmm10,-80(%rsi)
vmovups %xmm11,-64(%rsi)
vmovups %xmm12,-48(%rsi)
vmovups %xmm13,-32(%rsi)
vmovups %xmm14,-16(%rsi)
vpshufb (%r11),%xmm8,%xmm8
vmovdqu %xmm8,16(%r8)
vzeroupper
movq -48(%rax),%r15
.cfi_restore %r15
movq -40(%rax),%r14
.cfi_restore %r14
movq -32(%rax),%r13
.cfi_restore %r13
movq -24(%rax),%r12
.cfi_restore %r12
movq -16(%rax),%rbp
.cfi_restore %rbp
movq -8(%rax),%rbx
.cfi_restore %rbx
leaq (%rax),%rsp
.cfi_def_cfa_register %rsp
.Lgcm_dec_abort:
movq %r10,%rax
ret
.cfi_endproc
.size aesni_gcm_decrypt,.-aesni_gcm_decrypt
.type _aesni_ctr32_6x,@function
.align 32
_aesni_ctr32_6x:
.cfi_startproc
vmovdqu 0-128(%rcx),%xmm4
vmovdqu 32(%r11),%xmm2
leaq -1(%rbp),%r13
vmovups 16-128(%rcx),%xmm15
leaq 32-128(%rcx),%r12
vpxor %xmm4,%xmm1,%xmm9
addl $100663296,%ebx
jc .Lhandle_ctr32_2
vpaddb %xmm2,%xmm1,%xmm10
vpaddb %xmm2,%xmm10,%xmm11
vpxor %xmm4,%xmm10,%xmm10
vpaddb %xmm2,%xmm11,%xmm12
vpxor %xmm4,%xmm11,%xmm11
vpaddb %xmm2,%xmm12,%xmm13
vpxor %xmm4,%xmm12,%xmm12
vpaddb %xmm2,%xmm13,%xmm14
vpxor %xmm4,%xmm13,%xmm13
vpaddb %xmm2,%xmm14,%xmm1
vpxor %xmm4,%xmm14,%xmm14
jmp .Loop_ctr32
.align 16
.Loop_ctr32:
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vmovups (%r12),%xmm15
leaq 16(%r12),%r12
decl %r13d
jnz .Loop_ctr32
vmovdqu (%r12),%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor 0(%rdi),%xmm3,%xmm4
vaesenc %xmm15,%xmm10,%xmm10
vpxor 16(%rdi),%xmm3,%xmm5
vaesenc %xmm15,%xmm11,%xmm11
vpxor 32(%rdi),%xmm3,%xmm6
vaesenc %xmm15,%xmm12,%xmm12
vpxor 48(%rdi),%xmm3,%xmm8
vaesenc %xmm15,%xmm13,%xmm13
vpxor 64(%rdi),%xmm3,%xmm2
vaesenc %xmm15,%xmm14,%xmm14
vpxor 80(%rdi),%xmm3,%xmm3
leaq 96(%rdi),%rdi
vaesenclast %xmm4,%xmm9,%xmm9
vaesenclast %xmm5,%xmm10,%xmm10
vaesenclast %xmm6,%xmm11,%xmm11
vaesenclast %xmm8,%xmm12,%xmm12
vaesenclast %xmm2,%xmm13,%xmm13
vaesenclast %xmm3,%xmm14,%xmm14
vmovups %xmm9,0(%rsi)
vmovups %xmm10,16(%rsi)
vmovups %xmm11,32(%rsi)
vmovups %xmm12,48(%rsi)
vmovups %xmm13,64(%rsi)
vmovups %xmm14,80(%rsi)
leaq 96(%rsi),%rsi
ret
.align 32
.Lhandle_ctr32_2:
vpshufb %xmm0,%xmm1,%xmm6
vmovdqu 48(%r11),%xmm5
vpaddd 64(%r11),%xmm6,%xmm10
vpaddd %xmm5,%xmm6,%xmm11
vpaddd %xmm5,%xmm10,%xmm12
vpshufb %xmm0,%xmm10,%xmm10
vpaddd %xmm5,%xmm11,%xmm13
vpshufb %xmm0,%xmm11,%xmm11
vpxor %xmm4,%xmm10,%xmm10
vpaddd %xmm5,%xmm12,%xmm14
vpshufb %xmm0,%xmm12,%xmm12
vpxor %xmm4,%xmm11,%xmm11
vpaddd %xmm5,%xmm13,%xmm1
vpshufb %xmm0,%xmm13,%xmm13
vpxor %xmm4,%xmm12,%xmm12
vpshufb %xmm0,%xmm14,%xmm14
vpxor %xmm4,%xmm13,%xmm13
vpshufb %xmm0,%xmm1,%xmm1
vpxor %xmm4,%xmm14,%xmm14
jmp .Loop_ctr32
.cfi_endproc
.size _aesni_ctr32_6x,.-_aesni_ctr32_6x
.globl aesni_gcm_encrypt
.type aesni_gcm_encrypt,@function
.align 32
aesni_gcm_encrypt:
.cfi_startproc
xorq %r10,%r10
cmpq $288,%rdx
jb .Lgcm_enc_abort
leaq (%rsp),%rax
.cfi_def_cfa_register %rax
pushq %rbx
.cfi_offset %rbx,-16
pushq %rbp
.cfi_offset %rbp,-24
pushq %r12
.cfi_offset %r12,-32
pushq %r13
.cfi_offset %r13,-40
pushq %r14
.cfi_offset %r14,-48
pushq %r15
.cfi_offset %r15,-56
vzeroupper
vmovdqu (%r8),%xmm1
addq $-128,%rsp
movl 12(%r8),%ebx
leaq .Lbswap_mask(%rip),%r11
leaq -128(%rcx),%r14
movq $0xf80,%r15
leaq 128(%rcx),%rcx
vmovdqu (%r11),%xmm0
andq $-128,%rsp
movl 240-128(%rcx),%ebp
andq %r15,%r14
andq %rsp,%r15
subq %r14,%r15
jc .Lenc_no_key_aliasing
cmpq $768,%r15
jnc .Lenc_no_key_aliasing
subq %r15,%rsp
.Lenc_no_key_aliasing:
leaq (%rsi),%r14
leaq -192(%rsi,%rdx,1),%r15
shrq $4,%rdx
call _aesni_ctr32_6x
vpshufb %xmm0,%xmm9,%xmm8
vpshufb %xmm0,%xmm10,%xmm2
vmovdqu %xmm8,112(%rsp)
vpshufb %xmm0,%xmm11,%xmm4
vmovdqu %xmm2,96(%rsp)
vpshufb %xmm0,%xmm12,%xmm5
vmovdqu %xmm4,80(%rsp)
vpshufb %xmm0,%xmm13,%xmm6
vmovdqu %xmm5,64(%rsp)
vpshufb %xmm0,%xmm14,%xmm7
vmovdqu %xmm6,48(%rsp)
call _aesni_ctr32_6x
vmovdqu 16(%r8),%xmm8
leaq 16+32(%r9),%r9
subq $12,%rdx
movq $192,%r10
vpshufb %xmm0,%xmm8,%xmm8
call _aesni_ctr32_ghash_6x
vmovdqu 32(%rsp),%xmm7
vmovdqu (%r11),%xmm0
vmovdqu 0-32(%r9),%xmm3
vpunpckhqdq %xmm7,%xmm7,%xmm1
vmovdqu 32-32(%r9),%xmm15
vmovups %xmm9,-96(%rsi)
vpshufb %xmm0,%xmm9,%xmm9
vpxor %xmm7,%xmm1,%xmm1
vmovups %xmm10,-80(%rsi)
vpshufb %xmm0,%xmm10,%xmm10
vmovups %xmm11,-64(%rsi)
vpshufb %xmm0,%xmm11,%xmm11
vmovups %xmm12,-48(%rsi)
vpshufb %xmm0,%xmm12,%xmm12
vmovups %xmm13,-32(%rsi)
vpshufb %xmm0,%xmm13,%xmm13
vmovups %xmm14,-16(%rsi)
vpshufb %xmm0,%xmm14,%xmm14
vmovdqu %xmm9,16(%rsp)
vmovdqu 48(%rsp),%xmm6
vmovdqu 16-32(%r9),%xmm0
vpunpckhqdq %xmm6,%xmm6,%xmm2
vpclmulqdq $0x00,%xmm3,%xmm7,%xmm5
vpxor %xmm6,%xmm2,%xmm2
vpclmulqdq $0x11,%xmm3,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm15,%xmm1,%xmm1
vmovdqu 64(%rsp),%xmm9
vpclmulqdq $0x00,%xmm0,%xmm6,%xmm4
vmovdqu 48-32(%r9),%xmm3
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm9,%xmm9,%xmm5
vpclmulqdq $0x11,%xmm0,%xmm6,%xmm6
vpxor %xmm9,%xmm5,%xmm5
vpxor %xmm7,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm15,%xmm2,%xmm2
vmovdqu 80-32(%r9),%xmm15
vpxor %xmm1,%xmm2,%xmm2
vmovdqu 80(%rsp),%xmm1
vpclmulqdq $0x00,%xmm3,%xmm9,%xmm7
vmovdqu 64-32(%r9),%xmm0
vpxor %xmm4,%xmm7,%xmm7
vpunpckhqdq %xmm1,%xmm1,%xmm4
vpclmulqdq $0x11,%xmm3,%xmm9,%xmm9
vpxor %xmm1,%xmm4,%xmm4
vpxor %xmm6,%xmm9,%xmm9
vpclmulqdq $0x00,%xmm15,%xmm5,%xmm5
vpxor %xmm2,%xmm5,%xmm5
vmovdqu 96(%rsp),%xmm2
vpclmulqdq $0x00,%xmm0,%xmm1,%xmm6
vmovdqu 96-32(%r9),%xmm3
vpxor %xmm7,%xmm6,%xmm6
vpunpckhqdq %xmm2,%xmm2,%xmm7
vpclmulqdq $0x11,%xmm0,%xmm1,%xmm1
vpxor %xmm2,%xmm7,%xmm7
vpxor %xmm9,%xmm1,%xmm1
vpclmulqdq $0x10,%xmm15,%xmm4,%xmm4
vmovdqu 128-32(%r9),%xmm15
vpxor %xmm5,%xmm4,%xmm4
vpxor 112(%rsp),%xmm8,%xmm8
vpclmulqdq $0x00,%xmm3,%xmm2,%xmm5
vmovdqu 112-32(%r9),%xmm0
vpunpckhqdq %xmm8,%xmm8,%xmm9
vpxor %xmm6,%xmm5,%xmm5
vpclmulqdq $0x11,%xmm3,%xmm2,%xmm2
vpxor %xmm8,%xmm9,%xmm9
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm15,%xmm7,%xmm7
vpxor %xmm4,%xmm7,%xmm4
vpclmulqdq $0x00,%xmm0,%xmm8,%xmm6
vmovdqu 0-32(%r9),%xmm3
vpunpckhqdq %xmm14,%xmm14,%xmm1
vpclmulqdq $0x11,%xmm0,%xmm8,%xmm8
vpxor %xmm14,%xmm1,%xmm1
vpxor %xmm5,%xmm6,%xmm5
vpclmulqdq $0x10,%xmm15,%xmm9,%xmm9
vmovdqu 32-32(%r9),%xmm15
vpxor %xmm2,%xmm8,%xmm7
vpxor %xmm4,%xmm9,%xmm6
vmovdqu 16-32(%r9),%xmm0
vpxor %xmm5,%xmm7,%xmm9
vpclmulqdq $0x00,%xmm3,%xmm14,%xmm4
vpxor %xmm9,%xmm6,%xmm6
vpunpckhqdq %xmm13,%xmm13,%xmm2
vpclmulqdq $0x11,%xmm3,%xmm14,%xmm14
vpxor %xmm13,%xmm2,%xmm2
vpslldq $8,%xmm6,%xmm9
vpclmulqdq $0x00,%xmm15,%xmm1,%xmm1
vpxor %xmm9,%xmm5,%xmm8
vpsrldq $8,%xmm6,%xmm6
vpxor %xmm6,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm0,%xmm13,%xmm5
vmovdqu 48-32(%r9),%xmm3
vpxor %xmm4,%xmm5,%xmm5
vpunpckhqdq %xmm12,%xmm12,%xmm9
vpclmulqdq $0x11,%xmm0,%xmm13,%xmm13
vpxor %xmm12,%xmm9,%xmm9
vpxor %xmm14,%xmm13,%xmm13
vpalignr $8,%xmm8,%xmm8,%xmm14
vpclmulqdq $0x10,%xmm15,%xmm2,%xmm2
vmovdqu 80-32(%r9),%xmm15
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm3,%xmm12,%xmm4
vmovdqu 64-32(%r9),%xmm0
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm11,%xmm11,%xmm1
vpclmulqdq $0x11,%xmm3,%xmm12,%xmm12
vpxor %xmm11,%xmm1,%xmm1
vpxor %xmm13,%xmm12,%xmm12
vxorps 16(%rsp),%xmm7,%xmm7
vpclmulqdq $0x00,%xmm15,%xmm9,%xmm9
vpxor %xmm2,%xmm9,%xmm9
vpclmulqdq $0x10,16(%r11),%xmm8,%xmm8
vxorps %xmm14,%xmm8,%xmm8
vpclmulqdq $0x00,%xmm0,%xmm11,%xmm5
vmovdqu 96-32(%r9),%xmm3
vpxor %xmm4,%xmm5,%xmm5
vpunpckhqdq %xmm10,%xmm10,%xmm2
vpclmulqdq $0x11,%xmm0,%xmm11,%xmm11
vpxor %xmm10,%xmm2,%xmm2
vpalignr $8,%xmm8,%xmm8,%xmm14
vpxor %xmm12,%xmm11,%xmm11
vpclmulqdq $0x10,%xmm15,%xmm1,%xmm1
vmovdqu 128-32(%r9),%xmm15
vpxor %xmm9,%xmm1,%xmm1
vxorps %xmm7,%xmm14,%xmm14
vpclmulqdq $0x10,16(%r11),%xmm8,%xmm8
vxorps %xmm14,%xmm8,%xmm8
vpclmulqdq $0x00,%xmm3,%xmm10,%xmm4
vmovdqu 112-32(%r9),%xmm0
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm8,%xmm8,%xmm9
vpclmulqdq $0x11,%xmm3,%xmm10,%xmm10
vpxor %xmm8,%xmm9,%xmm9
vpxor %xmm11,%xmm10,%xmm10
vpclmulqdq $0x00,%xmm15,%xmm2,%xmm2
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm0,%xmm8,%xmm5
vpclmulqdq $0x11,%xmm0,%xmm8,%xmm7
vpxor %xmm4,%xmm5,%xmm5
vpclmulqdq $0x10,%xmm15,%xmm9,%xmm6
vpxor %xmm10,%xmm7,%xmm7
vpxor %xmm2,%xmm6,%xmm6
vpxor %xmm5,%xmm7,%xmm4
vpxor %xmm4,%xmm6,%xmm6
vpslldq $8,%xmm6,%xmm1
vmovdqu 16(%r11),%xmm3
vpsrldq $8,%xmm6,%xmm6
vpxor %xmm1,%xmm5,%xmm8
vpxor %xmm6,%xmm7,%xmm7
vpalignr $8,%xmm8,%xmm8,%xmm2
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm8
vpxor %xmm2,%xmm8,%xmm8
vpalignr $8,%xmm8,%xmm8,%xmm2
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm8
vpxor %xmm7,%xmm2,%xmm2
vpxor %xmm2,%xmm8,%xmm8
vpshufb (%r11),%xmm8,%xmm8
vmovdqu %xmm8,16(%r8)
vzeroupper
movq -48(%rax),%r15
.cfi_restore %r15
movq -40(%rax),%r14
.cfi_restore %r14
movq -32(%rax),%r13
.cfi_restore %r13
movq -24(%rax),%r12
.cfi_restore %r12
movq -16(%rax),%rbp
.cfi_restore %rbp
movq -8(%rax),%rbx
.cfi_restore %rbx
leaq (%rax),%rsp
.cfi_def_cfa_register %rsp
.Lgcm_enc_abort:
movq %r10,%rax
ret
.cfi_endproc
.size aesni_gcm_encrypt,.-aesni_gcm_encrypt
.align 64
.Lbswap_mask:
.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
.Lpoly:
.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
.Lone_msb:
.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
.Ltwo_lsb:
.byte 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.Lone_lsb:
.byte 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.byte 65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108,101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
.align 64

View file

@ -0,0 +1,831 @@
.text
.p2align 5
_aesni_ctr32_ghash_6x:
vmovdqu 32(%r11),%xmm2
subq $6,%rdx
vpxor %xmm4,%xmm4,%xmm4
vmovdqu 0-128(%rcx),%xmm15
vpaddb %xmm2,%xmm1,%xmm10
vpaddb %xmm2,%xmm10,%xmm11
vpaddb %xmm2,%xmm11,%xmm12
vpaddb %xmm2,%xmm12,%xmm13
vpaddb %xmm2,%xmm13,%xmm14
vpxor %xmm15,%xmm1,%xmm9
vmovdqu %xmm4,16+8(%rsp)
jmp L$oop6x
.p2align 5
L$oop6x:
addl $100663296,%ebx
jc L$handle_ctr32
vmovdqu 0-32(%r9),%xmm3
vpaddb %xmm2,%xmm14,%xmm1
vpxor %xmm15,%xmm10,%xmm10
vpxor %xmm15,%xmm11,%xmm11
L$resume_ctr32:
vmovdqu %xmm1,(%r8)
vpclmulqdq $0x10,%xmm3,%xmm7,%xmm5
vpxor %xmm15,%xmm12,%xmm12
vmovups 16-128(%rcx),%xmm2
vpclmulqdq $0x01,%xmm3,%xmm7,%xmm6
xorq %r12,%r12
cmpq %r14,%r15
vaesenc %xmm2,%xmm9,%xmm9
vmovdqu 48+8(%rsp),%xmm0
vpxor %xmm15,%xmm13,%xmm13
vpclmulqdq $0x00,%xmm3,%xmm7,%xmm1
vaesenc %xmm2,%xmm10,%xmm10
vpxor %xmm15,%xmm14,%xmm14
setnc %r12b
vpclmulqdq $0x11,%xmm3,%xmm7,%xmm7
vaesenc %xmm2,%xmm11,%xmm11
vmovdqu 16-32(%r9),%xmm3
negq %r12
vaesenc %xmm2,%xmm12,%xmm12
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm3,%xmm0,%xmm5
vpxor %xmm4,%xmm8,%xmm8
vaesenc %xmm2,%xmm13,%xmm13
vpxor %xmm5,%xmm1,%xmm4
andq $0x60,%r12
vmovups 32-128(%rcx),%xmm15
vpclmulqdq $0x10,%xmm3,%xmm0,%xmm1
vaesenc %xmm2,%xmm14,%xmm14
vpclmulqdq $0x01,%xmm3,%xmm0,%xmm2
leaq (%r14,%r12,1),%r14
vaesenc %xmm15,%xmm9,%xmm9
vpxor 16+8(%rsp),%xmm8,%xmm8
vpclmulqdq $0x11,%xmm3,%xmm0,%xmm3
vmovdqu 64+8(%rsp),%xmm0
vaesenc %xmm15,%xmm10,%xmm10
movbeq 88(%r14),%r13
vaesenc %xmm15,%xmm11,%xmm11
movbeq 80(%r14),%r12
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,32+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,40+8(%rsp)
vmovdqu 48-32(%r9),%xmm5
vaesenc %xmm15,%xmm14,%xmm14
vmovups 48-128(%rcx),%xmm15
vpxor %xmm1,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm5,%xmm0,%xmm1
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm2,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm5,%xmm0,%xmm2
vaesenc %xmm15,%xmm10,%xmm10
vpxor %xmm3,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm5,%xmm0,%xmm3
vaesenc %xmm15,%xmm11,%xmm11
vpclmulqdq $0x11,%xmm5,%xmm0,%xmm5
vmovdqu 80+8(%rsp),%xmm0
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vpxor %xmm1,%xmm4,%xmm4
vmovdqu 64-32(%r9),%xmm1
vaesenc %xmm15,%xmm14,%xmm14
vmovups 64-128(%rcx),%xmm15
vpxor %xmm2,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm1,%xmm0,%xmm2
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm3,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm1,%xmm0,%xmm3
vaesenc %xmm15,%xmm10,%xmm10
movbeq 72(%r14),%r13
vpxor %xmm5,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm1,%xmm0,%xmm5
vaesenc %xmm15,%xmm11,%xmm11
movbeq 64(%r14),%r12
vpclmulqdq $0x11,%xmm1,%xmm0,%xmm1
vmovdqu 96+8(%rsp),%xmm0
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,48+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,56+8(%rsp)
vpxor %xmm2,%xmm4,%xmm4
vmovdqu 96-32(%r9),%xmm2
vaesenc %xmm15,%xmm14,%xmm14
vmovups 80-128(%rcx),%xmm15
vpxor %xmm3,%xmm6,%xmm6
vpclmulqdq $0x00,%xmm2,%xmm0,%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm2,%xmm0,%xmm5
vaesenc %xmm15,%xmm10,%xmm10
movbeq 56(%r14),%r13
vpxor %xmm1,%xmm7,%xmm7
vpclmulqdq $0x01,%xmm2,%xmm0,%xmm1
vpxor 112+8(%rsp),%xmm8,%xmm8
vaesenc %xmm15,%xmm11,%xmm11
movbeq 48(%r14),%r12
vpclmulqdq $0x11,%xmm2,%xmm0,%xmm2
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,64+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,72+8(%rsp)
vpxor %xmm3,%xmm4,%xmm4
vmovdqu 112-32(%r9),%xmm3
vaesenc %xmm15,%xmm14,%xmm14
vmovups 96-128(%rcx),%xmm15
vpxor %xmm5,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm5
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm1,%xmm6,%xmm6
vpclmulqdq $0x01,%xmm3,%xmm8,%xmm1
vaesenc %xmm15,%xmm10,%xmm10
movbeq 40(%r14),%r13
vpxor %xmm2,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm3,%xmm8,%xmm2
vaesenc %xmm15,%xmm11,%xmm11
movbeq 32(%r14),%r12
vpclmulqdq $0x11,%xmm3,%xmm8,%xmm8
vaesenc %xmm15,%xmm12,%xmm12
movq %r13,80+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
movq %r12,88+8(%rsp)
vpxor %xmm5,%xmm6,%xmm6
vaesenc %xmm15,%xmm14,%xmm14
vpxor %xmm1,%xmm6,%xmm6
vmovups 112-128(%rcx),%xmm15
vpslldq $8,%xmm6,%xmm5
vpxor %xmm2,%xmm4,%xmm4
vmovdqu 16(%r11),%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor %xmm8,%xmm7,%xmm7
vaesenc %xmm15,%xmm10,%xmm10
vpxor %xmm5,%xmm4,%xmm4
movbeq 24(%r14),%r13
vaesenc %xmm15,%xmm11,%xmm11
movbeq 16(%r14),%r12
vpalignr $8,%xmm4,%xmm4,%xmm0
vpclmulqdq $0x10,%xmm3,%xmm4,%xmm4
movq %r13,96+8(%rsp)
vaesenc %xmm15,%xmm12,%xmm12
movq %r12,104+8(%rsp)
vaesenc %xmm15,%xmm13,%xmm13
vmovups 128-128(%rcx),%xmm1
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vmovups 144-128(%rcx),%xmm15
vaesenc %xmm1,%xmm10,%xmm10
vpsrldq $8,%xmm6,%xmm6
vaesenc %xmm1,%xmm11,%xmm11
vpxor %xmm6,%xmm7,%xmm7
vaesenc %xmm1,%xmm12,%xmm12
vpxor %xmm0,%xmm4,%xmm4
movbeq 8(%r14),%r13
vaesenc %xmm1,%xmm13,%xmm13
movbeq 0(%r14),%r12
vaesenc %xmm1,%xmm14,%xmm14
vmovups 160-128(%rcx),%xmm1
cmpl $11,%ebp
jb L$enc_tail
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vaesenc %xmm1,%xmm10,%xmm10
vaesenc %xmm1,%xmm11,%xmm11
vaesenc %xmm1,%xmm12,%xmm12
vaesenc %xmm1,%xmm13,%xmm13
vmovups 176-128(%rcx),%xmm15
vaesenc %xmm1,%xmm14,%xmm14
vmovups 192-128(%rcx),%xmm1
je L$enc_tail
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vaesenc %xmm1,%xmm9,%xmm9
vaesenc %xmm1,%xmm10,%xmm10
vaesenc %xmm1,%xmm11,%xmm11
vaesenc %xmm1,%xmm12,%xmm12
vaesenc %xmm1,%xmm13,%xmm13
vmovups 208-128(%rcx),%xmm15
vaesenc %xmm1,%xmm14,%xmm14
vmovups 224-128(%rcx),%xmm1
jmp L$enc_tail
.p2align 5
L$handle_ctr32:
vmovdqu (%r11),%xmm0
vpshufb %xmm0,%xmm1,%xmm6
vmovdqu 48(%r11),%xmm5
vpaddd 64(%r11),%xmm6,%xmm10
vpaddd %xmm5,%xmm6,%xmm11
vmovdqu 0-32(%r9),%xmm3
vpaddd %xmm5,%xmm10,%xmm12
vpshufb %xmm0,%xmm10,%xmm10
vpaddd %xmm5,%xmm11,%xmm13
vpshufb %xmm0,%xmm11,%xmm11
vpxor %xmm15,%xmm10,%xmm10
vpaddd %xmm5,%xmm12,%xmm14
vpshufb %xmm0,%xmm12,%xmm12
vpxor %xmm15,%xmm11,%xmm11
vpaddd %xmm5,%xmm13,%xmm1
vpshufb %xmm0,%xmm13,%xmm13
vpshufb %xmm0,%xmm14,%xmm14
vpshufb %xmm0,%xmm1,%xmm1
jmp L$resume_ctr32
.p2align 5
L$enc_tail:
vaesenc %xmm15,%xmm9,%xmm9
vmovdqu %xmm7,16+8(%rsp)
vpalignr $8,%xmm4,%xmm4,%xmm8
vaesenc %xmm15,%xmm10,%xmm10
vpclmulqdq $0x10,%xmm3,%xmm4,%xmm4
vpxor 0(%rdi),%xmm1,%xmm2
vaesenc %xmm15,%xmm11,%xmm11
vpxor 16(%rdi),%xmm1,%xmm0
vaesenc %xmm15,%xmm12,%xmm12
vpxor 32(%rdi),%xmm1,%xmm5
vaesenc %xmm15,%xmm13,%xmm13
vpxor 48(%rdi),%xmm1,%xmm6
vaesenc %xmm15,%xmm14,%xmm14
vpxor 64(%rdi),%xmm1,%xmm7
vpxor 80(%rdi),%xmm1,%xmm3
vmovdqu (%r8),%xmm1
vaesenclast %xmm2,%xmm9,%xmm9
vmovdqu 32(%r11),%xmm2
vaesenclast %xmm0,%xmm10,%xmm10
vpaddb %xmm2,%xmm1,%xmm0
movq %r13,112+8(%rsp)
leaq 96(%rdi),%rdi
vaesenclast %xmm5,%xmm11,%xmm11
vpaddb %xmm2,%xmm0,%xmm5
movq %r12,120+8(%rsp)
leaq 96(%rsi),%rsi
vmovdqu 0-128(%rcx),%xmm15
vaesenclast %xmm6,%xmm12,%xmm12
vpaddb %xmm2,%xmm5,%xmm6
vaesenclast %xmm7,%xmm13,%xmm13
vpaddb %xmm2,%xmm6,%xmm7
vaesenclast %xmm3,%xmm14,%xmm14
vpaddb %xmm2,%xmm7,%xmm3
addq $0x60,%r10
subq $0x6,%rdx
jc L$6x_done
vmovups %xmm9,-96(%rsi)
vpxor %xmm15,%xmm1,%xmm9
vmovups %xmm10,-80(%rsi)
vmovdqa %xmm0,%xmm10
vmovups %xmm11,-64(%rsi)
vmovdqa %xmm5,%xmm11
vmovups %xmm12,-48(%rsi)
vmovdqa %xmm6,%xmm12
vmovups %xmm13,-32(%rsi)
vmovdqa %xmm7,%xmm13
vmovups %xmm14,-16(%rsi)
vmovdqa %xmm3,%xmm14
vmovdqu 32+8(%rsp),%xmm7
jmp L$oop6x
L$6x_done:
vpxor 16+8(%rsp),%xmm8,%xmm8
vpxor %xmm4,%xmm8,%xmm8
ret
.globl _aesni_gcm_decrypt
.p2align 5
_aesni_gcm_decrypt:
xorq %r10,%r10
cmpq $0x60,%rdx
jb L$gcm_dec_abort
leaq (%rsp),%rax
pushq %rbx
pushq %rbp
pushq %r12
pushq %r13
pushq %r14
pushq %r15
vzeroupper
vmovdqu (%r8),%xmm1
addq $-128,%rsp
movl 12(%r8),%ebx
leaq L$bswap_mask(%rip),%r11
leaq -128(%rcx),%r14
movq $0xf80,%r15
vmovdqu 16(%r8),%xmm8
andq $-128,%rsp
vmovdqu (%r11),%xmm0
leaq 128(%rcx),%rcx
leaq 16+32(%r9),%r9
movl 240-128(%rcx),%ebp
vpshufb %xmm0,%xmm8,%xmm8
andq %r15,%r14
andq %rsp,%r15
subq %r14,%r15
jc L$dec_no_key_aliasing
cmpq $768,%r15
jnc L$dec_no_key_aliasing
subq %r15,%rsp
L$dec_no_key_aliasing:
vmovdqu 80(%rdi),%xmm7
leaq (%rdi),%r14
vmovdqu 64(%rdi),%xmm4
leaq -192(%rdi,%rdx,1),%r15
vmovdqu 48(%rdi),%xmm5
shrq $4,%rdx
xorq %r10,%r10
vmovdqu 32(%rdi),%xmm6
vpshufb %xmm0,%xmm7,%xmm7
vmovdqu 16(%rdi),%xmm2
vpshufb %xmm0,%xmm4,%xmm4
vmovdqu (%rdi),%xmm3
vpshufb %xmm0,%xmm5,%xmm5
vmovdqu %xmm4,48(%rsp)
vpshufb %xmm0,%xmm6,%xmm6
vmovdqu %xmm5,64(%rsp)
vpshufb %xmm0,%xmm2,%xmm2
vmovdqu %xmm6,80(%rsp)
vpshufb %xmm0,%xmm3,%xmm3
vmovdqu %xmm2,96(%rsp)
vmovdqu %xmm3,112(%rsp)
call _aesni_ctr32_ghash_6x
vmovups %xmm9,-96(%rsi)
vmovups %xmm10,-80(%rsi)
vmovups %xmm11,-64(%rsi)
vmovups %xmm12,-48(%rsi)
vmovups %xmm13,-32(%rsi)
vmovups %xmm14,-16(%rsi)
vpshufb (%r11),%xmm8,%xmm8
vmovdqu %xmm8,16(%r8)
vzeroupper
movq -48(%rax),%r15
movq -40(%rax),%r14
movq -32(%rax),%r13
movq -24(%rax),%r12
movq -16(%rax),%rbp
movq -8(%rax),%rbx
leaq (%rax),%rsp
L$gcm_dec_abort:
movq %r10,%rax
ret
.p2align 5
_aesni_ctr32_6x:
vmovdqu 0-128(%rcx),%xmm4
vmovdqu 32(%r11),%xmm2
leaq -1(%rbp),%r13
vmovups 16-128(%rcx),%xmm15
leaq 32-128(%rcx),%r12
vpxor %xmm4,%xmm1,%xmm9
addl $100663296,%ebx
jc L$handle_ctr32_2
vpaddb %xmm2,%xmm1,%xmm10
vpaddb %xmm2,%xmm10,%xmm11
vpxor %xmm4,%xmm10,%xmm10
vpaddb %xmm2,%xmm11,%xmm12
vpxor %xmm4,%xmm11,%xmm11
vpaddb %xmm2,%xmm12,%xmm13
vpxor %xmm4,%xmm12,%xmm12
vpaddb %xmm2,%xmm13,%xmm14
vpxor %xmm4,%xmm13,%xmm13
vpaddb %xmm2,%xmm14,%xmm1
vpxor %xmm4,%xmm14,%xmm14
jmp L$oop_ctr32
.p2align 4
L$oop_ctr32:
vaesenc %xmm15,%xmm9,%xmm9
vaesenc %xmm15,%xmm10,%xmm10
vaesenc %xmm15,%xmm11,%xmm11
vaesenc %xmm15,%xmm12,%xmm12
vaesenc %xmm15,%xmm13,%xmm13
vaesenc %xmm15,%xmm14,%xmm14
vmovups (%r12),%xmm15
leaq 16(%r12),%r12
decl %r13d
jnz L$oop_ctr32
vmovdqu (%r12),%xmm3
vaesenc %xmm15,%xmm9,%xmm9
vpxor 0(%rdi),%xmm3,%xmm4
vaesenc %xmm15,%xmm10,%xmm10
vpxor 16(%rdi),%xmm3,%xmm5
vaesenc %xmm15,%xmm11,%xmm11
vpxor 32(%rdi),%xmm3,%xmm6
vaesenc %xmm15,%xmm12,%xmm12
vpxor 48(%rdi),%xmm3,%xmm8
vaesenc %xmm15,%xmm13,%xmm13
vpxor 64(%rdi),%xmm3,%xmm2
vaesenc %xmm15,%xmm14,%xmm14
vpxor 80(%rdi),%xmm3,%xmm3
leaq 96(%rdi),%rdi
vaesenclast %xmm4,%xmm9,%xmm9
vaesenclast %xmm5,%xmm10,%xmm10
vaesenclast %xmm6,%xmm11,%xmm11
vaesenclast %xmm8,%xmm12,%xmm12
vaesenclast %xmm2,%xmm13,%xmm13
vaesenclast %xmm3,%xmm14,%xmm14
vmovups %xmm9,0(%rsi)
vmovups %xmm10,16(%rsi)
vmovups %xmm11,32(%rsi)
vmovups %xmm12,48(%rsi)
vmovups %xmm13,64(%rsi)
vmovups %xmm14,80(%rsi)
leaq 96(%rsi),%rsi
ret
.p2align 5
L$handle_ctr32_2:
vpshufb %xmm0,%xmm1,%xmm6
vmovdqu 48(%r11),%xmm5
vpaddd 64(%r11),%xmm6,%xmm10
vpaddd %xmm5,%xmm6,%xmm11
vpaddd %xmm5,%xmm10,%xmm12
vpshufb %xmm0,%xmm10,%xmm10
vpaddd %xmm5,%xmm11,%xmm13
vpshufb %xmm0,%xmm11,%xmm11
vpxor %xmm4,%xmm10,%xmm10
vpaddd %xmm5,%xmm12,%xmm14
vpshufb %xmm0,%xmm12,%xmm12
vpxor %xmm4,%xmm11,%xmm11
vpaddd %xmm5,%xmm13,%xmm1
vpshufb %xmm0,%xmm13,%xmm13
vpxor %xmm4,%xmm12,%xmm12
vpshufb %xmm0,%xmm14,%xmm14
vpxor %xmm4,%xmm13,%xmm13
vpshufb %xmm0,%xmm1,%xmm1
vpxor %xmm4,%xmm14,%xmm14
jmp L$oop_ctr32
.globl _aesni_gcm_encrypt
.p2align 5
_aesni_gcm_encrypt:
xorq %r10,%r10
cmpq $288,%rdx
jb L$gcm_enc_abort
leaq (%rsp),%rax
pushq %rbx
pushq %rbp
pushq %r12
pushq %r13
pushq %r14
pushq %r15
vzeroupper
vmovdqu (%r8),%xmm1
addq $-128,%rsp
movl 12(%r8),%ebx
leaq L$bswap_mask(%rip),%r11
leaq -128(%rcx),%r14
movq $0xf80,%r15
leaq 128(%rcx),%rcx
vmovdqu (%r11),%xmm0
andq $-128,%rsp
movl 240-128(%rcx),%ebp
andq %r15,%r14
andq %rsp,%r15
subq %r14,%r15
jc L$enc_no_key_aliasing
cmpq $768,%r15
jnc L$enc_no_key_aliasing
subq %r15,%rsp
L$enc_no_key_aliasing:
leaq (%rsi),%r14
leaq -192(%rsi,%rdx,1),%r15
shrq $4,%rdx
call _aesni_ctr32_6x
vpshufb %xmm0,%xmm9,%xmm8
vpshufb %xmm0,%xmm10,%xmm2
vmovdqu %xmm8,112(%rsp)
vpshufb %xmm0,%xmm11,%xmm4
vmovdqu %xmm2,96(%rsp)
vpshufb %xmm0,%xmm12,%xmm5
vmovdqu %xmm4,80(%rsp)
vpshufb %xmm0,%xmm13,%xmm6
vmovdqu %xmm5,64(%rsp)
vpshufb %xmm0,%xmm14,%xmm7
vmovdqu %xmm6,48(%rsp)
call _aesni_ctr32_6x
vmovdqu 16(%r8),%xmm8
leaq 16+32(%r9),%r9
subq $12,%rdx
movq $192,%r10
vpshufb %xmm0,%xmm8,%xmm8
call _aesni_ctr32_ghash_6x
vmovdqu 32(%rsp),%xmm7
vmovdqu (%r11),%xmm0
vmovdqu 0-32(%r9),%xmm3
vpunpckhqdq %xmm7,%xmm7,%xmm1
vmovdqu 32-32(%r9),%xmm15
vmovups %xmm9,-96(%rsi)
vpshufb %xmm0,%xmm9,%xmm9
vpxor %xmm7,%xmm1,%xmm1
vmovups %xmm10,-80(%rsi)
vpshufb %xmm0,%xmm10,%xmm10
vmovups %xmm11,-64(%rsi)
vpshufb %xmm0,%xmm11,%xmm11
vmovups %xmm12,-48(%rsi)
vpshufb %xmm0,%xmm12,%xmm12
vmovups %xmm13,-32(%rsi)
vpshufb %xmm0,%xmm13,%xmm13
vmovups %xmm14,-16(%rsi)
vpshufb %xmm0,%xmm14,%xmm14
vmovdqu %xmm9,16(%rsp)
vmovdqu 48(%rsp),%xmm6
vmovdqu 16-32(%r9),%xmm0
vpunpckhqdq %xmm6,%xmm6,%xmm2
vpclmulqdq $0x00,%xmm3,%xmm7,%xmm5
vpxor %xmm6,%xmm2,%xmm2
vpclmulqdq $0x11,%xmm3,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm15,%xmm1,%xmm1
vmovdqu 64(%rsp),%xmm9
vpclmulqdq $0x00,%xmm0,%xmm6,%xmm4
vmovdqu 48-32(%r9),%xmm3
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm9,%xmm9,%xmm5
vpclmulqdq $0x11,%xmm0,%xmm6,%xmm6
vpxor %xmm9,%xmm5,%xmm5
vpxor %xmm7,%xmm6,%xmm6
vpclmulqdq $0x10,%xmm15,%xmm2,%xmm2
vmovdqu 80-32(%r9),%xmm15
vpxor %xmm1,%xmm2,%xmm2
vmovdqu 80(%rsp),%xmm1
vpclmulqdq $0x00,%xmm3,%xmm9,%xmm7
vmovdqu 64-32(%r9),%xmm0
vpxor %xmm4,%xmm7,%xmm7
vpunpckhqdq %xmm1,%xmm1,%xmm4
vpclmulqdq $0x11,%xmm3,%xmm9,%xmm9
vpxor %xmm1,%xmm4,%xmm4
vpxor %xmm6,%xmm9,%xmm9
vpclmulqdq $0x00,%xmm15,%xmm5,%xmm5
vpxor %xmm2,%xmm5,%xmm5
vmovdqu 96(%rsp),%xmm2
vpclmulqdq $0x00,%xmm0,%xmm1,%xmm6
vmovdqu 96-32(%r9),%xmm3
vpxor %xmm7,%xmm6,%xmm6
vpunpckhqdq %xmm2,%xmm2,%xmm7
vpclmulqdq $0x11,%xmm0,%xmm1,%xmm1
vpxor %xmm2,%xmm7,%xmm7
vpxor %xmm9,%xmm1,%xmm1
vpclmulqdq $0x10,%xmm15,%xmm4,%xmm4
vmovdqu 128-32(%r9),%xmm15
vpxor %xmm5,%xmm4,%xmm4
vpxor 112(%rsp),%xmm8,%xmm8
vpclmulqdq $0x00,%xmm3,%xmm2,%xmm5
vmovdqu 112-32(%r9),%xmm0
vpunpckhqdq %xmm8,%xmm8,%xmm9
vpxor %xmm6,%xmm5,%xmm5
vpclmulqdq $0x11,%xmm3,%xmm2,%xmm2
vpxor %xmm8,%xmm9,%xmm9
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm15,%xmm7,%xmm7
vpxor %xmm4,%xmm7,%xmm4
vpclmulqdq $0x00,%xmm0,%xmm8,%xmm6
vmovdqu 0-32(%r9),%xmm3
vpunpckhqdq %xmm14,%xmm14,%xmm1
vpclmulqdq $0x11,%xmm0,%xmm8,%xmm8
vpxor %xmm14,%xmm1,%xmm1
vpxor %xmm5,%xmm6,%xmm5
vpclmulqdq $0x10,%xmm15,%xmm9,%xmm9
vmovdqu 32-32(%r9),%xmm15
vpxor %xmm2,%xmm8,%xmm7
vpxor %xmm4,%xmm9,%xmm6
vmovdqu 16-32(%r9),%xmm0
vpxor %xmm5,%xmm7,%xmm9
vpclmulqdq $0x00,%xmm3,%xmm14,%xmm4
vpxor %xmm9,%xmm6,%xmm6
vpunpckhqdq %xmm13,%xmm13,%xmm2
vpclmulqdq $0x11,%xmm3,%xmm14,%xmm14
vpxor %xmm13,%xmm2,%xmm2
vpslldq $8,%xmm6,%xmm9
vpclmulqdq $0x00,%xmm15,%xmm1,%xmm1
vpxor %xmm9,%xmm5,%xmm8
vpsrldq $8,%xmm6,%xmm6
vpxor %xmm6,%xmm7,%xmm7
vpclmulqdq $0x00,%xmm0,%xmm13,%xmm5
vmovdqu 48-32(%r9),%xmm3
vpxor %xmm4,%xmm5,%xmm5
vpunpckhqdq %xmm12,%xmm12,%xmm9
vpclmulqdq $0x11,%xmm0,%xmm13,%xmm13
vpxor %xmm12,%xmm9,%xmm9
vpxor %xmm14,%xmm13,%xmm13
vpalignr $8,%xmm8,%xmm8,%xmm14
vpclmulqdq $0x10,%xmm15,%xmm2,%xmm2
vmovdqu 80-32(%r9),%xmm15
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm3,%xmm12,%xmm4
vmovdqu 64-32(%r9),%xmm0
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm11,%xmm11,%xmm1
vpclmulqdq $0x11,%xmm3,%xmm12,%xmm12
vpxor %xmm11,%xmm1,%xmm1
vpxor %xmm13,%xmm12,%xmm12
vxorps 16(%rsp),%xmm7,%xmm7
vpclmulqdq $0x00,%xmm15,%xmm9,%xmm9
vpxor %xmm2,%xmm9,%xmm9
vpclmulqdq $0x10,16(%r11),%xmm8,%xmm8
vxorps %xmm14,%xmm8,%xmm8
vpclmulqdq $0x00,%xmm0,%xmm11,%xmm5
vmovdqu 96-32(%r9),%xmm3
vpxor %xmm4,%xmm5,%xmm5
vpunpckhqdq %xmm10,%xmm10,%xmm2
vpclmulqdq $0x11,%xmm0,%xmm11,%xmm11
vpxor %xmm10,%xmm2,%xmm2
vpalignr $8,%xmm8,%xmm8,%xmm14
vpxor %xmm12,%xmm11,%xmm11
vpclmulqdq $0x10,%xmm15,%xmm1,%xmm1
vmovdqu 128-32(%r9),%xmm15
vpxor %xmm9,%xmm1,%xmm1
vxorps %xmm7,%xmm14,%xmm14
vpclmulqdq $0x10,16(%r11),%xmm8,%xmm8
vxorps %xmm14,%xmm8,%xmm8
vpclmulqdq $0x00,%xmm3,%xmm10,%xmm4
vmovdqu 112-32(%r9),%xmm0
vpxor %xmm5,%xmm4,%xmm4
vpunpckhqdq %xmm8,%xmm8,%xmm9
vpclmulqdq $0x11,%xmm3,%xmm10,%xmm10
vpxor %xmm8,%xmm9,%xmm9
vpxor %xmm11,%xmm10,%xmm10
vpclmulqdq $0x00,%xmm15,%xmm2,%xmm2
vpxor %xmm1,%xmm2,%xmm2
vpclmulqdq $0x00,%xmm0,%xmm8,%xmm5
vpclmulqdq $0x11,%xmm0,%xmm8,%xmm7
vpxor %xmm4,%xmm5,%xmm5
vpclmulqdq $0x10,%xmm15,%xmm9,%xmm6
vpxor %xmm10,%xmm7,%xmm7
vpxor %xmm2,%xmm6,%xmm6
vpxor %xmm5,%xmm7,%xmm4
vpxor %xmm4,%xmm6,%xmm6
vpslldq $8,%xmm6,%xmm1
vmovdqu 16(%r11),%xmm3
vpsrldq $8,%xmm6,%xmm6
vpxor %xmm1,%xmm5,%xmm8
vpxor %xmm6,%xmm7,%xmm7
vpalignr $8,%xmm8,%xmm8,%xmm2
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm8
vpxor %xmm2,%xmm8,%xmm8
vpalignr $8,%xmm8,%xmm8,%xmm2
vpclmulqdq $0x10,%xmm3,%xmm8,%xmm8
vpxor %xmm7,%xmm2,%xmm2
vpxor %xmm2,%xmm8,%xmm8
vpshufb (%r11),%xmm8,%xmm8
vmovdqu %xmm8,16(%r8)
vzeroupper
movq -48(%rax),%r15
movq -40(%rax),%r14
movq -32(%rax),%r13
movq -24(%rax),%r12
movq -16(%rax),%rbp
movq -8(%rax),%rbx
leaq (%rax),%rsp
L$gcm_enc_abort:
movq %r10,%rax
ret
.p2align 6
L$bswap_mask:
.byte 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
L$poly:
.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0xc2
L$one_msb:
.byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
L$two_lsb:
.byte 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
L$one_lsb:
.byte 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.byte 65,69,83,45,78,73,32,71,67,77,32,109,111,100,117,108,101,32,102,111,114,32,120,56,54,95,54,52,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
.p2align 6

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

1176
crypto/aesgcm/ghash-x86.pl Normal file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,670 @@
#! /usr/bin/env perl
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
#
# Licensed under the OpenSSL license (the "License"). You may not use
# this file except in compliance with the License. You can obtain a copy
# in the file LICENSE in the source distribution or at
# https://www.openssl.org/source/license.html
#
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
# ====================================================================
#
# GHASH for for PowerISA v2.07.
#
# July 2014
#
# Accurate performance measurements are problematic, because it's
# always virtualized setup with possibly throttled processor.
# Relative comparison is therefore more informative. This initial
# version is ~2.1x slower than hardware-assisted AES-128-CTR, ~12x
# faster than "4-bit" integer-only compiler-generated 64-bit code.
# "Initial version" means that there is room for futher improvement.
# May 2016
#
# 2x aggregated reduction improves performance by 50% (resulting
# performance on POWER8 is 1 cycle per processed byte), and 4x
# aggregated reduction - by 170% or 2.7x (resulting in 0.55 cpb).
$flavour=shift;
$output =shift;
if ($flavour =~ /64/) {
$SIZE_T=8;
$LRSAVE=2*$SIZE_T;
$STU="stdu";
$POP="ld";
$PUSH="std";
$UCMP="cmpld";
$SHRI="srdi";
} elsif ($flavour =~ /32/) {
$SIZE_T=4;
$LRSAVE=$SIZE_T;
$STU="stwu";
$POP="lwz";
$PUSH="stw";
$UCMP="cmplw";
$SHRI="srwi";
} else { die "nonsense $flavour"; }
$sp="r1";
$FRAME=6*$SIZE_T+13*16; # 13*16 is for v20-v31 offload
$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or
( $xlate="${dir}../../../perlasm/ppc-xlate.pl" and -f $xlate) or
die "can't locate ppc-xlate.pl";
open STDOUT,"| $^X $xlate $flavour $output" || die "can't call $xlate: $!";
my ($Xip,$Htbl,$inp,$len)=map("r$_",(3..6)); # argument block
my ($Xl,$Xm,$Xh,$IN)=map("v$_",(0..3));
my ($zero,$t0,$t1,$t2,$xC2,$H,$Hh,$Hl,$lemask)=map("v$_",(4..12));
my ($Xl1,$Xm1,$Xh1,$IN1,$H2,$H2h,$H2l)=map("v$_",(13..19));
my $vrsave="r12";
$code=<<___;
.machine "any"
.text
.globl .gcm_init_p8
.align 5
.gcm_init_p8:
li r0,-4096
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $H,0,r4 # load H
vspltisb $xC2,-16 # 0xf0
vspltisb $t0,1 # one
vaddubm $xC2,$xC2,$xC2 # 0xe0
vxor $zero,$zero,$zero
vor $xC2,$xC2,$t0 # 0xe1
vsldoi $xC2,$xC2,$zero,15 # 0xe1...
vsldoi $t1,$zero,$t0,1 # ...1
vaddubm $xC2,$xC2,$xC2 # 0xc2...
vspltisb $t2,7
vor $xC2,$xC2,$t1 # 0xc2....01
vspltb $t1,$H,0 # most significant byte
vsl $H,$H,$t0 # H<<=1
vsrab $t1,$t1,$t2 # broadcast carry bit
vand $t1,$t1,$xC2
vxor $IN,$H,$t1 # twisted H
vsldoi $H,$IN,$IN,8 # twist even more ...
vsldoi $xC2,$zero,$xC2,8 # 0xc2.0
vsldoi $Hl,$zero,$H,8 # ... and split
vsldoi $Hh,$H,$zero,8
stvx_u $xC2,0,r3 # save pre-computed table
stvx_u $Hl,r8,r3
li r8,0x40
stvx_u $H, r9,r3
li r9,0x50
stvx_u $Hh,r10,r3
li r10,0x60
vpmsumd $Xl,$IN,$Hl # H.lo·H.lo
vpmsumd $Xm,$IN,$H # H.hi·H.lo+H.lo·H.hi
vpmsumd $Xh,$IN,$Hh # H.hi·H.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
vxor $IN1,$Xl,$t1
vsldoi $H2,$IN1,$IN1,8
vsldoi $H2l,$zero,$H2,8
vsldoi $H2h,$H2,$zero,8
stvx_u $H2l,r8,r3 # save H^2
li r8,0x70
stvx_u $H2,r9,r3
li r9,0x80
stvx_u $H2h,r10,r3
li r10,0x90
___
{
my ($t4,$t5,$t6) = ($Hl,$H,$Hh);
$code.=<<___;
vpmsumd $Xl,$IN,$H2l # H.lo·H^2.lo
vpmsumd $Xl1,$IN1,$H2l # H^2.lo·H^2.lo
vpmsumd $Xm,$IN,$H2 # H.hi·H^2.lo+H.lo·H^2.hi
vpmsumd $Xm1,$IN1,$H2 # H^2.hi·H^2.lo+H^2.lo·H^2.hi
vpmsumd $Xh,$IN,$H2h # H.hi·H^2.hi
vpmsumd $Xh1,$IN1,$H2h # H^2.hi·H^2.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vpmsumd $t6,$Xl1,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vsldoi $t4,$Xm1,$zero,8
vsldoi $t5,$zero,$Xm1,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vxor $Xl1,$Xl1,$t4
vxor $Xh1,$Xh1,$t5
vsldoi $Xl,$Xl,$Xl,8
vsldoi $Xl1,$Xl1,$Xl1,8
vxor $Xl,$Xl,$t2
vxor $Xl1,$Xl1,$t6
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vsldoi $t5,$Xl1,$Xl1,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vpmsumd $Xl1,$Xl1,$xC2
vxor $t1,$t1,$Xh
vxor $t5,$t5,$Xh1
vxor $Xl,$Xl,$t1
vxor $Xl1,$Xl1,$t5
vsldoi $H,$Xl,$Xl,8
vsldoi $H2,$Xl1,$Xl1,8
vsldoi $Hl,$zero,$H,8
vsldoi $Hh,$H,$zero,8
vsldoi $H2l,$zero,$H2,8
vsldoi $H2h,$H2,$zero,8
stvx_u $Hl,r8,r3 # save H^3
li r8,0xa0
stvx_u $H,r9,r3
li r9,0xb0
stvx_u $Hh,r10,r3
li r10,0xc0
stvx_u $H2l,r8,r3 # save H^4
stvx_u $H2,r9,r3
stvx_u $H2h,r10,r3
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,2,0
.long 0
.size .gcm_init_p8,.-.gcm_init_p8
___
}
$code.=<<___;
.globl .gcm_gmult_p8
.align 5
.gcm_gmult_p8:
lis r0,0xfff8
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $IN,0,$Xip # load Xi
lvx_u $Hl,r8,$Htbl # load pre-computed table
le?lvsl $lemask,r0,r0
lvx_u $H, r9,$Htbl
le?vspltisb $t0,0x07
lvx_u $Hh,r10,$Htbl
le?vxor $lemask,$lemask,$t0
lvx_u $xC2,0,$Htbl
le?vperm $IN,$IN,$IN,$lemask
vxor $zero,$zero,$zero
vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
vxor $Xl,$Xl,$t1
le?vperm $Xl,$Xl,$Xl,$lemask
stvx_u $Xl,0,$Xip # write out Xi
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,2,0
.long 0
.size .gcm_gmult_p8,.-.gcm_gmult_p8
.globl .gcm_ghash_p8
.align 5
.gcm_ghash_p8:
li r0,-4096
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $Xl,0,$Xip # load Xi
lvx_u $Hl,r8,$Htbl # load pre-computed table
li r8,0x40
le?lvsl $lemask,r0,r0
lvx_u $H, r9,$Htbl
li r9,0x50
le?vspltisb $t0,0x07
lvx_u $Hh,r10,$Htbl
li r10,0x60
le?vxor $lemask,$lemask,$t0
lvx_u $xC2,0,$Htbl
le?vperm $Xl,$Xl,$Xl,$lemask
vxor $zero,$zero,$zero
${UCMP}i $len,64
bge Lgcm_ghash_p8_4x
lvx_u $IN,0,$inp
addi $inp,$inp,16
subic. $len,$len,16
le?vperm $IN,$IN,$IN,$lemask
vxor $IN,$IN,$Xl
beq Lshort
lvx_u $H2l,r8,$Htbl # load H^2
li r8,16
lvx_u $H2, r9,$Htbl
add r9,$inp,$len # end of input
lvx_u $H2h,r10,$Htbl
be?b Loop_2x
.align 5
Loop_2x:
lvx_u $IN1,0,$inp
le?vperm $IN1,$IN1,$IN1,$lemask
subic $len,$len,32
vpmsumd $Xl,$IN,$H2l # H^2.lo·Xi.lo
vpmsumd $Xl1,$IN1,$Hl # H.lo·Xi+1.lo
subfe r0,r0,r0 # borrow?-1:0
vpmsumd $Xm,$IN,$H2 # H^2.hi·Xi.lo+H^2.lo·Xi.hi
vpmsumd $Xm1,$IN1,$H # H.hi·Xi+1.lo+H.lo·Xi+1.hi
and r0,r0,$len
vpmsumd $Xh,$IN,$H2h # H^2.hi·Xi.hi
vpmsumd $Xh1,$IN1,$Hh # H.hi·Xi+1.hi
add $inp,$inp,r0
vxor $Xl,$Xl,$Xl1
vxor $Xm,$Xm,$Xm1
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xh,$Xh,$Xh1
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
lvx_u $IN,r8,$inp
addi $inp,$inp,32
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
le?vperm $IN,$IN,$IN,$lemask
vxor $t1,$t1,$Xh
vxor $IN,$IN,$t1
vxor $IN,$IN,$Xl
$UCMP r9,$inp
bgt Loop_2x # done yet?
cmplwi $len,0
bne Leven
Lshort:
vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
Leven:
vxor $Xl,$Xl,$t1
le?vperm $Xl,$Xl,$Xl,$lemask
stvx_u $Xl,0,$Xip # write out Xi
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,4,0
.long 0
___
{
my ($Xl3,$Xm2,$IN2,$H3l,$H3,$H3h,
$Xh3,$Xm3,$IN3,$H4l,$H4,$H4h) = map("v$_",(20..31));
my $IN0=$IN;
my ($H21l,$H21h,$loperm,$hiperm) = ($Hl,$Hh,$H2l,$H2h);
$code.=<<___;
.align 5
.gcm_ghash_p8_4x:
Lgcm_ghash_p8_4x:
$STU $sp,-$FRAME($sp)
li r10,`15+6*$SIZE_T`
li r11,`31+6*$SIZE_T`
stvx v20,r10,$sp
addi r10,r10,32
stvx v21,r11,$sp
addi r11,r11,32
stvx v22,r10,$sp
addi r10,r10,32
stvx v23,r11,$sp
addi r11,r11,32
stvx v24,r10,$sp
addi r10,r10,32
stvx v25,r11,$sp
addi r11,r11,32
stvx v26,r10,$sp
addi r10,r10,32
stvx v27,r11,$sp
addi r11,r11,32
stvx v28,r10,$sp
addi r10,r10,32
stvx v29,r11,$sp
addi r11,r11,32
stvx v30,r10,$sp
li r10,0x60
stvx v31,r11,$sp
li r0,-1
stw $vrsave,`$FRAME-4`($sp) # save vrsave
mtspr 256,r0 # preserve all AltiVec registers
lvsl $t0,0,r8 # 0x0001..0e0f
#lvx_u $H2l,r8,$Htbl # load H^2
li r8,0x70
lvx_u $H2, r9,$Htbl
li r9,0x80
vspltisb $t1,8 # 0x0808..0808
#lvx_u $H2h,r10,$Htbl
li r10,0x90
lvx_u $H3l,r8,$Htbl # load H^3
li r8,0xa0
lvx_u $H3, r9,$Htbl
li r9,0xb0
lvx_u $H3h,r10,$Htbl
li r10,0xc0
lvx_u $H4l,r8,$Htbl # load H^4
li r8,0x10
lvx_u $H4, r9,$Htbl
li r9,0x20
lvx_u $H4h,r10,$Htbl
li r10,0x30
vsldoi $t2,$zero,$t1,8 # 0x0000..0808
vaddubm $hiperm,$t0,$t2 # 0x0001..1617
vaddubm $loperm,$t1,$hiperm # 0x0809..1e1f
$SHRI $len,$len,4 # this allows to use sign bit
# as carry
lvx_u $IN0,0,$inp # load input
lvx_u $IN1,r8,$inp
subic. $len,$len,8
lvx_u $IN2,r9,$inp
lvx_u $IN3,r10,$inp
addi $inp,$inp,0x40
le?vperm $IN0,$IN0,$IN0,$lemask
le?vperm $IN1,$IN1,$IN1,$lemask
le?vperm $IN2,$IN2,$IN2,$lemask
le?vperm $IN3,$IN3,$IN3,$lemask
vxor $Xh,$IN0,$Xl
vpmsumd $Xl1,$IN1,$H3l
vpmsumd $Xm1,$IN1,$H3
vpmsumd $Xh1,$IN1,$H3h
vperm $H21l,$H2,$H,$hiperm
vperm $t0,$IN2,$IN3,$loperm
vperm $H21h,$H2,$H,$loperm
vperm $t1,$IN2,$IN3,$hiperm
vpmsumd $Xm2,$IN2,$H2 # H^2.lo·Xi+2.hi+H^2.hi·Xi+2.lo
vpmsumd $Xl3,$t0,$H21l # H^2.lo·Xi+2.lo+H.lo·Xi+3.lo
vpmsumd $Xm3,$IN3,$H # H.hi·Xi+3.lo +H.lo·Xi+3.hi
vpmsumd $Xh3,$t1,$H21h # H^2.hi·Xi+2.hi+H.hi·Xi+3.hi
vxor $Xm2,$Xm2,$Xm1
vxor $Xl3,$Xl3,$Xl1
vxor $Xm3,$Xm3,$Xm2
vxor $Xh3,$Xh3,$Xh1
blt Ltail_4x
Loop_4x:
lvx_u $IN0,0,$inp
lvx_u $IN1,r8,$inp
subic. $len,$len,4
lvx_u $IN2,r9,$inp
lvx_u $IN3,r10,$inp
addi $inp,$inp,0x40
le?vperm $IN1,$IN1,$IN1,$lemask
le?vperm $IN2,$IN2,$IN2,$lemask
le?vperm $IN3,$IN3,$IN3,$lemask
le?vperm $IN0,$IN0,$IN0,$lemask
vpmsumd $Xl,$Xh,$H4l # H^4.lo·Xi.lo
vpmsumd $Xm,$Xh,$H4 # H^4.hi·Xi.lo+H^4.lo·Xi.hi
vpmsumd $Xh,$Xh,$H4h # H^4.hi·Xi.hi
vpmsumd $Xl1,$IN1,$H3l
vpmsumd $Xm1,$IN1,$H3
vpmsumd $Xh1,$IN1,$H3h
vxor $Xl,$Xl,$Xl3
vxor $Xm,$Xm,$Xm3
vxor $Xh,$Xh,$Xh3
vperm $t0,$IN2,$IN3,$loperm
vperm $t1,$IN2,$IN3,$hiperm
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vpmsumd $Xl3,$t0,$H21l # H.lo·Xi+3.lo +H^2.lo·Xi+2.lo
vpmsumd $Xh3,$t1,$H21h # H.hi·Xi+3.hi +H^2.hi·Xi+2.hi
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xm2,$IN2,$H2 # H^2.hi·Xi+2.lo+H^2.lo·Xi+2.hi
vpmsumd $Xm3,$IN3,$H # H.hi·Xi+3.lo +H.lo·Xi+3.hi
vpmsumd $Xl,$Xl,$xC2
vxor $Xl3,$Xl3,$Xl1
vxor $Xh3,$Xh3,$Xh1
vxor $Xh,$Xh,$IN0
vxor $Xm2,$Xm2,$Xm1
vxor $Xh,$Xh,$t1
vxor $Xm3,$Xm3,$Xm2
vxor $Xh,$Xh,$Xl
bge Loop_4x
Ltail_4x:
vpmsumd $Xl,$Xh,$H4l # H^4.lo·Xi.lo
vpmsumd $Xm,$Xh,$H4 # H^4.hi·Xi.lo+H^4.lo·Xi.hi
vpmsumd $Xh,$Xh,$H4h # H^4.hi·Xi.hi
vxor $Xl,$Xl,$Xl3
vxor $Xm,$Xm,$Xm3
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xh,$Xh,$Xh3
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
vxor $Xl,$Xl,$t1
addic. $len,$len,4
beq Ldone_4x
lvx_u $IN0,0,$inp
${UCMP}i $len,2
li $len,-4
blt Lone
lvx_u $IN1,r8,$inp
beq Ltwo
Lthree:
lvx_u $IN2,r9,$inp
le?vperm $IN0,$IN0,$IN0,$lemask
le?vperm $IN1,$IN1,$IN1,$lemask
le?vperm $IN2,$IN2,$IN2,$lemask
vxor $Xh,$IN0,$Xl
vmr $H4l,$H3l
vmr $H4, $H3
vmr $H4h,$H3h
vperm $t0,$IN1,$IN2,$loperm
vperm $t1,$IN1,$IN2,$hiperm
vpmsumd $Xm2,$IN1,$H2 # H^2.lo·Xi+1.hi+H^2.hi·Xi+1.lo
vpmsumd $Xm3,$IN2,$H # H.hi·Xi+2.lo +H.lo·Xi+2.hi
vpmsumd $Xl3,$t0,$H21l # H^2.lo·Xi+1.lo+H.lo·Xi+2.lo
vpmsumd $Xh3,$t1,$H21h # H^2.hi·Xi+1.hi+H.hi·Xi+2.hi
vxor $Xm3,$Xm3,$Xm2
b Ltail_4x
.align 4
Ltwo:
le?vperm $IN0,$IN0,$IN0,$lemask
le?vperm $IN1,$IN1,$IN1,$lemask
vxor $Xh,$IN0,$Xl
vperm $t0,$zero,$IN1,$loperm
vperm $t1,$zero,$IN1,$hiperm
vsldoi $H4l,$zero,$H2,8
vmr $H4, $H2
vsldoi $H4h,$H2,$zero,8
vpmsumd $Xl3,$t0, $H21l # H.lo·Xi+1.lo
vpmsumd $Xm3,$IN1,$H # H.hi·Xi+1.lo+H.lo·Xi+2.hi
vpmsumd $Xh3,$t1, $H21h # H.hi·Xi+1.hi
b Ltail_4x
.align 4
Lone:
le?vperm $IN0,$IN0,$IN0,$lemask
vsldoi $H4l,$zero,$H,8
vmr $H4, $H
vsldoi $H4h,$H,$zero,8
vxor $Xh,$IN0,$Xl
vxor $Xl3,$Xl3,$Xl3
vxor $Xm3,$Xm3,$Xm3
vxor $Xh3,$Xh3,$Xh3
b Ltail_4x
Ldone_4x:
le?vperm $Xl,$Xl,$Xl,$lemask
stvx_u $Xl,0,$Xip # write out Xi
li r10,`15+6*$SIZE_T`
li r11,`31+6*$SIZE_T`
mtspr 256,$vrsave
lvx v20,r10,$sp
addi r10,r10,32
lvx v21,r11,$sp
addi r11,r11,32
lvx v22,r10,$sp
addi r10,r10,32
lvx v23,r11,$sp
addi r11,r11,32
lvx v24,r10,$sp
addi r10,r10,32
lvx v25,r11,$sp
addi r11,r11,32
lvx v26,r10,$sp
addi r10,r10,32
lvx v27,r11,$sp
addi r11,r11,32
lvx v28,r10,$sp
addi r10,r10,32
lvx v29,r11,$sp
addi r11,r11,32
lvx v30,r10,$sp
lvx v31,r11,$sp
addi $sp,$sp,$FRAME
blr
.long 0
.byte 0,12,0x04,0,0x80,0,4,0
.long 0
___
}
$code.=<<___;
.size .gcm_ghash_p8,.-.gcm_ghash_p8
.asciz "GHASH for PowerISA 2.07, CRYPTOGAMS by <appro\@openssl.org>"
.align 2
___
foreach (split("\n",$code)) {
s/\`([^\`]*)\`/eval $1/geo;
if ($flavour =~ /le$/o) { # little-endian
s/le\?//o or
s/be\?/#be#/o;
} else {
s/le\?/#le#/o or
s/be\?//o;
}
print $_,"\n";
}
close STDOUT; # enforce flush

View file

@ -0,0 +1,430 @@
#! /usr/bin/env perl
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
#
# Licensed under the OpenSSL license (the "License"). You may not use
# this file except in compliance with the License. You can obtain a copy
# in the file LICENSE in the source distribution or at
# https://www.openssl.org/source/license.html
#
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
# ====================================================================
#
# GHASH for ARMv8 Crypto Extension, 64-bit polynomial multiplication.
#
# June 2014
#
# Initial version was developed in tight cooperation with Ard
# Biesheuvel <ard.biesheuvel@linaro.org> from bits-n-pieces from
# other assembly modules. Just like aesv8-armx.pl this module
# supports both AArch32 and AArch64 execution modes.
#
# July 2014
#
# Implement 2x aggregated reduction [see ghash-x86.pl for background
# information].
#
# Current performance in cycles per processed byte:
#
# PMULL[2] 32-bit NEON(*)
# Apple A7 0.92 5.62
# Cortex-A53 1.01 8.39
# Cortex-A57 1.17 7.61
# Denver 0.71 6.02
# Mongoose 1.10 8.06
#
# (*) presented for reference/comparison purposes;
$flavour = shift;
$output = shift;
$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate="${dir}arm-xlate.pl" and -f $xlate ) or
( $xlate="${dir}../../../perlasm/arm-xlate.pl" and -f $xlate) or
die "can't locate arm-xlate.pl";
open OUT,"| \"$^X\" $xlate $flavour $output";
*STDOUT=*OUT;
$Xi="x0"; # argument block
$Htbl="x1";
$inp="x2";
$len="x3";
$inc="x12";
{
my ($Xl,$Xm,$Xh,$IN)=map("q$_",(0..3));
my ($t0,$t1,$t2,$xC2,$H,$Hhl,$H2)=map("q$_",(8..14));
$code=<<___;
#include <openssl/arm_arch.h>
.text
___
$code.=".arch armv8-a+crypto\n" if ($flavour =~ /64/);
$code.=<<___ if ($flavour !~ /64/);
.fpu neon
.code 32
#undef __thumb2__
___
################################################################################
# void gcm_init_v8(u128 Htable[16],const u64 H[2]);
#
# input: 128-bit H - secret parameter E(K,0^128)
# output: precomputed table filled with degrees of twisted H;
# H is twisted to handle reverse bitness of GHASH;
# only few of 16 slots of Htable[16] are used;
# data is opaque to outside world (which allows to
# optimize the code independently);
#
$code.=<<___;
.global gcm_init_v8
.type gcm_init_v8,%function
.align 4
gcm_init_v8:
vld1.64 {$t1},[x1] @ load input H
vmov.i8 $xC2,#0xe1
vshl.i64 $xC2,$xC2,#57 @ 0xc2.0
vext.8 $IN,$t1,$t1,#8
vshr.u64 $t2,$xC2,#63
vdup.32 $t1,${t1}[1]
vext.8 $t0,$t2,$xC2,#8 @ t0=0xc2....01
vshr.u64 $t2,$IN,#63
vshr.s32 $t1,$t1,#31 @ broadcast carry bit
vand $t2,$t2,$t0
vshl.i64 $IN,$IN,#1
vext.8 $t2,$t2,$t2,#8
vand $t0,$t0,$t1
vorr $IN,$IN,$t2 @ H<<<=1
veor $H,$IN,$t0 @ twisted H
vst1.64 {$H},[x0],#16 @ store Htable[0]
@ calculate H^2
vext.8 $t0,$H,$H,#8 @ Karatsuba pre-processing
vpmull.p64 $Xl,$H,$H
veor $t0,$t0,$H
vpmull2.p64 $Xh,$H,$H
vpmull.p64 $Xm,$t0,$t0
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
veor $Xm,$Xm,$t1
veor $Xm,$Xm,$t2
vpmull.p64 $t2,$Xl,$xC2 @ 1st phase
vmov $Xh#lo,$Xm#hi @ Xh|Xm - 256-bit result
vmov $Xm#hi,$Xl#lo @ Xm is rotated Xl
veor $Xl,$Xm,$t2
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase
vpmull.p64 $Xl,$Xl,$xC2
veor $t2,$t2,$Xh
veor $H2,$Xl,$t2
vext.8 $t1,$H2,$H2,#8 @ Karatsuba pre-processing
veor $t1,$t1,$H2
vext.8 $Hhl,$t0,$t1,#8 @ pack Karatsuba pre-processed
vst1.64 {$Hhl-$H2},[x0] @ store Htable[1..2]
ret
.size gcm_init_v8,.-gcm_init_v8
___
################################################################################
# void gcm_gmult_v8(u64 Xi[2],const u128 Htable[16]);
#
# input: Xi - current hash value;
# Htable - table precomputed in gcm_init_v8;
# output: Xi - next hash value Xi;
#
$code.=<<___;
.global gcm_gmult_v8
.type gcm_gmult_v8,%function
.align 4
gcm_gmult_v8:
vld1.64 {$t1},[$Xi] @ load Xi
vmov.i8 $xC2,#0xe1
vld1.64 {$H-$Hhl},[$Htbl] @ load twisted H, ...
vshl.u64 $xC2,$xC2,#57
#ifndef __ARMEB__
vrev64.8 $t1,$t1
#endif
vext.8 $IN,$t1,$t1,#8
vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
veor $t1,$t1,$IN @ Karatsuba pre-processing
vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
veor $Xm,$Xm,$t1
veor $Xm,$Xm,$t2
vpmull.p64 $t2,$Xl,$xC2 @ 1st phase of reduction
vmov $Xh#lo,$Xm#hi @ Xh|Xm - 256-bit result
vmov $Xm#hi,$Xl#lo @ Xm is rotated Xl
veor $Xl,$Xm,$t2
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase of reduction
vpmull.p64 $Xl,$Xl,$xC2
veor $t2,$t2,$Xh
veor $Xl,$Xl,$t2
#ifndef __ARMEB__
vrev64.8 $Xl,$Xl
#endif
vext.8 $Xl,$Xl,$Xl,#8
vst1.64 {$Xl},[$Xi] @ write out Xi
ret
.size gcm_gmult_v8,.-gcm_gmult_v8
___
################################################################################
# void gcm_ghash_v8(u64 Xi[2],const u128 Htable[16],const u8 *inp,size_t len);
#
# input: table precomputed in gcm_init_v8;
# current hash value Xi;
# pointer to input data;
# length of input data in bytes, but divisible by block size;
# output: next hash value Xi;
#
$code.=<<___;
.global gcm_ghash_v8
.type gcm_ghash_v8,%function
.align 4
gcm_ghash_v8:
___
$code.=<<___ if ($flavour !~ /64/);
vstmdb sp!,{d8-d15} @ 32-bit ABI says so
___
$code.=<<___;
vld1.64 {$Xl},[$Xi] @ load [rotated] Xi
@ "[rotated]" means that
@ loaded value would have
@ to be rotated in order to
@ make it appear as in
@ alorithm specification
subs $len,$len,#32 @ see if $len is 32 or larger
mov $inc,#16 @ $inc is used as post-
@ increment for input pointer;
@ as loop is modulo-scheduled
@ $inc is zeroed just in time
@ to preclude oversteping
@ inp[len], which means that
@ last block[s] are actually
@ loaded twice, but last
@ copy is not processed
vld1.64 {$H-$Hhl},[$Htbl],#32 @ load twisted H, ..., H^2
vmov.i8 $xC2,#0xe1
vld1.64 {$H2},[$Htbl]
cclr $inc,eq @ is it time to zero $inc?
vext.8 $Xl,$Xl,$Xl,#8 @ rotate Xi
vld1.64 {$t0},[$inp],#16 @ load [rotated] I[0]
vshl.u64 $xC2,$xC2,#57 @ compose 0xc2.0 constant
#ifndef __ARMEB__
vrev64.8 $t0,$t0
vrev64.8 $Xl,$Xl
#endif
vext.8 $IN,$t0,$t0,#8 @ rotate I[0]
b.lo .Lodd_tail_v8 @ $len was less than 32
___
{ my ($Xln,$Xmn,$Xhn,$In) = map("q$_",(4..7));
#######
# Xi+2 =[H*(Ii+1 + Xi+1)] mod P =
# [(H*Ii+1) + (H*Xi+1)] mod P =
# [(H*Ii+1) + H^2*(Ii+Xi)] mod P
#
$code.=<<___;
vld1.64 {$t1},[$inp],$inc @ load [rotated] I[1]
#ifndef __ARMEB__
vrev64.8 $t1,$t1
#endif
vext.8 $In,$t1,$t1,#8
veor $IN,$IN,$Xl @ I[i]^=Xi
vpmull.p64 $Xln,$H,$In @ H·Ii+1
veor $t1,$t1,$In @ Karatsuba pre-processing
vpmull2.p64 $Xhn,$H,$In
b .Loop_mod2x_v8
.align 4
.Loop_mod2x_v8:
vext.8 $t2,$IN,$IN,#8
subs $len,$len,#32 @ is there more data?
vpmull.p64 $Xl,$H2,$IN @ H^2.lo·Xi.lo
cclr $inc,lo @ is it time to zero $inc?
vpmull.p64 $Xmn,$Hhl,$t1
veor $t2,$t2,$IN @ Karatsuba pre-processing
vpmull2.p64 $Xh,$H2,$IN @ H^2.hi·Xi.hi
veor $Xl,$Xl,$Xln @ accumulate
vpmull2.p64 $Xm,$Hhl,$t2 @ (H^2.lo+H^2.hi)·(Xi.lo+Xi.hi)
vld1.64 {$t0},[$inp],$inc @ load [rotated] I[i+2]
veor $Xh,$Xh,$Xhn
cclr $inc,eq @ is it time to zero $inc?
veor $Xm,$Xm,$Xmn
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
veor $Xm,$Xm,$t1
vld1.64 {$t1},[$inp],$inc @ load [rotated] I[i+3]
#ifndef __ARMEB__
vrev64.8 $t0,$t0
#endif
veor $Xm,$Xm,$t2
vpmull.p64 $t2,$Xl,$xC2 @ 1st phase of reduction
#ifndef __ARMEB__
vrev64.8 $t1,$t1
#endif
vmov $Xh#lo,$Xm#hi @ Xh|Xm - 256-bit result
vmov $Xm#hi,$Xl#lo @ Xm is rotated Xl
vext.8 $In,$t1,$t1,#8
vext.8 $IN,$t0,$t0,#8
veor $Xl,$Xm,$t2
vpmull.p64 $Xln,$H,$In @ H·Ii+1
veor $IN,$IN,$Xh @ accumulate $IN early
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase of reduction
vpmull.p64 $Xl,$Xl,$xC2
veor $IN,$IN,$t2
veor $t1,$t1,$In @ Karatsuba pre-processing
veor $IN,$IN,$Xl
vpmull2.p64 $Xhn,$H,$In
b.hs .Loop_mod2x_v8 @ there was at least 32 more bytes
veor $Xh,$Xh,$t2
vext.8 $IN,$t0,$t0,#8 @ re-construct $IN
adds $len,$len,#32 @ re-construct $len
veor $Xl,$Xl,$Xh @ re-construct $Xl
b.eq .Ldone_v8 @ is $len zero?
___
}
$code.=<<___;
.Lodd_tail_v8:
vext.8 $t2,$Xl,$Xl,#8
veor $IN,$IN,$Xl @ inp^=Xi
veor $t1,$t0,$t2 @ $t1 is rotated inp^Xi
vpmull.p64 $Xl,$H,$IN @ H.lo·Xi.lo
veor $t1,$t1,$IN @ Karatsuba pre-processing
vpmull2.p64 $Xh,$H,$IN @ H.hi·Xi.hi
vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
veor $t2,$Xl,$Xh
veor $Xm,$Xm,$t1
veor $Xm,$Xm,$t2
vpmull.p64 $t2,$Xl,$xC2 @ 1st phase of reduction
vmov $Xh#lo,$Xm#hi @ Xh|Xm - 256-bit result
vmov $Xm#hi,$Xl#lo @ Xm is rotated Xl
veor $Xl,$Xm,$t2
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase of reduction
vpmull.p64 $Xl,$Xl,$xC2
veor $t2,$t2,$Xh
veor $Xl,$Xl,$t2
.Ldone_v8:
#ifndef __ARMEB__
vrev64.8 $Xl,$Xl
#endif
vext.8 $Xl,$Xl,$Xl,#8
vst1.64 {$Xl},[$Xi] @ write out Xi
___
$code.=<<___ if ($flavour !~ /64/);
vldmia sp!,{d8-d15} @ 32-bit ABI says so
___
$code.=<<___;
ret
.size gcm_ghash_v8,.-gcm_ghash_v8
___
}
$code.=<<___;
.asciz "GHASH for ARMv8, CRYPTOGAMS by <appro\@openssl.org>"
.align 2
___
if ($flavour =~ /64/) { ######## 64-bit code
sub unvmov {
my $arg=shift;
$arg =~ m/q([0-9]+)#(lo|hi),\s*q([0-9]+)#(lo|hi)/o &&
sprintf "ins v%d.d[%d],v%d.d[%d]",$1,($2 eq "lo")?0:1,$3,($4 eq "lo")?0:1;
}
foreach(split("\n",$code)) {
s/cclr\s+([wx])([^,]+),\s*([a-z]+)/csel $1$2,$1zr,$1$2,$3/o or
s/vmov\.i8/movi/o or # fix up legacy mnemonics
s/vmov\s+(.*)/unvmov($1)/geo or
s/vext\.8/ext/o or
s/vshr\.s/sshr\.s/o or
s/vshr/ushr/o or
s/^(\s+)v/$1/o or # strip off v prefix
s/\bbx\s+lr\b/ret/o;
s/\bq([0-9]+)\b/"v".($1<8?$1:$1+8).".16b"/geo; # old->new registers
s/@\s/\/\//o; # old->new style commentary
# fix up remainig legacy suffixes
s/\.[ui]?8(\s)/$1/o;
s/\.[uis]?32//o and s/\.16b/\.4s/go;
m/\.p64/o and s/\.16b/\.1q/o; # 1st pmull argument
m/l\.p64/o and s/\.16b/\.1d/go; # 2nd and 3rd pmull arguments
s/\.[uisp]?64//o and s/\.16b/\.2d/go;
s/\.[42]([sd])\[([0-3])\]/\.$1\[$2\]/o;
print $_,"\n";
}
} else { ######## 32-bit code
sub unvdup32 {
my $arg=shift;
$arg =~ m/q([0-9]+),\s*q([0-9]+)\[([0-3])\]/o &&
sprintf "vdup.32 q%d,d%d[%d]",$1,2*$2+($3>>1),$3&1;
}
sub unvpmullp64 {
my ($mnemonic,$arg)=@_;
if ($arg =~ m/q([0-9]+),\s*q([0-9]+),\s*q([0-9]+)/o) {
my $word = 0xf2a00e00|(($1&7)<<13)|(($1&8)<<19)
|(($2&7)<<17)|(($2&8)<<4)
|(($3&7)<<1) |(($3&8)<<2);
$word |= 0x00010001 if ($mnemonic =~ "2");
# since ARMv7 instructions are always encoded little-endian.
# correct solution is to use .inst directive, but older
# assemblers don't implement it:-(
sprintf ".byte\t0x%02x,0x%02x,0x%02x,0x%02x\t@ %s %s",
$word&0xff,($word>>8)&0xff,
($word>>16)&0xff,($word>>24)&0xff,
$mnemonic,$arg;
}
}
foreach(split("\n",$code)) {
s/\b[wx]([0-9]+)\b/r$1/go; # new->old registers
s/\bv([0-9])\.[12468]+[bsd]\b/q$1/go; # new->old registers
s/\/\/\s?/@ /o; # new->old style commentary
# fix up remainig new-style suffixes
s/\],#[0-9]+/]!/o;
s/cclr\s+([^,]+),\s*([a-z]+)/mov$2 $1,#0/o or
s/vdup\.32\s+(.*)/unvdup32($1)/geo or
s/v?(pmull2?)\.p64\s+(.*)/unvpmullp64($1,$2)/geo or
s/\bq([0-9]+)#(lo|hi)/sprintf "d%d",2*$1+($2 eq "hi")/geo or
s/^(\s+)b\./$1b/o or
s/^(\s+)ret/$1bx\tlr/o;
print $_,"\n";
}
}
close STDOUT; # enforce flush

View file

@ -0,0 +1,60 @@
/*
BLAKE2 reference source code package - optimized C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#ifndef BLAKE2S_LOAD_SSE2_H
#define BLAKE2S_LOAD_SSE2_H
#define LOAD_MSG_0_1(buf) buf = _mm_set_epi32(m6,m4,m2,m0)
#define LOAD_MSG_0_2(buf) buf = _mm_set_epi32(m7,m5,m3,m1)
#define LOAD_MSG_0_3(buf) buf = _mm_set_epi32(m14,m12,m10,m8)
#define LOAD_MSG_0_4(buf) buf = _mm_set_epi32(m15,m13,m11,m9)
#define LOAD_MSG_1_1(buf) buf = _mm_set_epi32(m13,m9,m4,m14)
#define LOAD_MSG_1_2(buf) buf = _mm_set_epi32(m6,m15,m8,m10)
#define LOAD_MSG_1_3(buf) buf = _mm_set_epi32(m5,m11,m0,m1)
#define LOAD_MSG_1_4(buf) buf = _mm_set_epi32(m3,m7,m2,m12)
#define LOAD_MSG_2_1(buf) buf = _mm_set_epi32(m15,m5,m12,m11)
#define LOAD_MSG_2_2(buf) buf = _mm_set_epi32(m13,m2,m0,m8)
#define LOAD_MSG_2_3(buf) buf = _mm_set_epi32(m9,m7,m3,m10)
#define LOAD_MSG_2_4(buf) buf = _mm_set_epi32(m4,m1,m6,m14)
#define LOAD_MSG_3_1(buf) buf = _mm_set_epi32(m11,m13,m3,m7)
#define LOAD_MSG_3_2(buf) buf = _mm_set_epi32(m14,m12,m1,m9)
#define LOAD_MSG_3_3(buf) buf = _mm_set_epi32(m15,m4,m5,m2)
#define LOAD_MSG_3_4(buf) buf = _mm_set_epi32(m8,m0,m10,m6)
#define LOAD_MSG_4_1(buf) buf = _mm_set_epi32(m10,m2,m5,m9)
#define LOAD_MSG_4_2(buf) buf = _mm_set_epi32(m15,m4,m7,m0)
#define LOAD_MSG_4_3(buf) buf = _mm_set_epi32(m3,m6,m11,m14)
#define LOAD_MSG_4_4(buf) buf = _mm_set_epi32(m13,m8,m12,m1)
#define LOAD_MSG_5_1(buf) buf = _mm_set_epi32(m8,m0,m6,m2)
#define LOAD_MSG_5_2(buf) buf = _mm_set_epi32(m3,m11,m10,m12)
#define LOAD_MSG_5_3(buf) buf = _mm_set_epi32(m1,m15,m7,m4)
#define LOAD_MSG_5_4(buf) buf = _mm_set_epi32(m9,m14,m5,m13)
#define LOAD_MSG_6_1(buf) buf = _mm_set_epi32(m4,m14,m1,m12)
#define LOAD_MSG_6_2(buf) buf = _mm_set_epi32(m10,m13,m15,m5)
#define LOAD_MSG_6_3(buf) buf = _mm_set_epi32(m8,m9,m6,m0)
#define LOAD_MSG_6_4(buf) buf = _mm_set_epi32(m11,m2,m3,m7)
#define LOAD_MSG_7_1(buf) buf = _mm_set_epi32(m3,m12,m7,m13)
#define LOAD_MSG_7_2(buf) buf = _mm_set_epi32(m9,m1,m14,m11)
#define LOAD_MSG_7_3(buf) buf = _mm_set_epi32(m2,m8,m15,m5)
#define LOAD_MSG_7_4(buf) buf = _mm_set_epi32(m10,m6,m4,m0)
#define LOAD_MSG_8_1(buf) buf = _mm_set_epi32(m0,m11,m14,m6)
#define LOAD_MSG_8_2(buf) buf = _mm_set_epi32(m8,m3,m9,m15)
#define LOAD_MSG_8_3(buf) buf = _mm_set_epi32(m10,m1,m13,m12)
#define LOAD_MSG_8_4(buf) buf = _mm_set_epi32(m5,m4,m7,m2)
#define LOAD_MSG_9_1(buf) buf = _mm_set_epi32(m1,m7,m8,m10)
#define LOAD_MSG_9_2(buf) buf = _mm_set_epi32(m5,m6,m4,m2)
#define LOAD_MSG_9_3(buf) buf = _mm_set_epi32(m13,m3,m9,m15)
#define LOAD_MSG_9_4(buf) buf = _mm_set_epi32(m0,m12,m14,m11)
#endif

229
crypto/blake2s-load-sse41.h Normal file
View file

@ -0,0 +1,229 @@
/*
BLAKE2 reference source code package - optimized C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#ifndef BLAKE2S_LOAD_SSE41_H
#define BLAKE2S_LOAD_SSE41_H
#define LOAD_MSG_0_1(buf) \
buf = TOI(_mm_shuffle_ps(TOF(m0), TOF(m1), _MM_SHUFFLE(2,0,2,0)));
#define LOAD_MSG_0_2(buf) \
buf = TOI(_mm_shuffle_ps(TOF(m0), TOF(m1), _MM_SHUFFLE(3,1,3,1)));
#define LOAD_MSG_0_3(buf) \
buf = TOI(_mm_shuffle_ps(TOF(m2), TOF(m3), _MM_SHUFFLE(2,0,2,0)));
#define LOAD_MSG_0_4(buf) \
buf = TOI(_mm_shuffle_ps(TOF(m2), TOF(m3), _MM_SHUFFLE(3,1,3,1)));
#define LOAD_MSG_1_1(buf) \
t0 = _mm_blend_epi16(m1, m2, 0x0C); \
t1 = _mm_slli_si128(m3, 4); \
t2 = _mm_blend_epi16(t0, t1, 0xF0); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,1,0,3));
#define LOAD_MSG_1_2(buf) \
t0 = _mm_shuffle_epi32(m2,_MM_SHUFFLE(0,0,2,0)); \
t1 = _mm_blend_epi16(m1,m3,0xC0); \
t2 = _mm_blend_epi16(t0, t1, 0xF0); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,3,0,1));
#define LOAD_MSG_1_3(buf) \
t0 = _mm_slli_si128(m1, 4); \
t1 = _mm_blend_epi16(m2, t0, 0x30); \
t2 = _mm_blend_epi16(m0, t1, 0xF0); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,3,0,1));
#define LOAD_MSG_1_4(buf) \
t0 = _mm_unpackhi_epi32(m0,m1); \
t1 = _mm_slli_si128(m3, 4); \
t2 = _mm_blend_epi16(t0, t1, 0x0C); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,3,0,1));
#define LOAD_MSG_2_1(buf) \
t0 = _mm_unpackhi_epi32(m2,m3); \
t1 = _mm_blend_epi16(m3,m1,0x0C); \
t2 = _mm_blend_epi16(t0, t1, 0x0F); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(3,1,0,2));
#define LOAD_MSG_2_2(buf) \
t0 = _mm_unpacklo_epi32(m2,m0); \
t1 = _mm_blend_epi16(t0, m0, 0xF0); \
t2 = _mm_slli_si128(m3, 8); \
buf = _mm_blend_epi16(t1, t2, 0xC0);
#define LOAD_MSG_2_3(buf) \
t0 = _mm_blend_epi16(m0, m2, 0x3C); \
t1 = _mm_srli_si128(m1, 12); \
t2 = _mm_blend_epi16(t0,t1,0x03); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(1,0,3,2));
#define LOAD_MSG_2_4(buf) \
t0 = _mm_slli_si128(m3, 4); \
t1 = _mm_blend_epi16(m0, m1, 0x33); \
t2 = _mm_blend_epi16(t1, t0, 0xC0); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(0,1,2,3));
#define LOAD_MSG_3_1(buf) \
t0 = _mm_unpackhi_epi32(m0,m1); \
t1 = _mm_unpackhi_epi32(t0, m2); \
t2 = _mm_blend_epi16(t1, m3, 0x0C); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(3,1,0,2));
#define LOAD_MSG_3_2(buf) \
t0 = _mm_slli_si128(m2, 8); \
t1 = _mm_blend_epi16(m3,m0,0x0C); \
t2 = _mm_blend_epi16(t1, t0, 0xC0); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,0,1,3));
#define LOAD_MSG_3_3(buf) \
t0 = _mm_blend_epi16(m0,m1,0x0F); \
t1 = _mm_blend_epi16(t0, m3, 0xC0); \
buf = _mm_shuffle_epi32(t1, _MM_SHUFFLE(3,0,1,2));
#define LOAD_MSG_3_4(buf) \
t0 = _mm_unpacklo_epi32(m0,m2); \
t1 = _mm_unpackhi_epi32(m1,m2); \
buf = _mm_unpacklo_epi64(t1,t0);
#define LOAD_MSG_4_1(buf) \
t0 = _mm_unpacklo_epi64(m1,m2); \
t1 = _mm_unpackhi_epi64(m0,m2); \
t2 = _mm_blend_epi16(t0,t1,0x33); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,0,1,3));
#define LOAD_MSG_4_2(buf) \
t0 = _mm_unpackhi_epi64(m1,m3); \
t1 = _mm_unpacklo_epi64(m0,m1); \
buf = _mm_blend_epi16(t0,t1,0x33);
#define LOAD_MSG_4_3(buf) \
t0 = _mm_unpackhi_epi64(m3,m1); \
t1 = _mm_unpackhi_epi64(m2,m0); \
buf = _mm_blend_epi16(t1,t0,0x33);
#define LOAD_MSG_4_4(buf) \
t0 = _mm_blend_epi16(m0,m2,0x03); \
t1 = _mm_slli_si128(t0, 8); \
t2 = _mm_blend_epi16(t1,m3,0x0F); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(1,2,0,3));
#define LOAD_MSG_5_1(buf) \
t0 = _mm_unpackhi_epi32(m0,m1); \
t1 = _mm_unpacklo_epi32(m0,m2); \
buf = _mm_unpacklo_epi64(t0,t1);
#define LOAD_MSG_5_2(buf) \
t0 = _mm_srli_si128(m2, 4); \
t1 = _mm_blend_epi16(m0,m3,0x03); \
buf = _mm_blend_epi16(t1,t0,0x3C);
#define LOAD_MSG_5_3(buf) \
t0 = _mm_blend_epi16(m1,m0,0x0C); \
t1 = _mm_srli_si128(m3, 4); \
t2 = _mm_blend_epi16(t0,t1,0x30); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(1,2,3,0));
#define LOAD_MSG_5_4(buf) \
t0 = _mm_unpacklo_epi64(m1,m2); \
t1= _mm_shuffle_epi32(m3, _MM_SHUFFLE(0,2,0,1)); \
buf = _mm_blend_epi16(t0,t1,0x33);
#define LOAD_MSG_6_1(buf) \
t0 = _mm_slli_si128(m1, 12); \
t1 = _mm_blend_epi16(m0,m3,0x33); \
buf = _mm_blend_epi16(t1,t0,0xC0);
#define LOAD_MSG_6_2(buf) \
t0 = _mm_blend_epi16(m3,m2,0x30); \
t1 = _mm_srli_si128(m1, 4); \
t2 = _mm_blend_epi16(t0,t1,0x03); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(2,1,3,0));
#define LOAD_MSG_6_3(buf) \
t0 = _mm_unpacklo_epi64(m0,m2); \
t1 = _mm_srli_si128(m1, 4); \
buf = _mm_shuffle_epi32(_mm_blend_epi16(t0,t1,0x0C), _MM_SHUFFLE(2,3,1,0));
#define LOAD_MSG_6_4(buf) \
t0 = _mm_unpackhi_epi32(m1,m2); \
t1 = _mm_unpackhi_epi64(m0,t0); \
buf = _mm_shuffle_epi32(t1, _MM_SHUFFLE(3,0,1,2));
#define LOAD_MSG_7_1(buf) \
t0 = _mm_unpackhi_epi32(m0,m1); \
t1 = _mm_blend_epi16(t0,m3,0x0F); \
buf = _mm_shuffle_epi32(t1,_MM_SHUFFLE(2,0,3,1));
#define LOAD_MSG_7_2(buf) \
t0 = _mm_blend_epi16(m2,m3,0x30); \
t1 = _mm_srli_si128(m0,4); \
t2 = _mm_blend_epi16(t0,t1,0x03); \
buf = _mm_shuffle_epi32(t2, _MM_SHUFFLE(1,0,2,3));
#define LOAD_MSG_7_3(buf) \
t0 = _mm_unpackhi_epi64(m0,m3); \
t1 = _mm_unpacklo_epi64(m1,m2); \
t2 = _mm_blend_epi16(t0,t1,0x3C); \
buf = _mm_shuffle_epi32(t2,_MM_SHUFFLE(0,2,3,1));
#define LOAD_MSG_7_4(buf) \
t0 = _mm_unpacklo_epi32(m0,m1); \
t1 = _mm_unpackhi_epi32(m1,m2); \
buf = _mm_unpacklo_epi64(t0,t1);
#define LOAD_MSG_8_1(buf) \
t0 = _mm_unpackhi_epi32(m1,m3); \
t1 = _mm_unpacklo_epi64(t0,m0); \
t2 = _mm_blend_epi16(t1,m2,0xC0); \
buf = _mm_shufflehi_epi16(t2,_MM_SHUFFLE(1,0,3,2));
#define LOAD_MSG_8_2(buf) \
t0 = _mm_unpackhi_epi32(m0,m3); \
t1 = _mm_blend_epi16(m2,t0,0xF0); \
buf = _mm_shuffle_epi32(t1,_MM_SHUFFLE(0,2,1,3));
#define LOAD_MSG_8_3(buf) \
t0 = _mm_blend_epi16(m2,m0,0x0C); \
t1 = _mm_slli_si128(t0,4); \
buf = _mm_blend_epi16(t1,m3,0x0F);
#define LOAD_MSG_8_4(buf) \
t0 = _mm_blend_epi16(m1,m0,0x30); \
buf = _mm_shuffle_epi32(t0,_MM_SHUFFLE(1,0,3,2));
#define LOAD_MSG_9_1(buf) \
t0 = _mm_blend_epi16(m0,m2,0x03); \
t1 = _mm_blend_epi16(m1,m2,0x30); \
t2 = _mm_blend_epi16(t1,t0,0x0F); \
buf = _mm_shuffle_epi32(t2,_MM_SHUFFLE(1,3,0,2));
#define LOAD_MSG_9_2(buf) \
t0 = _mm_slli_si128(m0,4); \
t1 = _mm_blend_epi16(m1,t0,0xC0); \
buf = _mm_shuffle_epi32(t1,_MM_SHUFFLE(1,2,0,3));
#define LOAD_MSG_9_3(buf) \
t0 = _mm_unpackhi_epi32(m0,m3); \
t1 = _mm_unpacklo_epi32(m2,m3); \
t2 = _mm_unpackhi_epi64(t0,t1); \
buf = _mm_shuffle_epi32(t2,_MM_SHUFFLE(3,0,2,1));
#define LOAD_MSG_9_4(buf) \
t0 = _mm_blend_epi16(m3,m2,0xC0); \
t1 = _mm_unpacklo_epi32(m0,m3); \
t2 = _mm_blend_epi16(t0,t1,0x0F); \
buf = _mm_shuffle_epi32(t2,_MM_SHUFFLE(0,1,2,3));
#endif

191
crypto/blake2s-load-xop.h Normal file
View file

@ -0,0 +1,191 @@
/*
BLAKE2 reference source code package - optimized C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#ifndef BLAKE2S_LOAD_XOP_H
#define BLAKE2S_LOAD_XOP_H
#define TOB(x) ((x)*4*0x01010101 + 0x03020100) /* ..or not TOB */
#if 0
/* Basic VPPERM emulation, for testing purposes */
static __m128i _mm_perm_epi8(const __m128i src1, const __m128i src2, const __m128i sel)
{
const __m128i sixteen = _mm_set1_epi8(16);
const __m128i t0 = _mm_shuffle_epi8(src1, sel);
const __m128i s1 = _mm_shuffle_epi8(src2, _mm_sub_epi8(sel, sixteen));
const __m128i mask = _mm_or_si128(_mm_cmpeq_epi8(sel, sixteen),
_mm_cmpgt_epi8(sel, sixteen)); /* (>=16) = 0xff : 00 */
return _mm_blendv_epi8(t0, s1, mask);
}
#endif
#define LOAD_MSG_0_1(buf) \
buf = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) );
#define LOAD_MSG_0_2(buf) \
buf = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(7),TOB(5),TOB(3),TOB(1)) );
#define LOAD_MSG_0_3(buf) \
buf = _mm_perm_epi8(m2, m3, _mm_set_epi32(TOB(6),TOB(4),TOB(2),TOB(0)) );
#define LOAD_MSG_0_4(buf) \
buf = _mm_perm_epi8(m2, m3, _mm_set_epi32(TOB(7),TOB(5),TOB(3),TOB(1)) );
#define LOAD_MSG_1_1(buf) \
t0 = _mm_perm_epi8(m1, m2, _mm_set_epi32(TOB(0),TOB(5),TOB(0),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(5),TOB(2),TOB(1),TOB(6)) );
#define LOAD_MSG_1_2(buf) \
t1 = _mm_perm_epi8(m1, m2, _mm_set_epi32(TOB(2),TOB(0),TOB(4),TOB(6)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(7),TOB(1),TOB(0)) );
#define LOAD_MSG_1_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(5),TOB(0),TOB(0),TOB(1)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(3),TOB(7),TOB(1),TOB(0)) );
#define LOAD_MSG_1_4(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(3),TOB(7),TOB(2),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(1),TOB(4)) );
#define LOAD_MSG_2_1(buf) \
t0 = _mm_perm_epi8(m1, m2, _mm_set_epi32(TOB(0),TOB(1),TOB(0),TOB(7)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(7),TOB(2),TOB(4),TOB(0)) );
#define LOAD_MSG_2_2(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(0),TOB(2),TOB(0),TOB(4)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(5),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_2_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(7),TOB(3),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(5),TOB(2),TOB(1),TOB(6)) );
#define LOAD_MSG_2_4(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(4),TOB(1),TOB(6),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(1),TOB(6)) );
#define LOAD_MSG_3_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(0),TOB(3),TOB(7)) ); \
t0 = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(7),TOB(2),TOB(1),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(5),TOB(1),TOB(0)) );
#define LOAD_MSG_3_2(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(0),TOB(0),TOB(1),TOB(5)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(6),TOB(4),TOB(1),TOB(0)) );
#define LOAD_MSG_3_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(4),TOB(5),TOB(2)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(7),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_3_4(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(0),TOB(0),TOB(6)) ); \
buf = _mm_perm_epi8(t1, m2, _mm_set_epi32(TOB(4),TOB(2),TOB(6),TOB(0)) );
#define LOAD_MSG_4_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(2),TOB(5),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(6),TOB(2),TOB(1),TOB(5)) );
#define LOAD_MSG_4_2(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(4),TOB(7),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(7),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_4_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(3),TOB(6),TOB(0),TOB(0)) ); \
t0 = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(3),TOB(2),TOB(7),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(1),TOB(6)) );
#define LOAD_MSG_4_4(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(0),TOB(4),TOB(0),TOB(1)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(5),TOB(2),TOB(4),TOB(0)) );
#define LOAD_MSG_5_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(0),TOB(6),TOB(2)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(4),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_5_2(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(3),TOB(7),TOB(6),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(1),TOB(4)) );
#define LOAD_MSG_5_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(1),TOB(0),TOB(7),TOB(4)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(7),TOB(1),TOB(0)) );
#define LOAD_MSG_5_4(buf) \
t1 = _mm_perm_epi8(m1, m2, _mm_set_epi32(TOB(5),TOB(0),TOB(1),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(6),TOB(1),TOB(5)) );
#define LOAD_MSG_6_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(4),TOB(0),TOB(1),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(6),TOB(1),TOB(4)) );
#define LOAD_MSG_6_2(buf) \
t1 = _mm_perm_epi8(m1, m2, _mm_set_epi32(TOB(6),TOB(0),TOB(0),TOB(1)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(5),TOB(7),TOB(0)) );
#define LOAD_MSG_6_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(0),TOB(6),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(4),TOB(5),TOB(1),TOB(0)) );
#define LOAD_MSG_6_4(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(2),TOB(3),TOB(7)) ); \
buf = _mm_perm_epi8(t1, m2, _mm_set_epi32(TOB(7),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_7_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(3),TOB(0),TOB(7),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(4),TOB(1),TOB(5)) );
#define LOAD_MSG_7_2(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(5),TOB(1),TOB(0),TOB(7)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(6),TOB(0)) );
#define LOAD_MSG_7_3(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(2),TOB(0),TOB(0),TOB(5)) ); \
t0 = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(3),TOB(4),TOB(1),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(7),TOB(0)) );
#define LOAD_MSG_7_4(buf) \
t1 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(6),TOB(4),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m2, _mm_set_epi32(TOB(6),TOB(2),TOB(1),TOB(0)) );
#define LOAD_MSG_8_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(0),TOB(0),TOB(0),TOB(6)) ); \
t0 = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(3),TOB(7),TOB(1),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(6),TOB(0)) );
#define LOAD_MSG_8_2(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(4),TOB(3),TOB(5),TOB(0)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(1),TOB(7)) );
#define LOAD_MSG_8_3(buf) \
t0 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(6),TOB(1),TOB(0),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(3),TOB(2),TOB(5),TOB(4)) ); \
#define LOAD_MSG_8_4(buf) \
buf = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(5),TOB(4),TOB(7),TOB(2)) );
#define LOAD_MSG_9_1(buf) \
t0 = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(1),TOB(7),TOB(0),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m2, _mm_set_epi32(TOB(3),TOB(2),TOB(4),TOB(6)) );
#define LOAD_MSG_9_2(buf) \
buf = _mm_perm_epi8(m0, m1, _mm_set_epi32(TOB(5),TOB(6),TOB(4),TOB(2)) );
#define LOAD_MSG_9_3(buf) \
t0 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(0),TOB(3),TOB(5),TOB(0)) ); \
buf = _mm_perm_epi8(t0, m3, _mm_set_epi32(TOB(5),TOB(2),TOB(1),TOB(7)) );
#define LOAD_MSG_9_4(buf) \
t1 = _mm_perm_epi8(m0, m2, _mm_set_epi32(TOB(0),TOB(0),TOB(0),TOB(7)) ); \
buf = _mm_perm_epi8(t1, m3, _mm_set_epi32(TOB(3),TOB(4),TOB(6),TOB(0)) );
#endif

88
crypto/blake2s-round.h Normal file
View file

@ -0,0 +1,88 @@
/*
BLAKE2 reference source code package - optimized C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#ifndef BLAKE2S_ROUND_H
#define BLAKE2S_ROUND_H
#define LOADU(p) _mm_loadu_si128( (const __m128i *)(p) )
#define STOREU(p,r) _mm_storeu_si128((__m128i *)(p), r)
#define TOF(reg) _mm_castsi128_ps((reg))
#define TOI(reg) _mm_castps_si128((reg))
#define LIKELY(x) __builtin_expect((x),1)
/* Microarchitecture-specific macros */
#ifndef HAVE_XOP
#ifdef HAVE_SSSE3
#define _mm_roti_epi32(r, c) ( \
(8==-(c)) ? _mm_shuffle_epi8(r,r8) \
: (16==-(c)) ? _mm_shuffle_epi8(r,r16) \
: _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-(c)) )) )
#else
#define _mm_roti_epi32(r, c) _mm_xor_si128(_mm_srli_epi32( (r), -(c) ),_mm_slli_epi32( (r), 32-(-(c)) ))
#endif
#else
/* ... */
#endif
#define G1(row1,row2,row3,row4,buf) \
row1 = _mm_add_epi32( _mm_add_epi32( row1, buf), row2 ); \
row4 = _mm_xor_si128( row4, row1 ); \
row4 = _mm_roti_epi32(row4, -16); \
row3 = _mm_add_epi32( row3, row4 ); \
row2 = _mm_xor_si128( row2, row3 ); \
row2 = _mm_roti_epi32(row2, -12);
#define G2(row1,row2,row3,row4,buf) \
row1 = _mm_add_epi32( _mm_add_epi32( row1, buf), row2 ); \
row4 = _mm_xor_si128( row4, row1 ); \
row4 = _mm_roti_epi32(row4, -8); \
row3 = _mm_add_epi32( row3, row4 ); \
row2 = _mm_xor_si128( row2, row3 ); \
row2 = _mm_roti_epi32(row2, -7);
#define DIAGONALIZE(row1,row2,row3,row4) \
row4 = _mm_shuffle_epi32( row4, _MM_SHUFFLE(2,1,0,3) ); \
row3 = _mm_shuffle_epi32( row3, _MM_SHUFFLE(1,0,3,2) ); \
row2 = _mm_shuffle_epi32( row2, _MM_SHUFFLE(0,3,2,1) );
#define UNDIAGONALIZE(row1,row2,row3,row4) \
row4 = _mm_shuffle_epi32( row4, _MM_SHUFFLE(0,3,2,1) ); \
row3 = _mm_shuffle_epi32( row3, _MM_SHUFFLE(1,0,3,2) ); \
row2 = _mm_shuffle_epi32( row2, _MM_SHUFFLE(2,1,0,3) );
#if defined(HAVE_XOP)
#include "blake2s-load-xop.h"
#elif defined(HAVE_SSE41)
#include "blake2s-load-sse41.h"
#else
#include "blake2s-load-sse2.h"
#endif
#define ROUND(r) \
LOAD_MSG_ ##r ##_1(buf1); \
G1(row1,row2,row3,row4,buf1); \
LOAD_MSG_ ##r ##_2(buf2); \
G2(row1,row2,row3,row4,buf2); \
DIAGONALIZE(row1,row2,row3,row4); \
LOAD_MSG_ ##r ##_3(buf3); \
G1(row1,row2,row3,row4,buf3); \
LOAD_MSG_ ##r ##_4(buf4); \
G2(row1,row2,row3,row4,buf4); \
UNDIAGONALIZE(row1,row2,row3,row4); \
#endif

446
crypto/blake2s.cpp Normal file
View file

@ -0,0 +1,446 @@
/*
BLAKE2 reference source code package - reference C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#include "stdafx.h"
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <assert.h>
#include "tunsafe_types.h"
#include "blake2s.h"
#include "crypto_ops.h"
void blake2s_compress_sse(blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]);
#if !defined(__cplusplus) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L)
#if defined(_MSC_VER)
#define BLAKE2_INLINE __inline
#elif defined(__GNUC__)
#define BLAKE2_INLINE __inline__
#else
#define BLAKE2_INLINE
#endif
#else
#define BLAKE2_INLINE inline
#endif
static BLAKE2_INLINE uint32_t load32(const void *src) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
uint32_t w;
memcpy(&w, src, sizeof w);
return w;
#else
const uint8_t *p = (const uint8_t *)src;
return ((uint32_t)(p[0]) << 0) |
((uint32_t)(p[1]) << 8) |
((uint32_t)(p[2]) << 16) |
((uint32_t)(p[3]) << 24);
#endif
}
static BLAKE2_INLINE uint16_t load16(const void *src) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
uint16_t w;
memcpy(&w, src, sizeof w);
return w;
#else
const uint8_t *p = (const uint8_t *)src;
return ((uint16_t)(p[0]) << 0) |
((uint16_t)(p[1]) << 8);
#endif
}
static BLAKE2_INLINE void store16(void *dst, uint16_t w) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
memcpy(dst, &w, sizeof w);
#else
uint8_t *p = (uint8_t *)dst;
*p++ = (uint8_t)w; w >>= 8;
*p++ = (uint8_t)w;
#endif
}
static BLAKE2_INLINE void store32(void *dst, uint32_t w) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
memcpy(dst, &w, sizeof w);
#else
uint8_t *p = (uint8_t *)dst;
p[0] = (uint8_t)(w >> 0);
p[1] = (uint8_t)(w >> 8);
p[2] = (uint8_t)(w >> 16);
p[3] = (uint8_t)(w >> 24);
#endif
}
static BLAKE2_INLINE uint32_t rotr32(const uint32_t w, const unsigned c) {
return (w >> c) | (w << (32 - c));
}
static BLAKE2_INLINE uint64_t rotr64(const uint64_t w, const unsigned c) {
return (w >> c) | (w << (64 - c));
}
static const uint32_t blake2s_IV[8] = {
0x6A09E667UL, 0xBB67AE85UL, 0x3C6EF372UL, 0xA54FF53AUL,
0x510E527FUL, 0x9B05688CUL, 0x1F83D9ABUL, 0x5BE0CD19UL
};
static const uint8_t blake2s_sigma[10][16] =
{
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} ,
{14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3} ,
{11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4} ,
{7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8} ,
{9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13} ,
{2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9} ,
{12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11} ,
{13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10} ,
{6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5} ,
{10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13 , 0} ,
};
static void blake2s_set_lastnode(blake2s_state *S) {
S->f[1] = (uint32_t)-1;
}
/* Some helper functions, not necessarily useful */
static int blake2s_is_lastblock(const blake2s_state *S) {
return S->f[0] != 0;
}
static void blake2s_set_lastblock(blake2s_state *S) {
if (S->last_node) blake2s_set_lastnode(S);
S->f[0] = (uint32_t)-1;
}
static void blake2s_increment_counter(blake2s_state *S, const uint32_t inc) {
S->t[0] += inc;
S->t[1] += (S->t[0] < inc);
}
void blake2s_init_with_len(blake2s_state *S, size_t outlen, size_t keylen) {
memset(S, 0, sizeof(blake2s_state));
blake2s_param *P = &S->param;
size_t i;
/* Move interval verification here? */
assert(outlen && outlen <= BLAKE2S_OUTBYTES);
P->digest_length = (uint8_t)outlen;
S->outlen = (uint8_t)outlen;
P->key_length = (uint8_t)keylen;
P->fanout = 1;
P->depth = 1;
// store32(&P.leaf_length, 0);
// store32(&P.node_offset, 0);
// store16(&P.xof_length, 0);
// P.node_depth = 0;
// P.inner_length = 0;
/* memset(P->reserved, 0, sizeof(P->reserved) ); */
// memset(P.salt, 0, sizeof(P.salt));
// memset(P.personal, 0, sizeof(P.personal));
for (i = 0; i < 8; ++i)
S->h[i] = load32(&S->h[i]) ^ blake2s_IV[i];
}
/* Sequential blake2s initialization */
void blake2s_init(blake2s_state *S, size_t outlen) {
blake2s_init_with_len(S, outlen, 0);
}
void blake2s_init_key(blake2s_state *S, size_t outlen, const void *key, size_t keylen) {
uint8_t block[BLAKE2S_BLOCKBYTES];
assert(outlen && outlen <= BLAKE2S_OUTBYTES);
assert(key && keylen && keylen <= BLAKE2S_KEYBYTES);
blake2s_init_with_len(S, outlen, keylen);
memset(block, 0, BLAKE2S_BLOCKBYTES);
memcpy(block, key, keylen);
blake2s_update(S, block, BLAKE2S_BLOCKBYTES);
memzero_crypto(block, BLAKE2S_BLOCKBYTES); /* Burn the key from stack */
}
#define G(r,i,a,b,c,d) \
do { \
a = a + b + m[blake2s_sigma[r][2*i+0]]; \
d = rotr32(d ^ a, 16); \
c = c + d; \
b = rotr32(b ^ c, 12); \
a = a + b + m[blake2s_sigma[r][2*i+1]]; \
d = rotr32(d ^ a, 8); \
c = c + d; \
b = rotr32(b ^ c, 7); \
} while(0)
#define ROUND(r) \
do { \
G(r,0,v[ 0],v[ 4],v[ 8],v[12]); \
G(r,1,v[ 1],v[ 5],v[ 9],v[13]); \
G(r,2,v[ 2],v[ 6],v[10],v[14]); \
G(r,3,v[ 3],v[ 7],v[11],v[15]); \
G(r,4,v[ 0],v[ 5],v[10],v[15]); \
G(r,5,v[ 1],v[ 6],v[11],v[12]); \
G(r,6,v[ 2],v[ 7],v[ 8],v[13]); \
G(r,7,v[ 3],v[ 4],v[ 9],v[14]); \
} while(0)
static void blake2s_compress(blake2s_state *S, const uint8_t in[BLAKE2S_BLOCKBYTES]) {
uint32_t m[16];
uint32_t v[16];
size_t i;
for (i = 0; i < 16; ++i) {
m[i] = load32(in + i * sizeof(m[i]));
}
for (i = 0; i < 8; ++i) {
v[i] = S->h[i];
}
v[8] = blake2s_IV[0];
v[9] = blake2s_IV[1];
v[10] = blake2s_IV[2];
v[11] = blake2s_IV[3];
v[12] = S->t[0] ^ blake2s_IV[4];
v[13] = S->t[1] ^ blake2s_IV[5];
v[14] = S->f[0] ^ blake2s_IV[6];
v[15] = S->f[1] ^ blake2s_IV[7];
ROUND(0);
ROUND(1);
ROUND(2);
ROUND(3);
ROUND(4);
ROUND(5);
ROUND(6);
ROUND(7);
ROUND(8);
ROUND(9);
for (i = 0; i < 8; ++i) {
S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
}
}
#undef G
#undef ROUND
static inline void blake2s_compress_impl(blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES]) {
#if defined(ARCH_CPU_X86_64)
blake2s_compress_sse(S, block);
#else
blake2s_compress(S, block);
#endif
}
void blake2s_update(blake2s_state *S, const void *pin, size_t inlen) {
const unsigned char * in = (const unsigned char *)pin;
if (inlen > 0) {
size_t left = S->buflen;
size_t fill = BLAKE2S_BLOCKBYTES - left;
if (inlen > fill) {
S->buflen = 0;
memcpy(S->buf + left, in, fill); /* Fill buffer */
blake2s_increment_counter(S, BLAKE2S_BLOCKBYTES);
blake2s_compress_impl(S, S->buf); /* Compress */
in += fill; inlen -= fill;
while (inlen > BLAKE2S_BLOCKBYTES) {
blake2s_increment_counter(S, BLAKE2S_BLOCKBYTES);
blake2s_compress_impl(S, in);
in += BLAKE2S_BLOCKBYTES;
inlen -= BLAKE2S_BLOCKBYTES;
}
}
memcpy(S->buf + S->buflen, in, inlen);
S->buflen += inlen;
}
}
void blake2s_final(blake2s_state *S, void *out, size_t outlen) {
size_t i;
assert(out != NULL && outlen >= S->outlen);
assert(!blake2s_is_lastblock(S));
blake2s_increment_counter(S, (uint32_t)S->buflen);
blake2s_set_lastblock(S);
memset(S->buf + S->buflen, 0, BLAKE2S_BLOCKBYTES - S->buflen); /* Padding */
blake2s_compress_impl(S, S->buf);
for (i = 0; i < 8; ++i) /* Output full hash to temp buffer */
store32(&S->h[i], S->h[i]);
memcpy(out, S->h, outlen);
}
SAFEBUFFERS void blake2s(void *out, size_t outlen, const void *in, size_t inlen, const void *key, size_t keylen) {
blake2s_state S;
/* Verify parameters */
assert(!((NULL == in && inlen > 0)));
assert(out);
assert(!(NULL == key && keylen > 0));
assert(!(!outlen || outlen > BLAKE2S_OUTBYTES));
assert(!(keylen > BLAKE2S_KEYBYTES));
if (keylen > 0) {
blake2s_init_key(&S, outlen, key, keylen);
} else {
blake2s_init(&S, outlen);
}
blake2s_update(&S, (const uint8_t *)in, inlen);
blake2s_final(&S, out, outlen);
}
SAFEBUFFERS void blake2s_hmac(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen, const uint8_t *key, size_t keylen) {
blake2s_state b2s;
uint64_t temp[BLAKE2S_OUTBYTES / 8];
uint64_t key_temp[BLAKE2S_BLOCKBYTES / 8] = { 0 };
if (keylen > BLAKE2S_BLOCKBYTES) {
blake2s_init(&b2s, BLAKE2S_OUTBYTES);
blake2s_update(&b2s, key, keylen);
blake2s_final(&b2s, key_temp, BLAKE2S_OUTBYTES);
} else {
memcpy(key_temp, key, keylen);
}
for (size_t i = 0; i < BLAKE2S_BLOCKBYTES / 8; i++)
key_temp[i] ^= 0x3636363636363636ull;
blake2s_init(&b2s, BLAKE2S_OUTBYTES);
blake2s_update(&b2s, key_temp, BLAKE2S_BLOCKBYTES);
blake2s_update(&b2s, in, inlen);
blake2s_final(&b2s, temp, BLAKE2S_OUTBYTES);
for (size_t i = 0; i < BLAKE2S_BLOCKBYTES / 8; i++)
key_temp[i] ^= 0x5c5c5c5c5c5c5c5cull ^ 0x3636363636363636ull;
blake2s_init(&b2s, BLAKE2S_OUTBYTES);
blake2s_update(&b2s, key_temp, BLAKE2S_BLOCKBYTES);
blake2s_update(&b2s, temp, BLAKE2S_OUTBYTES);
blake2s_final(&b2s, temp, BLAKE2S_OUTBYTES);
memcpy(out, temp, outlen);
memzero_crypto(key_temp, sizeof(key_temp));
memzero_crypto(temp, sizeof(temp));
}
SAFEBUFFERS
void blake2s_hkdf(uint8 *dst1, size_t dst1_size,
uint8 *dst2, size_t dst2_size,
uint8 *dst3, size_t dst3_size,
const uint8 *data, size_t data_size,
const uint8 *key, size_t key_size) {
struct {
uint8 prk[BLAKE2S_OUTBYTES];
uint8 temp[BLAKE2S_OUTBYTES + 1];
} t;
blake2s_hmac(t.prk, BLAKE2S_OUTBYTES, data, data_size, key, key_size);
// first-key = HMAC(secret, 0x1)
t.temp[0] = 0x1;
blake2s_hmac(t.temp, BLAKE2S_OUTBYTES, t.temp, 1, t.prk, BLAKE2S_OUTBYTES);
memcpy(dst1, t.temp, dst1_size);
if (dst2 != NULL) {
// second-key = HMAC(secret, first-key || 0x2)
t.temp[BLAKE2S_OUTBYTES] = 0x2;
blake2s_hmac(t.temp, BLAKE2S_OUTBYTES, t.temp, BLAKE2S_OUTBYTES + 1, t.prk, BLAKE2S_OUTBYTES);
memcpy(dst2, t.temp, dst2_size);
if (dst3 != NULL) {
// third-key = HMAC(secret, second-key || 0x3)
t.temp[BLAKE2S_OUTBYTES] = 0x3;
blake2s_hmac(t.temp, BLAKE2S_OUTBYTES, t.temp, BLAKE2S_OUTBYTES + 1, t.prk, BLAKE2S_OUTBYTES);
memcpy(dst3, t.temp, dst3_size);
}
}
memzero_crypto(&t, sizeof(t));
}
#if defined(SUPERCOP)
int crypto_hash(unsigned char *out, unsigned char *in, unsigned long long inlen) {
return blake2s(out, BLAKE2S_OUTBYTES in, inlen, NULL, 0);
}
#endif
#if defined(BLAKE2S_SELFTEST)
#include <string.h>
#include "blake2-kat.h"
int main(void) {
uint8_t key[BLAKE2S_KEYBYTES];
uint8_t buf[BLAKE2_KAT_LENGTH];
size_t i, step;
for (i = 0; i < BLAKE2S_KEYBYTES; ++i)
key[i] = (uint8_t)i;
for (i = 0; i < BLAKE2_KAT_LENGTH; ++i)
buf[i] = (uint8_t)i;
/* Test simple API */
for (i = 0; i < BLAKE2_KAT_LENGTH; ++i) {
uint8_t hash[BLAKE2S_OUTBYTES];
blake2s(hash, BLAKE2S_OUTBYTES, buf, i, key, BLAKE2S_KEYBYTES);
if (0 != memcmp(hash, blake2s_keyed_kat[i], BLAKE2S_OUTBYTES)) {
goto fail;
}
}
/* Test streaming API */
for (step = 1; step < BLAKE2S_BLOCKBYTES; ++step) {
for (i = 0; i < BLAKE2_KAT_LENGTH; ++i) {
uint8_t hash[BLAKE2S_OUTBYTES];
blake2s_state S;
uint8_t * p = buf;
size_t mlen = i;
int err = 0;
if ((err = blake2s_init_key(&S, BLAKE2S_OUTBYTES, key, BLAKE2S_KEYBYTES)) < 0) {
goto fail;
}
while (mlen >= step) {
if ((err = blake2s_update(&S, p, step)) < 0) {
goto fail;
}
mlen -= step;
p += step;
}
if ((err = blake2s_update(&S, p, mlen)) < 0) {
goto fail;
}
if ((err = blake2s_final(&S, hash, BLAKE2S_OUTBYTES)) < 0) {
goto fail;
}
if (0 != memcmp(hash, blake2s_keyed_kat[i], BLAKE2S_OUTBYTES)) {
goto fail;
}
}
}
puts("ok");
return 0;
fail:
puts("error");
return -1;
}
#endif

100
crypto/blake2s.h Normal file
View file

@ -0,0 +1,100 @@
/*
BLAKE2 reference source code package - reference C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#ifndef BLAKE2_H
#define BLAKE2_H
#include <stddef.h>
#include <stdint.h>
#include "tunsafe_types.h"
#if defined(_MSC_VER)
#define BLAKE2_PACKED(x) __pragma(pack(push, 1)) x __pragma(pack(pop))
#else
#define BLAKE2_PACKED(x) x __attribute__((packed))
#endif
#if defined(__cplusplus)
//extern "C" {
#endif
enum blake2s_constant {
BLAKE2S_BLOCKBYTES = 64,
BLAKE2S_OUTBYTES = 32,
BLAKE2S_KEYBYTES = 32,
BLAKE2S_SALTBYTES = 8,
BLAKE2S_PERSONALBYTES = 8
};
BLAKE2_PACKED(struct blake2s_param__ {
uint8_t digest_length; /* 1 */
uint8_t key_length; /* 2 */
uint8_t fanout; /* 3 */
uint8_t depth; /* 4 */
uint32_t leaf_length; /* 8 */
uint32_t node_offset; /* 12 */
uint16_t xof_length; /* 14 */
uint8_t node_depth; /* 15 */
uint8_t inner_length; /* 16 */
/* uint8_t reserved[0]; */
uint32_t salt[BLAKE2S_SALTBYTES / 4]; /* 24 */
uint32_t personal[BLAKE2S_PERSONALBYTES / 4]; /* 32 */
});
typedef struct blake2s_param__ blake2s_param;
/* Padded structs result in a compile-time error */
enum {
BLAKE2_DUMMY_1 = 1 / (sizeof(blake2s_param) == BLAKE2S_OUTBYTES ? 1 : 0),
};
typedef struct blake2s_state__ {
union {
uint32_t h[8];
blake2s_param param;
};
uint32_t t[2];
uint32_t f[2];
uint8_t buf[BLAKE2S_BLOCKBYTES];
size_t buflen;
size_t outlen;
uint8_t last_node;
} blake2s_state;
/* Streaming API */
void blake2s_init(blake2s_state *S, size_t outlen);
void blake2s_init_key(blake2s_state *S, size_t outlen, const void *key, size_t keylen);
void blake2s_init_param(blake2s_state *S, const blake2s_param *P);
void blake2s_update(blake2s_state *S, const void *in, size_t inlen);
void blake2s_final(blake2s_state *S, void *out, size_t outlen);
/* Simple API */
void blake2s(void *out, size_t outlen, const void *in, size_t inlen, const void *key, size_t keylen);
void blake2s_hmac(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen, const uint8_t *key, size_t keylen);
void blake2s_hkdf(uint8 *dst1, size_t dst1_size,
uint8 *dst2, size_t dst2_size,
uint8 *dst3, size_t dst3_size,
const uint8 *data, size_t data_size,
const uint8 *key, size_t key_size);
#if defined(__cplusplus)
//}
#endif
#endif

399
crypto/blake2s_sse.cpp Normal file
View file

@ -0,0 +1,399 @@
/*
BLAKE2 reference source code package - optimized C implementations
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
*/
#include "stdafx.h"
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include "blake2s.h"
#include "crypto_ops.h"
#include <emmintrin.h>
#if defined(HAVE_SSSE3)
#include <tmmintrin.h>
#endif
#if defined(HAVE_SSE41)
#include <smmintrin.h>
#endif
#if defined(HAVE_AVX)
#include <immintrin.h>
#endif
#if defined(HAVE_XOP)
#include <x86intrin.h>
#endif
#include "blake2s-round.h"
#if !defined(__cplusplus) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L)
#if defined(_MSC_VER)
#define BLAKE2_INLINE __inline
#elif defined(__GNUC__)
#define BLAKE2_INLINE __inline__
#else
#define BLAKE2_INLINE
#endif
#else
#define BLAKE2_INLINE inline
#endif
static BLAKE2_INLINE uint32_t load32(const void *src) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
uint32_t w;
memcpy(&w, src, sizeof w);
return w;
#else
const uint8_t *p = (const uint8_t *)src;
return ((uint32_t)(p[0]) << 0) |
((uint32_t)(p[1]) << 8) |
((uint32_t)(p[2]) << 16) |
((uint32_t)(p[3]) << 24);
#endif
}
static BLAKE2_INLINE void store32(void *dst, uint32_t w) {
#if defined(ARCH_CPU_LITTLE_ENDIAN)
memcpy(dst, &w, sizeof w);
#else
uint8_t *p = (uint8_t *)dst;
p[0] = (uint8_t)(w >> 0);
p[1] = (uint8_t)(w >> 8);
p[2] = (uint8_t)(w >> 16);
p[3] = (uint8_t)(w >> 24);
#endif
}
static const uint32_t blake2s_IV[8] =
{
0x6A09E667UL, 0xBB67AE85UL, 0x3C6EF372UL, 0xA54FF53AUL,
0x510E527FUL, 0x9B05688CUL, 0x1F83D9ABUL, 0x5BE0CD19UL
};
/* Some helper functions */
static void blake2s_set_lastnode( blake2s_state *S )
{
S->f[1] = (uint32_t)-1;
}
static int blake2s_is_lastblock( const blake2s_state *S )
{
return S->f[0] != 0;
}
static void blake2s_set_lastblock( blake2s_state *S )
{
if( S->last_node ) blake2s_set_lastnode( S );
S->f[0] = (uint32_t)-1;
}
static void blake2s_increment_counter( blake2s_state *S, const uint32_t inc )
{
uint64_t t = ( ( uint64_t )S->t[1] << 32 ) | S->t[0];
t += inc;
S->t[0] = ( uint32_t )( t >> 0 );
S->t[1] = ( uint32_t )( t >> 32 );
}
/* init2 xors IV with input parameter block */
#if 0
void blake2s_init_param( blake2s_state *S, const blake2s_param *P )
{
size_t i;
/*blake2s_init0( S ); */
const uint8_t * v = ( const uint8_t * )( blake2s_IV );
const uint8_t * p = ( const uint8_t * )( P );
uint8_t * h = ( uint8_t * )( S->h );
/* IV XOR ParamBlock */
memset( S, 0, sizeof( blake2s_state ) );
for( i = 0; i < BLAKE2S_OUTBYTES; ++i ) h[i] = v[i] ^ p[i];
S->outlen = P->digest_length;
}
/* Some sort of default parameter block initialization, for sequential blake2s */
void blake2s_init( blake2s_state *S, size_t outlen )
{
blake2s_param P[1];
assert(outlen && outlen <= BLAKE2S_OUTBYTES);
P->digest_length = (uint8_t)outlen;
P->key_length = 0;
P->fanout = 1;
P->depth = 1;
store32( &P->leaf_length, 0 );
store32( &P->node_offset, 0 );
store16( &P->xof_length, 0 );
P->node_depth = 0;
P->inner_length = 0;
/* memset(P->reserved, 0, sizeof(P->reserved) ); */
memset( P->salt, 0, sizeof( P->salt ) );
memset( P->personal, 0, sizeof( P->personal ) );
blake2s_init_param( S, P );
}
int blake2s_init_key( blake2s_state *S, size_t outlen, const void *key, size_t keylen )
{
blake2s_param P[1];
/* Move interval verification here? */
if ( ( !outlen ) || ( outlen > BLAKE2S_OUTBYTES ) ) return -1;
if ( ( !key ) || ( !keylen ) || keylen > BLAKE2S_KEYBYTES ) return -1;
P->digest_length = (uint8_t)outlen;
P->key_length = (uint8_t)keylen;
P->fanout = 1;
P->depth = 1;
store32( &P->leaf_length, 0 );
store32( &P->node_offset, 0 );
store16( &P->xof_length, 0 );
P->node_depth = 0;
P->inner_length = 0;
/* memset(P->reserved, 0, sizeof(P->reserved) ); */
memset( P->salt, 0, sizeof( P->salt ) );
memset( P->personal, 0, sizeof( P->personal ) );
if( blake2s_init_param( S, P ) < 0 )
return -1;
{
uint8_t block[BLAKE2S_BLOCKBYTES];
memset( block, 0, BLAKE2S_BLOCKBYTES );
memcpy( block, key, keylen );
blake2s_update( S, block, BLAKE2S_BLOCKBYTES );
memzero_crypto( block, BLAKE2S_BLOCKBYTES ); /* Burn the key from stack */
}
return 0;
}
#endif
void blake2s_compress_sse( blake2s_state *S, const uint8_t block[BLAKE2S_BLOCKBYTES] )
{
__m128i row1, row2, row3, row4;
__m128i buf1, buf2, buf3, buf4;
#if defined(HAVE_SSE41)
__m128i t0, t1;
#if !defined(HAVE_XOP)
__m128i t2;
#endif
#endif
__m128i ff0, ff1;
#if defined(HAVE_SSSE3) && !defined(HAVE_XOP)
const __m128i r8 = _mm_set_epi8( 12, 15, 14, 13, 8, 11, 10, 9, 4, 7, 6, 5, 0, 3, 2, 1 );
const __m128i r16 = _mm_set_epi8( 13, 12, 15, 14, 9, 8, 11, 10, 5, 4, 7, 6, 1, 0, 3, 2 );
#endif
#if defined(HAVE_SSE41)
const __m128i m0 = LOADU( block + 00 );
const __m128i m1 = LOADU( block + 16 );
const __m128i m2 = LOADU( block + 32 );
const __m128i m3 = LOADU( block + 48 );
#else
const uint32_t m0 = load32(block + 0 * sizeof(uint32_t));
const uint32_t m1 = load32(block + 1 * sizeof(uint32_t));
const uint32_t m2 = load32(block + 2 * sizeof(uint32_t));
const uint32_t m3 = load32(block + 3 * sizeof(uint32_t));
const uint32_t m4 = load32(block + 4 * sizeof(uint32_t));
const uint32_t m5 = load32(block + 5 * sizeof(uint32_t));
const uint32_t m6 = load32(block + 6 * sizeof(uint32_t));
const uint32_t m7 = load32(block + 7 * sizeof(uint32_t));
const uint32_t m8 = load32(block + 8 * sizeof(uint32_t));
const uint32_t m9 = load32(block + 9 * sizeof(uint32_t));
const uint32_t m10 = load32(block + 10 * sizeof(uint32_t));
const uint32_t m11 = load32(block + 11 * sizeof(uint32_t));
const uint32_t m12 = load32(block + 12 * sizeof(uint32_t));
const uint32_t m13 = load32(block + 13 * sizeof(uint32_t));
const uint32_t m14 = load32(block + 14 * sizeof(uint32_t));
const uint32_t m15 = load32(block + 15 * sizeof(uint32_t));
#endif
row1 = ff0 = LOADU( &S->h[0] );
row2 = ff1 = LOADU( &S->h[4] );
row3 = _mm_loadu_si128( (__m128i const *)&blake2s_IV[0] );
row4 = _mm_xor_si128( _mm_loadu_si128( (__m128i const *)&blake2s_IV[4] ), LOADU( &S->t[0] ) );
ROUND( 0 );
ROUND( 1 );
ROUND( 2 );
ROUND( 3 );
ROUND( 4 );
ROUND( 5 );
ROUND( 6 );
ROUND( 7 );
ROUND( 8 );
ROUND( 9 );
STOREU( &S->h[0], _mm_xor_si128( ff0, _mm_xor_si128( row1, row3 ) ) );
STOREU( &S->h[4], _mm_xor_si128( ff1, _mm_xor_si128( row2, row4 ) ) );
}
#if 0
int blake2s_update( blake2s_state *S, const void *pin, size_t inlen )
{
const unsigned char * in = (const unsigned char *)pin;
if( inlen > 0 )
{
size_t left = S->buflen;
size_t fill = BLAKE2S_BLOCKBYTES - left;
if( inlen > fill )
{
S->buflen = 0;
memcpy( S->buf + left, in, fill ); /* Fill buffer */
blake2s_increment_counter( S, BLAKE2S_BLOCKBYTES );
blake2s_compress( S, S->buf ); /* Compress */
in += fill; inlen -= fill;
while(inlen > BLAKE2S_BLOCKBYTES) {
blake2s_increment_counter(S, BLAKE2S_BLOCKBYTES);
blake2s_compress( S, in );
in += BLAKE2S_BLOCKBYTES;
inlen -= BLAKE2S_BLOCKBYTES;
}
}
memcpy( S->buf + S->buflen, in, inlen );
S->buflen += inlen;
}
return 0;
}
int blake2s_final( blake2s_state *S, void *out, size_t outlen )
{
uint8_t buffer[BLAKE2S_OUTBYTES] = {0};
size_t i;
if( out == NULL || outlen < S->outlen )
return -1;
if( blake2s_is_lastblock( S ) )
return -1;
blake2s_increment_counter( S, (uint32_t)S->buflen );
blake2s_set_lastblock( S );
memset( S->buf + S->buflen, 0, BLAKE2S_BLOCKBYTES - S->buflen ); /* Padding */
blake2s_compress( S, S->buf );
for( i = 0; i < 8; ++i ) /* Output full hash to temp buffer */
store32( buffer + sizeof( S->h[i] ) * i, S->h[i] );
memcpy( out, buffer, S->outlen );
memzero_crypto( buffer, sizeof(buffer) );
return 0;
}
/* inlen, at least, should be uint64_t. Others can be size_t. */
int blake2s( void *out, size_t outlen, const void *in, size_t inlen, const void *key, size_t keylen )
{
blake2s_state S[1];
/* Verify parameters */
if ( NULL == in && inlen > 0 ) return -1;
if ( NULL == out ) return -1;
if ( NULL == key && keylen > 0) return -1;
if( !outlen || outlen > BLAKE2S_OUTBYTES ) return -1;
if( keylen > BLAKE2S_KEYBYTES ) return -1;
if( keylen > 0 )
{
if( blake2s_init_key( S, outlen, key, keylen ) < 0 ) return -1;
}
else
{
if( blake2s_init( S, outlen ) < 0 ) return -1;
}
blake2s_update( S, ( const uint8_t * )in, inlen );
blake2s_final( S, out, outlen );
return 0;
}
#endif
#if defined(SUPERCOP)
int crypto_hash( unsigned char *out, unsigned char *in, unsigned long long inlen )
{
return blake2s( out, BLAKE2S_OUTBYTES, in, inlen, NULL, 0 );
}
#endif
#if defined(BLAKE2S_SELFTEST)
#include <string.h>
#include "blake2-kat.h"
int main( void )
{
uint8_t key[BLAKE2S_KEYBYTES];
uint8_t buf[BLAKE2_KAT_LENGTH];
size_t i, step;
for( i = 0; i < BLAKE2S_KEYBYTES; ++i )
key[i] = ( uint8_t )i;
for( i = 0; i < BLAKE2_KAT_LENGTH; ++i )
buf[i] = ( uint8_t )i;
/* Test simple API */
for( i = 0; i < BLAKE2_KAT_LENGTH; ++i )
{
uint8_t hash[BLAKE2S_OUTBYTES];
blake2s( hash, BLAKE2S_OUTBYTES, buf, i, key, BLAKE2S_KEYBYTES );
if( 0 != memcmp( hash, blake2s_keyed_kat[i], BLAKE2S_OUTBYTES ) )
{
goto fail;
}
}
/* Test streaming API */
for(step = 1; step < BLAKE2S_BLOCKBYTES; ++step) {
for (i = 0; i < BLAKE2_KAT_LENGTH; ++i) {
uint8_t hash[BLAKE2S_OUTBYTES];
blake2s_state S;
uint8_t * p = buf;
size_t mlen = i;
int err = 0;
if( (err = blake2s_init_key(&S, BLAKE2S_OUTBYTES, key, BLAKE2S_KEYBYTES)) < 0 ) {
goto fail;
}
while (mlen >= step) {
if ( (err = blake2s_update(&S, p, step)) < 0 ) {
goto fail;
}
mlen -= step;
p += step;
}
if ( (err = blake2s_update(&S, p, mlen)) < 0) {
goto fail;
}
if ( (err = blake2s_final(&S, hash, BLAKE2S_OUTBYTES)) < 0) {
goto fail;
}
if (0 != memcmp(hash, blake2s_keyed_kat[i], BLAKE2S_OUTBYTES)) {
goto fail;
}
}
}
puts( "ok" );
return 0;
fail:
puts("error");
return -1;
}
#endif

3011
crypto/chacha20_x64.asm Normal file

File diff suppressed because it is too large Load diff

2623
crypto/chacha20_x64_gas.s Normal file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

596
crypto/chacha20poly1305.cpp Normal file
View file

@ -0,0 +1,596 @@
/* SPDX-License-Identifier: OpenSSL OR (BSD-3-Clause OR GPL-2.0)
*
* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
* Copyright 2016 The OpenSSL Project Authors. All Rights Reserved.
*/
#include "stdafx.h"
#include "crypto/chacha20poly1305.h"
#include "tunsafe_types.h"
#include "tunsafe_endian.h"
#include "build_config.h"
#include "tunsafe_cpu.h"
#include "crypto_ops.h"
#include <string.h>
#include <assert.h>
enum {
CHACHA20_IV_SIZE = 16,
CHACHA20_KEY_SIZE = 32,
CHACHA20_BLOCK_SIZE = 64,
POLY1305_BLOCK_SIZE = 16,
POLY1305_KEY_SIZE = 32,
POLY1305_MAC_SIZE = 16
};
#if defined(OS_MACOSX) || !WITH_AVX512_OPTIMIZATIONS
#define CHACHA20_WITH_AVX512 0
#else
#define CHACHA20_WITH_AVX512 1
#endif
extern "C" {
void _cdecl hchacha20_ssse3(uint8 *derived_key, const uint8 *nonce, const uint8 *key);
void _cdecl chacha20_ssse3(uint8 *out, const uint8 *in, size_t len, const uint32 key[8], const uint32 counter[4]);
void _cdecl chacha20_avx2(uint8 *out, const uint8 *in, size_t len, const uint32 key[8], const uint32 counter[4]);
void _cdecl chacha20_avx512(uint8 *out, const uint8 *in, size_t len, const uint32 key[8], const uint32 counter[4]);
void _cdecl chacha20_avx512vl(uint8 *out, const uint8 *in, size_t len, const uint32 key[8], const uint32 counter[4]);
void _cdecl poly1305_init_x86_64(void *ctx, const uint8 key[16]);
void _cdecl poly1305_blocks_x86_64(void *ctx, const uint8 *inp, size_t len, uint32 padbit);
void _cdecl poly1305_emit_x86_64(void *ctx, uint8 mac[16], const uint32 nonce[4]);
void _cdecl poly1305_emit_avx(void *ctx, uint8 mac[16], const uint32 nonce[4]);
void _cdecl poly1305_blocks_avx(void *ctx, const uint8 *inp, size_t len, uint32 padbit);
void _cdecl poly1305_blocks_avx2(void *ctx, const uint8 *inp, size_t len, uint32 padbit);
void _cdecl poly1305_blocks_avx512(void *ctx, const uint8 *inp, size_t len, uint32 padbit);
}
struct chacha20_ctx {
uint32 state[CHACHA20_BLOCK_SIZE / sizeof(uint32)];
};
void crypto_xor(uint8 *dst, const uint8 *src, size_t n) {
for (; n >= 4; n -= 4, dst += 4, src += 4)
*(uint32*)dst ^= *(uint32*)src;
for (; n; n--)
*dst++ ^= *src++;
}
int memcmp_crypto(const uint8 *a, const uint8 *b, size_t n) {
int rv = 0;
for (; n >= 4; n -= 4, a += 4, b += 4)
rv |= *(uint32*)a ^ *(uint32*)b;
for (; n; n--)
rv |= *a++ ^ *b++;
return rv;
}
#define QUARTER_ROUND(x, a, b, c, d) ( \
x[a] += x[b], \
x[d] = rol32((x[d] ^ x[a]), 16), \
x[c] += x[d], \
x[b] = rol32((x[b] ^ x[c]), 12), \
x[a] += x[b], \
x[d] = rol32((x[d] ^ x[a]), 8), \
x[c] += x[d], \
x[b] = rol32((x[b] ^ x[c]), 7) \
)
#define C(i, j) (i * 4 + j)
#define DOUBLE_ROUND(x) ( \
/* Column Round */ \
QUARTER_ROUND(x, C(0, 0), C(1, 0), C(2, 0), C(3, 0)), \
QUARTER_ROUND(x, C(0, 1), C(1, 1), C(2, 1), C(3, 1)), \
QUARTER_ROUND(x, C(0, 2), C(1, 2), C(2, 2), C(3, 2)), \
QUARTER_ROUND(x, C(0, 3), C(1, 3), C(2, 3), C(3, 3)), \
/* Diagonal Round */ \
QUARTER_ROUND(x, C(0, 0), C(1, 1), C(2, 2), C(3, 3)), \
QUARTER_ROUND(x, C(0, 1), C(1, 2), C(2, 3), C(3, 0)), \
QUARTER_ROUND(x, C(0, 2), C(1, 3), C(2, 0), C(3, 1)), \
QUARTER_ROUND(x, C(0, 3), C(1, 0), C(2, 1), C(3, 2)) \
)
#define TWENTY_ROUNDS(x) ( \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x), \
DOUBLE_ROUND(x) \
)
SAFEBUFFERS static void chacha20_block_generic(struct chacha20_ctx *ctx, uint32 *stream)
{
uint32 x[CHACHA20_BLOCK_SIZE / sizeof(uint32)];
int i;
for (i = 0; i < ARRAY_SIZE(x); ++i)
x[i] = ctx->state[i];
TWENTY_ROUNDS(x);
for (i = 0; i < ARRAY_SIZE(x); ++i)
stream[i] = ToLE32(x[i] + ctx->state[i]);
++ctx->state[12];
}
SAFEBUFFERS static void hchacha20_generic(uint8 derived_key[CHACHA20POLY1305_KEYLEN], const uint8 nonce[16], const uint8 key[CHACHA20POLY1305_KEYLEN])
{
uint32 *out = (uint32 *)derived_key;
uint32 x[] = {
0x61707865, 0x3320646e, 0x79622d32, 0x6b206574,
ReadLE32(key + 0), ReadLE32(key + 4), ReadLE32(key + 8), ReadLE32(key + 12),
ReadLE32(key + 16), ReadLE32(key + 20), ReadLE32(key + 24), ReadLE32(key + 28),
ReadLE32(nonce + 0), ReadLE32(nonce + 4), ReadLE32(nonce + 8), ReadLE32(nonce + 12)
};
TWENTY_ROUNDS(x);
out[0] = ToLE32(x[0]);
out[1] = ToLE32(x[1]);
out[2] = ToLE32(x[2]);
out[3] = ToLE32(x[3]);
out[4] = ToLE32(x[12]);
out[5] = ToLE32(x[13]);
out[6] = ToLE32(x[14]);
out[7] = ToLE32(x[15]);
}
static inline void hchacha20(uint8 derived_key[CHACHA20POLY1305_KEYLEN], const uint8 nonce[16], const uint8 key[CHACHA20POLY1305_KEYLEN])
{
#if defined(ARCH_CPU_X86_64) && defined(COMPILER_MSVC)
if (X86_PCAP_SSSE3) {
hchacha20_ssse3(derived_key, nonce, key);
return;
}
#endif // defined(ARCH_CPU_X86_64)
hchacha20_generic(derived_key, nonce, key);
}
#define chacha20_initial_state(key, nonce) {{ \
0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, \
ReadLE32((key) + 0), ReadLE32((key) + 4), ReadLE32((key) + 8), ReadLE32((key) + 12), \
ReadLE32((key) + 16), ReadLE32((key) + 20), ReadLE32((key) + 24), ReadLE32((key) + 28), \
0, 0, ReadLE32((nonce) + 0), ReadLE32((nonce) + 4) \
}}
SAFEBUFFERS static void chacha20_crypt(struct chacha20_ctx *ctx, uint8 *dst, const uint8 *src, uint32 bytes)
{
uint32 buf[CHACHA20_BLOCK_SIZE / sizeof(uint32)];
if (bytes == 0)
return;
#if defined(ARCH_CPU_X86_64)
#if CHACHA20_WITH_AVX512
if (X86_PCAP_AVX512F) {
chacha20_avx512(dst, src, bytes, &ctx->state[4], &ctx->state[12]);
ctx->state[12] += (bytes + 63) / 64;
return;
}
if (X86_PCAP_AVX512VL) {
chacha20_avx512vl(dst, src, bytes, &ctx->state[4], &ctx->state[12]);
ctx->state[12] += (bytes + 63) / 64;
return;
}
#endif // CHACHA20_WITH_AVX512
if (X86_PCAP_AVX2) {
chacha20_avx2(dst, src, bytes, &ctx->state[4], &ctx->state[12]);
ctx->state[12] += (bytes + 63) / 64;
return;
}
if (X86_PCAP_SSSE3) {
assert(bytes);
chacha20_ssse3(dst, src, bytes, &ctx->state[4], &ctx->state[12]);
ctx->state[12] += (bytes + 63) / 64;
return;
}
#endif // defined(ARCH_CPU_X86_64)
if (dst != src)
memcpy(dst, src, bytes);
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_generic(ctx, buf);
crypto_xor(dst, (uint8 *)buf, CHACHA20_BLOCK_SIZE);
bytes -= CHACHA20_BLOCK_SIZE;
dst += CHACHA20_BLOCK_SIZE;
}
if (bytes) {
chacha20_block_generic(ctx, buf);
crypto_xor(dst, (uint8 *)buf, bytes);
}
}
struct poly1305_ctx {
uint8 opaque[24 * sizeof(uint64)];
uint32 nonce[4];
uint8 data[POLY1305_BLOCK_SIZE];
size_t num;
};
#if !(defined(CONFIG_X86_64) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || (defined(CONFIG_MIPS) && defined(CONFIG_64BIT)))
struct poly1305_internal {
uint32 h[5];
uint32 r[4];
};
static void poly1305_init_generic(void *ctx, const uint8 key[16]) {
struct poly1305_internal *st = (struct poly1305_internal *)ctx;
/* h = 0 */
st->h[0] = 0;
st->h[1] = 0;
st->h[2] = 0;
st->h[3] = 0;
st->h[4] = 0;
/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
st->r[0] = ReadLE32(&key[ 0]) & 0x0fffffff;
st->r[1] = ReadLE32(&key[ 4]) & 0x0ffffffc;
st->r[2] = ReadLE32(&key[ 8]) & 0x0ffffffc;
st->r[3] = ReadLE32(&key[12]) & 0x0ffffffc;
}
static void poly1305_blocks_generic(void *ctx, const uint8 *inp, size_t len, uint32 padbit)
{
#define CONSTANT_TIME_CARRY(a,b) ((a ^ ((a ^ b) | ((a - b) ^ b))) >> (sizeof(a) * 8 - 1))
struct poly1305_internal *st = (struct poly1305_internal *)ctx;
uint32 r0, r1, r2, r3;
uint32 s1, s2, s3;
uint32 h0, h1, h2, h3, h4, c;
uint64 d0, d1, d2, d3;
r0 = st->r[0];
r1 = st->r[1];
r2 = st->r[2];
r3 = st->r[3];
s1 = r1 + (r1 >> 2);
s2 = r2 + (r2 >> 2);
s3 = r3 + (r3 >> 2);
h0 = st->h[0];
h1 = st->h[1];
h2 = st->h[2];
h3 = st->h[3];
h4 = st->h[4];
while (len >= POLY1305_BLOCK_SIZE) {
/* h += m[i] */
h0 = (uint32)(d0 = (uint64)h0 + ReadLE32(inp + 0));
h1 = (uint32)(d1 = (uint64)h1 + (d0 >> 32) + ReadLE32(inp + 4));
h2 = (uint32)(d2 = (uint64)h2 + (d1 >> 32) + ReadLE32(inp + 8));
h3 = (uint32)(d3 = (uint64)h3 + (d2 >> 32) + ReadLE32(inp + 12));
h4 += (uint32)(d3 >> 32) + padbit;
/* h *= r "%" p, where "%" stands for "partial remainder" */
d0 = ((uint64)h0 * r0) +
((uint64)h1 * s3) +
((uint64)h2 * s2) +
((uint64)h3 * s1);
d1 = ((uint64)h0 * r1) +
((uint64)h1 * r0) +
((uint64)h2 * s3) +
((uint64)h3 * s2) +
(h4 * s1);
d2 = ((uint64)h0 * r2) +
((uint64)h1 * r1) +
((uint64)h2 * r0) +
((uint64)h3 * s3) +
(h4 * s2);
d3 = ((uint64)h0 * r3) +
((uint64)h1 * r2) +
((uint64)h2 * r1) +
((uint64)h3 * r0) +
(h4 * s3);
h4 = (h4 * r0);
/* last reduction step: */
/* a) h4:h0 = h4<<128 + d3<<96 + d2<<64 + d1<<32 + d0 */
h0 = (uint32)d0;
h1 = (uint32)(d1 += d0 >> 32);
h2 = (uint32)(d2 += d1 >> 32);
h3 = (uint32)(d3 += d2 >> 32);
h4 += (uint32)(d3 >> 32);
/* b) (h4:h0 += (h4:h0>>130) * 5) %= 2^130 */
c = (h4 >> 2) + (h4 & ~3U);
h4 &= 3;
h0 += c;
h1 += (c = CONSTANT_TIME_CARRY(h0,c));
h2 += (c = CONSTANT_TIME_CARRY(h1,c));
h3 += (c = CONSTANT_TIME_CARRY(h2,c));
h4 += CONSTANT_TIME_CARRY(h3,c);
/*
* Occasional overflows to 3rd bit of h4 are taken care of
* "naturally". If after this point we end up at the top of
* this loop, then the overflow bit will be accounted for
* in next iteration. If we end up in poly1305_emit, then
* comparison to modulus below will still count as "carry
* into 131st bit", so that properly reduced value will be
* picked in conditional move.
*/
inp += POLY1305_BLOCK_SIZE;
len -= POLY1305_BLOCK_SIZE;
}
st->h[0] = h0;
st->h[1] = h1;
st->h[2] = h2;
st->h[3] = h3;
st->h[4] = h4;
#undef CONSTANT_TIME_CARRY
}
static void poly1305_emit_generic(void *ctx, uint8 mac[16], const uint32 nonce[4])
{
struct poly1305_internal *st = (struct poly1305_internal *)ctx;
uint32 *omac = (uint32 *)mac;
uint32 h0, h1, h2, h3, h4;
uint32 g0, g1, g2, g3, g4;
uint64 t;
uint32 mask;
h0 = st->h[0];
h1 = st->h[1];
h2 = st->h[2];
h3 = st->h[3];
h4 = st->h[4];
/* compare to modulus by computing h + -p */
g0 = (uint32)(t = (uint64)h0 + 5);
g1 = (uint32)(t = (uint64)h1 + (t >> 32));
g2 = (uint32)(t = (uint64)h2 + (t >> 32));
g3 = (uint32)(t = (uint64)h3 + (t >> 32));
g4 = h4 + (uint32)(t >> 32);
/* if there was carry into 131st bit, h3:h0 = g3:g0 */
mask = 0 - (g4 >> 2);
g0 &= mask;
g1 &= mask;
g2 &= mask;
g3 &= mask;
mask = ~mask;
h0 = (h0 & mask) | g0;
h1 = (h1 & mask) | g1;
h2 = (h2 & mask) | g2;
h3 = (h3 & mask) | g3;
/* mac = (h + nonce) % (2^128) */
h0 = (uint32)(t = (uint64)h0 + nonce[0]);
h1 = (uint32)(t = (uint64)h1 + (t >> 32) + nonce[1]);
h2 = (uint32)(t = (uint64)h2 + (t >> 32) + nonce[2]);
h3 = (uint32)(t = (uint64)h3 + (t >> 32) + nonce[3]);
omac[0] = ToLE32(h0);
omac[1] = ToLE32(h1);
omac[2] = ToLE32(h2);
omac[3] = ToLE32(h3);
}
#endif
SAFEBUFFERS static void poly1305_init(struct poly1305_ctx *ctx, const uint8 key[POLY1305_KEY_SIZE])
{
ctx->nonce[0] = ReadLE32(&key[16]);
ctx->nonce[1] = ReadLE32(&key[20]);
ctx->nonce[2] = ReadLE32(&key[24]);
ctx->nonce[3] = ReadLE32(&key[28]);
#if defined(ARCH_CPU_X86_64)
poly1305_init_x86_64(ctx->opaque, key);
#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
poly1305_init_arm(ctx->opaque, key);
#elif defined(CONFIG_MIPS) && defined(CONFIG_64BIT)
poly1305_init_mips(ctx->opaque, key);
#else
poly1305_init_generic(ctx->opaque, key);
#endif
ctx->num = 0;
}
static inline void poly1305_blocks(void *ctx, const uint8 *inp, size_t len, uint32 padbit)
{
#if defined(ARCH_CPU_X86_64)
#if CHACHA20_WITH_AVX512
if(X86_PCAP_AVX512F)
poly1305_blocks_avx512(ctx, inp, len, padbit);
else
#endif // CHACHA20_WITH_AVX512
if (X86_PCAP_AVX2)
poly1305_blocks_avx2(ctx, inp, len, padbit);
else if (X86_PCAP_AVX)
poly1305_blocks_avx(ctx, inp, len, padbit);
else
poly1305_blocks_x86_64(ctx, inp, len, padbit);
#else // defined(ARCH_CPU_X86_64)
poly1305_blocks_generic(ctx, inp, len, padbit);
#endif // defined(ARCH_CPU_X86_64)
}
static inline void poly1305_emit(void *ctx, uint8 mac[16], const uint32 nonce[4])
{
#if defined(ARCH_CPU_X86_64)
if (X86_PCAP_AVX)
poly1305_emit_avx(ctx, mac, nonce);
else
poly1305_emit_x86_64(ctx, mac, nonce);
#else // defined(ARCH_CPU_X86_64)
poly1305_emit_generic(ctx, mac, nonce);
#endif // defined(ARCH_CPU_X86_64)
}
SAFEBUFFERS static void poly1305_update(struct poly1305_ctx *ctx, const uint8 *inp, size_t len)
{
const size_t num = ctx->num;
size_t rem;
if (num) {
rem = POLY1305_BLOCK_SIZE - num;
if (len >= rem) {
memcpy(ctx->data + num, inp, rem);
poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 1);
inp += rem;
len -= rem;
} else {
/* Still not enough data to process a block. */
memcpy(ctx->data + num, inp, len);
ctx->num = num + len;
return;
}
}
rem = len % POLY1305_BLOCK_SIZE;
len -= rem;
if (len >= POLY1305_BLOCK_SIZE) {
poly1305_blocks(ctx->opaque, inp, len, 1);
inp += len;
}
if (rem)
memcpy(ctx->data, inp, rem);
ctx->num = rem;
}
SAFEBUFFERS static void poly1305_finish(struct poly1305_ctx *ctx, uint8 mac[16])
{
size_t num = ctx->num;
if (num) {
ctx->data[num++] = 1; /* pad bit */
while (num < POLY1305_BLOCK_SIZE)
ctx->data[num++] = 0;
poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 0);
}
poly1305_emit(ctx->opaque, mac, ctx->nonce);
/* zero out the state */
memzero_crypto(ctx, sizeof(*ctx));
}
static const uint8 pad0[16] = { 0 };
SAFEBUFFERS static FORCEINLINE void poly1305_getmac(const uint8 *ad, size_t ad_len, const uint8 *src, size_t src_len, const uint8 key[POLY1305_KEY_SIZE], uint8 mac[CHACHA20POLY1305_AUTHTAGLEN]) {
uint64 len[2];
struct poly1305_ctx poly1305_state;
poly1305_init(&poly1305_state, key);
poly1305_update(&poly1305_state, ad, ad_len);
poly1305_update(&poly1305_state, pad0, (0 - ad_len) & 0xf);
poly1305_update(&poly1305_state, src, src_len);
poly1305_update(&poly1305_state, pad0, (0 - src_len) & 0xf);
len[0] = ToLE64(ad_len);
len[1] = ToLE64(src_len);
poly1305_update(&poly1305_state, (uint8 *)&len, sizeof(len));
poly1305_finish(&poly1305_state, mac);
}
struct ChaChaState {
struct chacha20_ctx chacha20_state;
uint8 block0[CHACHA20_BLOCK_SIZE];
};
static inline void InitializeChaChaState(ChaChaState *st, const uint8 key[CHACHA20POLY1305_KEYLEN], uint64 nonce) {
uint64 le_nonce = ToLE64(nonce);
WriteLE64((uint8*)st, 0x3320646e61707865);
WriteLE64((uint8*)st + 8, 0x6b20657479622d32);
Write64((uint8*)st + 16, Read64(key + 0));
Write64((uint8*)st + 24, Read64(key + 8));
Write64((uint8*)st + 32, Read64(key + 16));
Write64((uint8*)st + 40, Read64(key + 24));
Write64((uint8*)st + 48, 0);
Write64((uint8*)st + 56, Read64((uint8*)&le_nonce));
Write64((uint8*)st + 64 + 0 * 8, 0);
Write64((uint8*)st + 64 + 1 * 8, 0);
Write64((uint8*)st + 64 + 2 * 8, 0);
Write64((uint8*)st + 64 + 3 * 8, 0);
Write64((uint8*)st + 64 + 4 * 8, 0);
Write64((uint8*)st + 64 + 5 * 8, 0);
Write64((uint8*)st + 64 + 6 * 8, 0);
Write64((uint8*)st + 64 + 7 * 8, 0);
}
SAFEBUFFERS void poly1305_get_mac(const uint8 *src, size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN],
uint8 mac[CHACHA20POLY1305_AUTHTAGLEN]) {
ChaChaState st;
InitializeChaChaState(&st, key, nonce);
chacha20_crypt(&st.chacha20_state, st.block0, st.block0, sizeof(st.block0));
poly1305_getmac(ad, ad_len, src, src_len, st.block0, mac);
memzero_crypto(&st, sizeof(st));
}
SAFEBUFFERS void chacha20poly1305_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN]) {
ChaChaState st;
InitializeChaChaState(&st, key, nonce);
chacha20_crypt(&st.chacha20_state, st.block0, st.block0, sizeof(st.block0));
chacha20_crypt(&st.chacha20_state, dst, src, (uint32)src_len);
poly1305_getmac(ad, ad_len, dst, src_len, st.block0, dst + src_len);
memzero_crypto(&st, sizeof(st));
}
SAFEBUFFERS void chacha20poly1305_decrypt_get_mac(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN],
uint8 mac[CHACHA20POLY1305_AUTHTAGLEN]) {
ChaChaState st;
InitializeChaChaState(&st, key, nonce);
chacha20_crypt(&st.chacha20_state, st.block0, st.block0, sizeof(st.block0));
poly1305_getmac(ad, ad_len, src, src_len, st.block0, mac);
chacha20_crypt(&st.chacha20_state, dst, src, (uint32)src_len);
memzero_crypto(&st, sizeof(st));
}
SAFEBUFFERS bool chacha20poly1305_decrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN]) {
uint8 mac[POLY1305_MAC_SIZE];
if (src_len < CHACHA20POLY1305_AUTHTAGLEN)
return false;
chacha20poly1305_decrypt_get_mac(dst, src, src_len - CHACHA20POLY1305_AUTHTAGLEN, ad, ad_len, nonce, key, mac);
return memcmp_crypto(mac, src + src_len - CHACHA20POLY1305_AUTHTAGLEN, CHACHA20POLY1305_AUTHTAGLEN) == 0;
}
void xchacha20poly1305_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint8 nonce[XCHACHA20POLY1305_NONCELEN],
const uint8 key[CHACHA20POLY1305_KEYLEN])
{
__aligned(16) uint8 derived_key[CHACHA20POLY1305_KEYLEN];
hchacha20(derived_key, nonce, key);
chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len, ReadLE64(nonce + 16), derived_key);
memzero_crypto(derived_key, CHACHA20POLY1305_KEYLEN);
}
bool xchacha20poly1305_decrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint8 nonce[XCHACHA20POLY1305_NONCELEN],
const uint8 key[CHACHA20POLY1305_KEYLEN]) {
bool ret;
__aligned(16) uint8 derived_key[CHACHA20POLY1305_KEYLEN];
hchacha20(derived_key, nonce, key);
ret = chacha20poly1305_decrypt(dst, src, src_len, ad, ad_len, ReadLE64(nonce + 16), derived_key);
memzero_crypto(derived_key, CHACHA20POLY1305_KEYLEN);
return ret;
}

39
crypto/chacha20poly1305.h Normal file
View file

@ -0,0 +1,39 @@
#pragma once
#include "tunsafe_types.h"
enum {
XCHACHA20POLY1305_NONCELEN = 24,
CHACHA20POLY1305_KEYLEN = 32,
CHACHA20POLY1305_AUTHTAGLEN = 16
};
void chacha20poly1305_decrypt_get_mac(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN],
uint8 mac[CHACHA20POLY1305_AUTHTAGLEN]);
bool chacha20poly1305_decrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN]);
void chacha20poly1305_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN]);
void xchacha20poly1305_encrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint8 nonce[XCHACHA20POLY1305_NONCELEN],
const uint8 key[CHACHA20POLY1305_KEYLEN]);
bool xchacha20poly1305_decrypt(uint8 *dst, const uint8 *src, const size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint8 nonce[XCHACHA20POLY1305_NONCELEN],
const uint8 key[CHACHA20POLY1305_KEYLEN]);
void poly1305_get_mac(const uint8 *src, size_t src_len,
const uint8 *ad, const size_t ad_len,
const uint64 nonce, const uint8 key[CHACHA20POLY1305_KEYLEN],
uint8 mac[CHACHA20POLY1305_AUTHTAGLEN]);

737
crypto/curve25519-donna.cpp Normal file
View file

@ -0,0 +1,737 @@
/* Copyright 2008, Google Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are
* met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Google Inc. nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* curve25519-donna: Curve25519 elliptic curve, public key function
*
* http://code.google.com/p/curve25519-donna/
*
* Adam Langley <agl@imperialviolet.org>
*
* Derived from public domain C code by Daniel J. Bernstein <djb@cr.yp.to>
*
* More information about curve25519 can be found here
* http://cr.yp.to/ecdh.html
*
* djb's sample implementation of curve25519 is written in a special assembly
* language called qhasm and uses the floating point registers.
*
* This is, almost, a clean room reimplementation from the curve25519 paper. It
* uses many of the tricks described therein. Only the crecip function is taken
* from the sample implementation.
*/
#include <string.h>
#include <stdint.h>
#ifdef _MSC_VER
#define inline __inline
#endif
typedef uint8_t u8;
typedef int32_t s32;
typedef int64_t limb;
/* Field element representation:
*
* Field elements are written as an array of signed, 64-bit limbs, least
* significant first. The value of the field element is:
* x[0] + 2^26·x[1] + x^51·x[2] + 2^102·x[3] + ...
*
* i.e. the limbs are 26, 25, 26, 25, ... bits wide.
*/
/* Sum two numbers: output += in */
static void fsum(limb *output, const limb *in) {
unsigned i;
for (i = 0; i < 10; i += 2) {
output[0+i] = (output[0+i] + in[0+i]);
output[1+i] = (output[1+i] + in[1+i]);
}
}
/* Find the difference of two numbers: output = in - output
* (note the order of the arguments!)
*/
static void fdifference(limb *output, const limb *in) {
unsigned i;
for (i = 0; i < 10; ++i) {
output[i] = (in[i] - output[i]);
}
}
/* Multiply a number by a scalar: output = in * scalar */
static void fscalar_product(limb *output, const limb *in, const limb scalar) {
unsigned i;
for (i = 0; i < 10; ++i) {
output[i] = in[i] * scalar;
}
}
/* Multiply two numbers: output = in2 * in
*
* output must be distinct to both inputs. The inputs are reduced coefficient
* form, the output is not.
*/
static void fproduct(limb *output, const limb *in2, const limb *in) {
output[0] = ((limb) ((s32) in2[0])) * ((s32) in[0]);
output[1] = ((limb) ((s32) in2[0])) * ((s32) in[1]) +
((limb) ((s32) in2[1])) * ((s32) in[0]);
output[2] = 2 * ((limb) ((s32) in2[1])) * ((s32) in[1]) +
((limb) ((s32) in2[0])) * ((s32) in[2]) +
((limb) ((s32) in2[2])) * ((s32) in[0]);
output[3] = ((limb) ((s32) in2[1])) * ((s32) in[2]) +
((limb) ((s32) in2[2])) * ((s32) in[1]) +
((limb) ((s32) in2[0])) * ((s32) in[3]) +
((limb) ((s32) in2[3])) * ((s32) in[0]);
output[4] = ((limb) ((s32) in2[2])) * ((s32) in[2]) +
2 * (((limb) ((s32) in2[1])) * ((s32) in[3]) +
((limb) ((s32) in2[3])) * ((s32) in[1])) +
((limb) ((s32) in2[0])) * ((s32) in[4]) +
((limb) ((s32) in2[4])) * ((s32) in[0]);
output[5] = ((limb) ((s32) in2[2])) * ((s32) in[3]) +
((limb) ((s32) in2[3])) * ((s32) in[2]) +
((limb) ((s32) in2[1])) * ((s32) in[4]) +
((limb) ((s32) in2[4])) * ((s32) in[1]) +
((limb) ((s32) in2[0])) * ((s32) in[5]) +
((limb) ((s32) in2[5])) * ((s32) in[0]);
output[6] = 2 * (((limb) ((s32) in2[3])) * ((s32) in[3]) +
((limb) ((s32) in2[1])) * ((s32) in[5]) +
((limb) ((s32) in2[5])) * ((s32) in[1])) +
((limb) ((s32) in2[2])) * ((s32) in[4]) +
((limb) ((s32) in2[4])) * ((s32) in[2]) +
((limb) ((s32) in2[0])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[0]);
output[7] = ((limb) ((s32) in2[3])) * ((s32) in[4]) +
((limb) ((s32) in2[4])) * ((s32) in[3]) +
((limb) ((s32) in2[2])) * ((s32) in[5]) +
((limb) ((s32) in2[5])) * ((s32) in[2]) +
((limb) ((s32) in2[1])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[1]) +
((limb) ((s32) in2[0])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[0]);
output[8] = ((limb) ((s32) in2[4])) * ((s32) in[4]) +
2 * (((limb) ((s32) in2[3])) * ((s32) in[5]) +
((limb) ((s32) in2[5])) * ((s32) in[3]) +
((limb) ((s32) in2[1])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[1])) +
((limb) ((s32) in2[2])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[2]) +
((limb) ((s32) in2[0])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[0]);
output[9] = ((limb) ((s32) in2[4])) * ((s32) in[5]) +
((limb) ((s32) in2[5])) * ((s32) in[4]) +
((limb) ((s32) in2[3])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[3]) +
((limb) ((s32) in2[2])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[2]) +
((limb) ((s32) in2[1])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[1]) +
((limb) ((s32) in2[0])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[0]);
output[10] = 2 * (((limb) ((s32) in2[5])) * ((s32) in[5]) +
((limb) ((s32) in2[3])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[3]) +
((limb) ((s32) in2[1])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[1])) +
((limb) ((s32) in2[4])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[4]) +
((limb) ((s32) in2[2])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[2]);
output[11] = ((limb) ((s32) in2[5])) * ((s32) in[6]) +
((limb) ((s32) in2[6])) * ((s32) in[5]) +
((limb) ((s32) in2[4])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[4]) +
((limb) ((s32) in2[3])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[3]) +
((limb) ((s32) in2[2])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[2]);
output[12] = ((limb) ((s32) in2[6])) * ((s32) in[6]) +
2 * (((limb) ((s32) in2[5])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[5]) +
((limb) ((s32) in2[3])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[3])) +
((limb) ((s32) in2[4])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[4]);
output[13] = ((limb) ((s32) in2[6])) * ((s32) in[7]) +
((limb) ((s32) in2[7])) * ((s32) in[6]) +
((limb) ((s32) in2[5])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[5]) +
((limb) ((s32) in2[4])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[4]);
output[14] = 2 * (((limb) ((s32) in2[7])) * ((s32) in[7]) +
((limb) ((s32) in2[5])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[5])) +
((limb) ((s32) in2[6])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[6]);
output[15] = ((limb) ((s32) in2[7])) * ((s32) in[8]) +
((limb) ((s32) in2[8])) * ((s32) in[7]) +
((limb) ((s32) in2[6])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[6]);
output[16] = ((limb) ((s32) in2[8])) * ((s32) in[8]) +
2 * (((limb) ((s32) in2[7])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[7]));
output[17] = ((limb) ((s32) in2[8])) * ((s32) in[9]) +
((limb) ((s32) in2[9])) * ((s32) in[8]);
output[18] = 2 * ((limb) ((s32) in2[9])) * ((s32) in[9]);
}
/* Reduce a long form to a short form by taking the input mod 2^255 - 19. */
static void freduce_degree(limb *output) {
/* Each of these shifts and adds ends up multiplying the value by 19. */
output[8] += output[18] << 4;
output[8] += output[18] << 1;
output[8] += output[18];
output[7] += output[17] << 4;
output[7] += output[17] << 1;
output[7] += output[17];
output[6] += output[16] << 4;
output[6] += output[16] << 1;
output[6] += output[16];
output[5] += output[15] << 4;
output[5] += output[15] << 1;
output[5] += output[15];
output[4] += output[14] << 4;
output[4] += output[14] << 1;
output[4] += output[14];
output[3] += output[13] << 4;
output[3] += output[13] << 1;
output[3] += output[13];
output[2] += output[12] << 4;
output[2] += output[12] << 1;
output[2] += output[12];
output[1] += output[11] << 4;
output[1] += output[11] << 1;
output[1] += output[11];
output[0] += output[10] << 4;
output[0] += output[10] << 1;
output[0] += output[10];
}
#if (-1 & 3) != 3
#error "This code only works on a two's complement system"
#endif
/* return v / 2^26, using only shifts and adds. */
static inline limb
div_by_2_26(const limb v)
{
/* High word of v; no shift needed*/
const uint32_t highword = (uint32_t) (((uint64_t) v) >> 32);
/* Set to all 1s if v was negative; else set to 0s. */
const int32_t sign = ((int32_t) highword) >> 31;
/* Set to 0x3ffffff if v was negative; else set to 0. */
const int32_t roundoff = ((uint32_t) sign) >> 6;
/* Should return v / (1<<26) */
return (v + roundoff) >> 26;
}
/* return v / (2^25), using only shifts and adds. */
static inline limb
div_by_2_25(const limb v)
{
/* High word of v; no shift needed*/
const uint32_t highword = (uint32_t) (((uint64_t) v) >> 32);
/* Set to all 1s if v was negative; else set to 0s. */
const int32_t sign = ((int32_t) highword) >> 31;
/* Set to 0x1ffffff if v was negative; else set to 0. */
const int32_t roundoff = ((uint32_t) sign) >> 7;
/* Should return v / (1<<25) */
return (v + roundoff) >> 25;
}
static inline s32
div_s32_by_2_25(const s32 v)
{
const s32 roundoff = ((uint32_t)(v >> 31)) >> 7;
return (v + roundoff) >> 25;
}
/* Reduce all coefficients of the short form input so that |x| < 2^26.
*
* On entry: |output[i]| < 2^62
*/
static void freduce_coefficients(limb *output) {
unsigned i;
output[10] = 0;
for (i = 0; i < 10; i += 2) {
limb over = div_by_2_26(output[i]);
output[i] -= over << 26;
output[i+1] += over;
over = div_by_2_25(output[i+1]);
output[i+1] -= over << 25;
output[i+2] += over;
}
/* Now |output[10]| < 2 ^ 38 and all other coefficients are reduced. */
output[0] += output[10] << 4;
output[0] += output[10] << 1;
output[0] += output[10];
output[10] = 0;
/* Now output[1..9] are reduced, and |output[0]| < 2^26 + 19 * 2^38
* So |over| will be no more than 77825 */
{
limb over = div_by_2_26(output[0]);
output[0] -= over << 26;
output[1] += over;
}
/* Now output[0,2..9] are reduced, and |output[1]| < 2^25 + 77825
* So |over| will be no more than 1. */
{
/* output[1] fits in 32 bits, so we can use div_s32_by_2_25 here. */
s32 over32 = div_s32_by_2_25((s32) output[1]);
output[1] -= over32 << 25;
output[2] += over32;
}
/* Finally, output[0,1,3..9] are reduced, and output[2] is "nearly reduced":
* we have |output[2]| <= 2^26. This is good enough for all of our math,
* but it will require an extra freduce_coefficients before fcontract. */
}
/* A helpful wrapper around fproduct: output = in * in2.
*
* output must be distinct to both inputs. The output is reduced degree and
* reduced coefficient.
*/
static void
fmul(limb *output, const limb *in, const limb *in2) {
limb t[19];
fproduct(t, in, in2);
freduce_degree(t);
freduce_coefficients(t);
memcpy(output, t, sizeof(limb) * 10);
}
static void fsquare_inner(limb *output, const limb *in) {
output[0] = ((limb) ((s32) in[0])) * ((s32) in[0]);
output[1] = 2 * ((limb) ((s32) in[0])) * ((s32) in[1]);
output[2] = 2 * (((limb) ((s32) in[1])) * ((s32) in[1]) +
((limb) ((s32) in[0])) * ((s32) in[2]));
output[3] = 2 * (((limb) ((s32) in[1])) * ((s32) in[2]) +
((limb) ((s32) in[0])) * ((s32) in[3]));
output[4] = ((limb) ((s32) in[2])) * ((s32) in[2]) +
4 * ((limb) ((s32) in[1])) * ((s32) in[3]) +
2 * ((limb) ((s32) in[0])) * ((s32) in[4]);
output[5] = 2 * (((limb) ((s32) in[2])) * ((s32) in[3]) +
((limb) ((s32) in[1])) * ((s32) in[4]) +
((limb) ((s32) in[0])) * ((s32) in[5]));
output[6] = 2 * (((limb) ((s32) in[3])) * ((s32) in[3]) +
((limb) ((s32) in[2])) * ((s32) in[4]) +
((limb) ((s32) in[0])) * ((s32) in[6]) +
2 * ((limb) ((s32) in[1])) * ((s32) in[5]));
output[7] = 2 * (((limb) ((s32) in[3])) * ((s32) in[4]) +
((limb) ((s32) in[2])) * ((s32) in[5]) +
((limb) ((s32) in[1])) * ((s32) in[6]) +
((limb) ((s32) in[0])) * ((s32) in[7]));
output[8] = ((limb) ((s32) in[4])) * ((s32) in[4]) +
2 * (((limb) ((s32) in[2])) * ((s32) in[6]) +
((limb) ((s32) in[0])) * ((s32) in[8]) +
2 * (((limb) ((s32) in[1])) * ((s32) in[7]) +
((limb) ((s32) in[3])) * ((s32) in[5])));
output[9] = 2 * (((limb) ((s32) in[4])) * ((s32) in[5]) +
((limb) ((s32) in[3])) * ((s32) in[6]) +
((limb) ((s32) in[2])) * ((s32) in[7]) +
((limb) ((s32) in[1])) * ((s32) in[8]) +
((limb) ((s32) in[0])) * ((s32) in[9]));
output[10] = 2 * (((limb) ((s32) in[5])) * ((s32) in[5]) +
((limb) ((s32) in[4])) * ((s32) in[6]) +
((limb) ((s32) in[2])) * ((s32) in[8]) +
2 * (((limb) ((s32) in[3])) * ((s32) in[7]) +
((limb) ((s32) in[1])) * ((s32) in[9])));
output[11] = 2 * (((limb) ((s32) in[5])) * ((s32) in[6]) +
((limb) ((s32) in[4])) * ((s32) in[7]) +
((limb) ((s32) in[3])) * ((s32) in[8]) +
((limb) ((s32) in[2])) * ((s32) in[9]));
output[12] = ((limb) ((s32) in[6])) * ((s32) in[6]) +
2 * (((limb) ((s32) in[4])) * ((s32) in[8]) +
2 * (((limb) ((s32) in[5])) * ((s32) in[7]) +
((limb) ((s32) in[3])) * ((s32) in[9])));
output[13] = 2 * (((limb) ((s32) in[6])) * ((s32) in[7]) +
((limb) ((s32) in[5])) * ((s32) in[8]) +
((limb) ((s32) in[4])) * ((s32) in[9]));
output[14] = 2 * (((limb) ((s32) in[7])) * ((s32) in[7]) +
((limb) ((s32) in[6])) * ((s32) in[8]) +
2 * ((limb) ((s32) in[5])) * ((s32) in[9]));
output[15] = 2 * (((limb) ((s32) in[7])) * ((s32) in[8]) +
((limb) ((s32) in[6])) * ((s32) in[9]));
output[16] = ((limb) ((s32) in[8])) * ((s32) in[8]) +
4 * ((limb) ((s32) in[7])) * ((s32) in[9]);
output[17] = 2 * ((limb) ((s32) in[8])) * ((s32) in[9]);
output[18] = 2 * ((limb) ((s32) in[9])) * ((s32) in[9]);
}
static void
fsquare(limb *output, const limb *in) {
limb t[19];
fsquare_inner(t, in);
freduce_degree(t);
freduce_coefficients(t);
memcpy(output, t, sizeof(limb) * 10);
}
/* Take a little-endian, 32-byte number and expand it into polynomial form */
static void
fexpand(limb *output, const u8 *input) {
#define F(n,start,shift,mask) \
output[n] = ((((limb) input[start + 0]) | \
((limb) input[start + 1]) << 8 | \
((limb) input[start + 2]) << 16 | \
((limb) input[start + 3]) << 24) >> shift) & mask;
F(0, 0, 0, 0x3ffffff);
F(1, 3, 2, 0x1ffffff);
F(2, 6, 3, 0x3ffffff);
F(3, 9, 5, 0x1ffffff);
F(4, 12, 6, 0x3ffffff);
F(5, 16, 0, 0x1ffffff);
F(6, 19, 1, 0x3ffffff);
F(7, 22, 3, 0x1ffffff);
F(8, 25, 4, 0x3ffffff);
F(9, 28, 6, 0x3ffffff);
#undef F
}
#if (-32 >> 1) != -16
#error "This code only works when >> does sign-extension on negative numbers"
#endif
/* Take a fully reduced polynomial form number and contract it into a
* little-endian, 32-byte array
*/
static void
fcontract(u8 *output, limb *input) {
int i;
int j;
for (j = 0; j < 2; ++j) {
for (i = 0; i < 9; ++i) {
if ((i & 1) == 1) {
/* This calculation is a time-invariant way to make input[i] positive
by borrowing from the next-larger limb.
*/
const s32 mask = (s32)(input[i]) >> 31;
const s32 carry = -(((s32)(input[i]) & mask) >> 25);
input[i] = (s32)(input[i]) + (carry << 25);
input[i+1] = (s32)(input[i+1]) - carry;
} else {
const s32 mask = (s32)(input[i]) >> 31;
const s32 carry = -(((s32)(input[i]) & mask) >> 26);
input[i] = (s32)(input[i]) + (carry << 26);
input[i+1] = (s32)(input[i+1]) - carry;
}
}
{
const s32 mask = (s32)(input[9]) >> 31;
const s32 carry = -(((s32)(input[9]) & mask) >> 25);
input[9] = (s32)(input[9]) + (carry << 25);
input[0] = (s32)(input[0]) - (carry * 19);
}
}
/* The first borrow-propagation pass above ended with every limb
except (possibly) input[0] non-negative.
Since each input limb except input[0] is decreased by at most 1
by a borrow-propagation pass, the second borrow-propagation pass
could only have wrapped around to decrease input[0] again if the
first pass left input[0] negative *and* input[1] through input[9]
were all zero. In that case, input[1] is now 2^25 - 1, and this
last borrow-propagation step will leave input[1] non-negative.
*/
{
const s32 mask = (s32)(input[0]) >> 31;
const s32 carry = -(((s32)(input[0]) & mask) >> 26);
input[0] = (s32)(input[0]) + (carry << 26);
input[1] = (s32)(input[1]) - carry;
}
/* Both passes through the above loop, plus the last 0-to-1 step, are
necessary: if input[9] is -1 and input[0] through input[8] are 0,
negative values will remain in the array until the end.
*/
input[1] <<= 2;
input[2] <<= 3;
input[3] <<= 5;
input[4] <<= 6;
input[6] <<= 1;
input[7] <<= 3;
input[8] <<= 4;
input[9] <<= 6;
#define F(i, s) \
output[s+0] |= input[i] & 0xff; \
output[s+1] = (input[i] >> 8) & 0xff; \
output[s+2] = (input[i] >> 16) & 0xff; \
output[s+3] = (input[i] >> 24) & 0xff;
output[0] = 0;
output[16] = 0;
F(0,0);
F(1,3);
F(2,6);
F(3,9);
F(4,12);
F(5,16);
F(6,19);
F(7,22);
F(8,25);
F(9,28);
#undef F
}
/* Input: Q, Q', Q-Q'
* Output: 2Q, Q+Q'
*
* x2 z3: long form
* x3 z3: long form
* x z: short form, destroyed
* xprime zprime: short form, destroyed
* qmqp: short form, preserved
*/
static void fmonty(limb *x2, limb *z2, /* output 2Q */
limb *x3, limb *z3, /* output Q + Q' */
limb *x, limb *z, /* input Q */
limb *xprime, limb *zprime, /* input Q' */
const limb *qmqp /* input Q - Q' */) {
limb origx[10], origxprime[10], zzz[19], xx[19], zz[19], xxprime[19],
zzprime[19], zzzprime[19], xxxprime[19];
memcpy(origx, x, 10 * sizeof(limb));
fsum(x, z);
fdifference(z, origx); // does x - z
memcpy(origxprime, xprime, sizeof(limb) * 10);
fsum(xprime, zprime);
fdifference(zprime, origxprime);
fproduct(xxprime, xprime, z);
fproduct(zzprime, x, zprime);
freduce_degree(xxprime);
freduce_coefficients(xxprime);
freduce_degree(zzprime);
freduce_coefficients(zzprime);
memcpy(origxprime, xxprime, sizeof(limb) * 10);
fsum(xxprime, zzprime);
fdifference(zzprime, origxprime);
fsquare(xxxprime, xxprime);
fsquare(zzzprime, zzprime);
fproduct(zzprime, zzzprime, qmqp);
freduce_degree(zzprime);
freduce_coefficients(zzprime);
memcpy(x3, xxxprime, sizeof(limb) * 10);
memcpy(z3, zzprime, sizeof(limb) * 10);
fsquare(xx, x);
fsquare(zz, z);
fproduct(x2, xx, zz);
freduce_degree(x2);
freduce_coefficients(x2);
fdifference(zz, xx); // does zz = xx - zz
memset(zzz + 10, 0, sizeof(limb) * 9);
fscalar_product(zzz, zz, 121665);
/* No need to call freduce_degree here:
fscalar_product doesn't increase the degree of its input. */
freduce_coefficients(zzz);
fsum(zzz, xx);
fproduct(z2, zz, zzz);
freduce_degree(z2);
freduce_coefficients(z2);
}
/* Conditionally swap two reduced-form limb arrays if 'iswap' is 1, but leave
* them unchanged if 'iswap' is 0. Runs in data-invariant time to avoid
* side-channel attacks.
*
* NOTE that this function requires that 'iswap' be 1 or 0; other values give
* wrong results. Also, the two limb arrays must be in reduced-coefficient,
* reduced-degree form: the values in a[10..19] or b[10..19] aren't swapped,
* and all all values in a[0..9],b[0..9] must have magnitude less than
* INT32_MAX.
*/
static void
swap_conditional(limb a[19], limb b[19], limb iswap) {
unsigned i;
const s32 swap = (s32) -iswap;
for (i = 0; i < 10; ++i) {
const s32 x = swap & ( ((s32)a[i]) ^ ((s32)b[i]) );
a[i] = ((s32)a[i]) ^ x;
b[i] = ((s32)b[i]) ^ x;
}
}
/* Calculates nQ where Q is the x-coordinate of a point on the curve
*
* resultx/resultz: the x coordinate of the resulting curve point (short form)
* n: a little endian, 32-byte number
* q: a point of the curve (short form)
*/
static void
cmult(limb *resultx, limb *resultz, const u8 *n, const limb *q) {
limb a[19] = {0}, b[19] = {1}, c[19] = {1}, d[19] = {0};
limb *nqpqx = a, *nqpqz = b, *nqx = c, *nqz = d, *t;
limb e[19] = {0}, f[19] = {1}, g[19] = {0}, h[19] = {1};
limb *nqpqx2 = e, *nqpqz2 = f, *nqx2 = g, *nqz2 = h;
unsigned i, j;
memcpy(nqpqx, q, sizeof(limb) * 10);
for (i = 0; i < 32; ++i) {
u8 byte = n[31 - i];
for (j = 0; j < 8; ++j) {
const limb bit = byte >> 7;
swap_conditional(nqx, nqpqx, bit);
swap_conditional(nqz, nqpqz, bit);
fmonty(nqx2, nqz2,
nqpqx2, nqpqz2,
nqx, nqz,
nqpqx, nqpqz,
q);
swap_conditional(nqx2, nqpqx2, bit);
swap_conditional(nqz2, nqpqz2, bit);
t = nqx;
nqx = nqx2;
nqx2 = t;
t = nqz;
nqz = nqz2;
nqz2 = t;
t = nqpqx;
nqpqx = nqpqx2;
nqpqx2 = t;
t = nqpqz;
nqpqz = nqpqz2;
nqpqz2 = t;
byte <<= 1;
}
}
memcpy(resultx, nqx, sizeof(limb) * 10);
memcpy(resultz, nqz, sizeof(limb) * 10);
}
// -----------------------------------------------------------------------------
// Shamelessly copied from djb's code
// -----------------------------------------------------------------------------
static void
crecip(limb *out, const limb *z) {
limb z2[10];
limb z9[10];
limb z11[10];
limb z2_5_0[10];
limb z2_10_0[10];
limb z2_20_0[10];
limb z2_50_0[10];
limb z2_100_0[10];
limb t0[10];
limb t1[10];
int i;
/* 2 */ fsquare(z2,z);
/* 4 */ fsquare(t1,z2);
/* 8 */ fsquare(t0,t1);
/* 9 */ fmul(z9,t0,z);
/* 11 */ fmul(z11,z9,z2);
/* 22 */ fsquare(t0,z11);
/* 2^5 - 2^0 = 31 */ fmul(z2_5_0,t0,z9);
/* 2^6 - 2^1 */ fsquare(t0,z2_5_0);
/* 2^7 - 2^2 */ fsquare(t1,t0);
/* 2^8 - 2^3 */ fsquare(t0,t1);
/* 2^9 - 2^4 */ fsquare(t1,t0);
/* 2^10 - 2^5 */ fsquare(t0,t1);
/* 2^10 - 2^0 */ fmul(z2_10_0,t0,z2_5_0);
/* 2^11 - 2^1 */ fsquare(t0,z2_10_0);
/* 2^12 - 2^2 */ fsquare(t1,t0);
/* 2^20 - 2^10 */ for (i = 2;i < 10;i += 2) { fsquare(t0,t1); fsquare(t1,t0); }
/* 2^20 - 2^0 */ fmul(z2_20_0,t1,z2_10_0);
/* 2^21 - 2^1 */ fsquare(t0,z2_20_0);
/* 2^22 - 2^2 */ fsquare(t1,t0);
/* 2^40 - 2^20 */ for (i = 2;i < 20;i += 2) { fsquare(t0,t1); fsquare(t1,t0); }
/* 2^40 - 2^0 */ fmul(t0,t1,z2_20_0);
/* 2^41 - 2^1 */ fsquare(t1,t0);
/* 2^42 - 2^2 */ fsquare(t0,t1);
/* 2^50 - 2^10 */ for (i = 2;i < 10;i += 2) { fsquare(t1,t0); fsquare(t0,t1); }
/* 2^50 - 2^0 */ fmul(z2_50_0,t0,z2_10_0);
/* 2^51 - 2^1 */ fsquare(t0,z2_50_0);
/* 2^52 - 2^2 */ fsquare(t1,t0);
/* 2^100 - 2^50 */ for (i = 2;i < 50;i += 2) { fsquare(t0,t1); fsquare(t1,t0); }
/* 2^100 - 2^0 */ fmul(z2_100_0,t1,z2_50_0);
/* 2^101 - 2^1 */ fsquare(t1,z2_100_0);
/* 2^102 - 2^2 */ fsquare(t0,t1);
/* 2^200 - 2^100 */ for (i = 2;i < 100;i += 2) { fsquare(t1,t0); fsquare(t0,t1); }
/* 2^200 - 2^0 */ fmul(t1,t0,z2_100_0);
/* 2^201 - 2^1 */ fsquare(t0,t1);
/* 2^202 - 2^2 */ fsquare(t1,t0);
/* 2^250 - 2^50 */ for (i = 2;i < 50;i += 2) { fsquare(t0,t1); fsquare(t1,t0); }
/* 2^250 - 2^0 */ fmul(t0,t1,z2_50_0);
/* 2^251 - 2^1 */ fsquare(t1,t0);
/* 2^252 - 2^2 */ fsquare(t0,t1);
/* 2^253 - 2^3 */ fsquare(t1,t0);
/* 2^254 - 2^4 */ fsquare(t0,t1);
/* 2^255 - 2^5 */ fsquare(t1,t0);
/* 2^255 - 21 */ fmul(out,t1,z11);
}
void curve25519_normalize(u8 *e) {
e[0] &= 248;
e[31] &= 127;
e[31] |= 64;
}
void curve25519_donna_ref(uint8_t *mypublic, const uint8_t *secret, const uint8_t *basepoint) {
limb bp[10], x[10], z[11], zmone[10];
uint8_t e[32];
int i;
for (i = 0; i < 32; ++i) e[i] = secret[i];
e[0] &= 248;
e[31] &= 127;
e[31] |= 64;
fexpand(bp, basepoint);
cmult(x, z, e, bp);
crecip(zmone, z);
fmul(z, x, zmone);
freduce_coefficients(z);
fcontract(mypublic, z);
}

17
crypto/curve25519-donna.h Normal file
View file

@ -0,0 +1,17 @@
#ifndef TUNSAFE_CRYPTO_CURVE25519_DONNA_H_
#define TUNSAFE_CRYPTO_CURVE25519_DONNA_H_
#include "tunsafe_types.h"
void curve25519_donna_ref(uint8 *mypublic, const uint8 *secret, const uint8 *basepoint);
extern "C" void curve25519_donna_x64(uint8 *mypublic, const uint8 *secret, const uint8 *basepoint);
#if defined(ARCH_CPU_X86_64) && defined(COMPILER_MSVC)
#define curve25519_donna curve25519_donna_x64
#else
#define curve25519_donna curve25519_donna_ref
#endif
void curve25519_normalize(uint8 *e);
#endif // TUNSAFE_CRYPTO_CURVE25519_DONNA_H_

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,28 @@
#!/bin/sh
set -e
# macos
perl make_chacha20_x64.pl macosx > chacha20_x64_gas_macosx.s
perl make_poly1305_x64.pl macosx > poly1305_x64_gas_macosx.s
cd aesgcm
perl aesni-gcm-x86_64.pl macosx > aesni_gcm_x64_gas_macosx.s
perl aesni-x86_64.pl macosx > aesni_x64_gas_macosx.s
perl ghash-x86_64.pl macosx > ghash_x64_gas_macosx.s
cd ..
# linux,freebsd
perl make_chacha20_x64.pl gas > chacha20_x64_gas.s
perl make_poly1305_x64.pl gas > poly1305_x64_gas.s
cd aesgcm
perl aesni-gcm-x86_64.pl gas > aesni_gcm_x64_gas.s
perl aesni-x86_64.pl gas > aesni_x64_gas.s
perl ghash-x86_64.pl gas > ghash_x64_gas.s
cd ..

3665
crypto/make_chacha20_x64.pl Normal file

File diff suppressed because it is too large Load diff

4719
crypto/make_poly1305_x64.pl Normal file

File diff suppressed because it is too large Load diff

1815
crypto/make_poly1305_x86.pl Normal file

File diff suppressed because it is too large Load diff

18
crypto/nasm.props Normal file
View file

@ -0,0 +1,18 @@
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup
Condition="'$(NASMBeforeTargets)' == '' and '$(NASMAfterTargets)' == '' and '$(ConfigurationType)' != 'Makefile'">
<NASMBeforeTargets>Midl</NASMBeforeTargets>
<NASMAfterTargets>CustomBuild</NASMAfterTargets>
</PropertyGroup>
<ItemDefinitionGroup>
<NASM>
<OutputFormat>$(IntDir)%(FileName).obj</OutputFormat>
<PackAlignmentBoundary>0</PackAlignmentBoundary>
<CommandLineTemplate Condition="'$(Platform)' == 'Win32'">c:\dev\nasm\nasm.exe -f win32 [AllOptions] [AdditionalOptions] %(FullPath)</CommandLineTemplate>
<CommandLineTemplate Condition="'$(Platform)' == 'X64'">c:\dev\nasm\nasm.exe -f win64 [AllOptions] [AdditionalOptions] %(FullPath)</CommandLineTemplate>
<CommandLineTemplate Condition="'$(Platform)' != 'Win32' and '$(Platform)' != 'X64'">echo NASM not supported on this platform</CommandLineTemplate>
<ExecutionDescription>Assembling [Inputs]...</ExecutionDescription>
</NASM>
</ItemDefinitionGroup>
</Project>

82
crypto/nasm.targets Normal file
View file

@ -0,0 +1,82 @@
<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<PropertyPageSchema
Include="$(MSBuildThisFileDirectory)$(MSBuildThisFileName).xml" />
<AvailableItemName Include="NASM">
<Targets>_NASM</Targets>
</AvailableItemName>
</ItemGroup>
<PropertyGroup>
<ComputeLinkInputsTargets>
$(ComputeLinkInputsTargets);
ComputeNASMOutput;
</ComputeLinkInputsTargets>
<ComputeLibInputsTargets>
$(ComputeLibInputsTargets);
ComputeNASMOutput;
</ComputeLibInputsTargets>
</PropertyGroup>
<UsingTask
TaskName="NASM"
TaskFactory="XamlTaskFactory"
AssemblyName="Microsoft.Build.Tasks.v4.0">
<Task>$(MSBuildThisFileDirectory)$(MSBuildThisFileName).xml</Task>
</UsingTask>
<Target
Name="_NASM"
BeforeTargets="$(NASMBeforeTargets)"
AfterTargets="$(NASMAfterTargets)"
Condition="'@(NASM)' != ''"
Outputs="%(NASM.OutputFormat)"
Inputs="%(NASM.Identity);%(NASM.AdditionalDependencies);$(MSBuildProjectFile)"
DependsOnTargets="_SelectedFiles">
<ItemGroup Condition="'@(SelectedFiles)' != ''">
<NASM Remove="@(NASM)" Condition="'%(Identity)' != '@(SelectedFiles)'" />
</ItemGroup>
<ItemGroup>
<NASM_tlog Include="%(NASM.OutputFormat)" Condition="'%(NASM.OutputFormat)' != '' and '%(NASM.ExcludedFromBuild)' != 'true'">
<Source>@(NASM, '|')</Source>
</NASM_tlog>
</ItemGroup>
<Message
Importance="High"
Text="%(NASM.ExecutionDescription)" />
<WriteLinesToFile
Condition="'@(NASM_tlog)' != '' and '%(NASM_tlog.ExcludedFromBuild)' != 'true'"
File="$(IntDir)$(ProjectName).write.1.tlog"
Lines="^%(NASM_tlog.Source);@(NASM_tlog-&gt;'%(Fullpath)')"/>
<NASM
Condition="'@(NASM)' != '' and '%(NASM.ExcludedFromBuild)' != 'true'"
Inputs="%(NASM.Inputs)"
OutputFormat="%(NASM.OutputFormat)"
AssembledCodeListingFile="%(NASM.AssembledCodeListingFile)"
GenerateDebugInformation="%(NASM.GenerateDebugInformation)"
ErrorReporting="%(NASM.ErrorReporting)"
IncludePaths="%(NASM.IncludePaths)"
PreprocessorDefinitions="%(NASM.PreprocessorDefinitions)"
UndefinePreprocessorDefinitions="%(NASM.UndefinePreprocessorDefinitions)"
ErrorReportingFormat="%(NASM.ErrorReportingFormat)"
TreatWarningsAsErrors="%(NASM.TreatWarningsAsErrors)"
floatunderflow="%(NASM.floatunderflow)"
macrodefaults="%(NASM.macrodefaults)"
user="%(NASM.user)"
floatoverflow="%(NASM.floatoverflow)"
floatdenorm="%(NASM.floatdenorm)"
numberoverflow="%(NASM.numberoverflow)"
macroselfref="%(NASM.macroselfref)"
floattoolong="%(NASM.floattoolong)"
orphanlabels="%(NASM.orphanlabels)"
CommandLineTemplate="%(NASM.CommandLineTemplate)"
AdditionalOptions="%(NASM.AdditionalOptions)"
/>
</Target>
<Target
Name="ComputeNASMOutput"
Condition="'@(NASM)' != ''">
<ItemGroup>
<Link Include="@(NASM->Metadata('OutputFormat')->Distinct()->ClearMetadata())" Condition="'%(NASM.ExcludedFromBuild)' != 'true'"/>
<Lib Include="@(NASM->Metadata('OutputFormat')->Distinct()->ClearMetadata())" Condition="'%(NASM.ExcludedFromBuild)' != 'true'"/>
</ItemGroup>
</Target>
</Project>

308
crypto/nasm.xml Normal file
View file

@ -0,0 +1,308 @@
<?xml version="1.0" encoding="utf-8"?>
<ProjectSchemaDefinitions xmlns="http://schemas.microsoft.com/build/2009/properties" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:sys="clr-namespace:System;assembly=mscorlib">
<Rule
Name="NASM"
PageTemplate="tool"
DisplayName="Netwide Assembler"
Order="200">
<Rule.DataSource>
<DataSource
Persistence="ProjectFile"
ItemType="NASM" />
</Rule.DataSource>
<Rule.Categories>
<Category
Name="General">
<Category.DisplayName>
<sys:String>General</sys:String>
</Category.DisplayName>
</Category>
<Category
Name="Preprocessor">
<Category.DisplayName>
<sys:String>Preprocessing Options</sys:String>
</Category.DisplayName>
</Category>
<Category
Name="Assembler Options">
<Category.DisplayName>
<sys:String>Assembler Options</sys:String>
</Category.DisplayName>
</Category>
<Category
Name="Advanced">
<Category.DisplayName>
<sys:String>Advanced </sys:String>
</Category.DisplayName>
</Category>
<Category
Name="Command Line"
Subtype="CommandLine">
<Category.DisplayName>
<sys:String>Command Line</sys:String>
</Category.DisplayName>
</Category>
</Rule.Categories>
<StringProperty
Name="Inputs"
Category="Command Line"
IsRequired="true">
<StringProperty.DataSource>
<DataSource
Persistence="ProjectFile"
ItemType="NASM"
SourceType="Item" />
</StringProperty.DataSource>
</StringProperty>
<StringProperty
Name="OutputFormat"
Category="Assembler Options"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Output File Name"
Description="Specify Output Filename.-o [value]"
Switch="-o [value]"
/>
<StringListProperty
Name="AssembledCodeListingFile"
Category="Assembler Options"
DisplayName="Assembled Code Listing File"
Description="Generates an assembled code listing file. (-l [file])"
HelpUrl="http://www.nasm.us/doc/"
Switch="-l &quot;[value]&quot;"
/>
<BoolProperty
Name="GenerateDebugInformation"
Category="Assembler Options"
DisplayName="Generate Debug Information"
Description="Generates Debug Information. (-g)"
HelpUrl="http://www.nasm.us/doc/"
Switch="-g"
/>
<StringListProperty
Name="ErrorReporting"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Redirect Error Messages to File"
Description="Drops the error Message on specified device"
Switch="-Z &quot;[value]&quot;"
/>
<StringListProperty
Name="IncludePaths"
Category="General"
DisplayName="Include Paths"
Description="Sets path for include file. (-I[path])"
HelpUrl="http://www.nasm.us/doc/"
Switch="-I[value]"
/>
<StringListProperty
Name="PreprocessorDefinitions"
Category="Preprocessor"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Preprocessor Definitions"
Description="Defines a text macro with the given name. (-D[symbol])"
Switch="-D[value]"
/>
<StringListProperty
Name="UndefinePreprocessorDefinitions"
Category="Preprocessor"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Undefine Preprocessor Definitions"
Description="Undefines a text macro with the given name. (-U[symbol])"
Switch="-U[value]"
/>
<EnumProperty
Name="ErrorReportingFormat"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Error Reporting Format"
Description="Select the error reporting format ie. GNU or VC">
<EnumValue
Name="0"
DisplayName="-Xgnu GNU format: Default format"
Switch="-Xgnu" />
<EnumValue
Name="1"
DisplayName="-Xvc Style used by Microsoft Visual C++"
Switch="-Xvc" />
</EnumProperty>
<BoolProperty
Name="TreatWarningsAsErrors"
Category="Assembler Options"
DisplayName="Treat Warnings As Errors"
Description="Returns an error code if warnings are generated. (-Werror)"
HelpUrl="http://www.nasm.us/doc/"
Switch="-Werror"
/>
<BoolProperty
Name="floatunderflow"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="float-underflow"
Description="floating point underflow (default off)"
Switch="-w+float-underflow" />
<BoolProperty
Name="macrodefaults"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable macro-defaults"
Description="macros with more default than optional parameters (default on)"
Switch="-w-macro-defaults" />
<BoolProperty
Name="user"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable user"
Description="%warning directives (default on)"
Switch="-w-user" />
<BoolProperty
Name="floatoverflow"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable float-overflow"
Description="floating point overflow (default on)"
Switch="-w-float-overflow" />
<BoolProperty
Name="floatdenorm"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="float-denorm"
Description="floating point denormal (default off)"
Switch="-w+float-denorm" />
<BoolProperty
Name="numberoverflow"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable number-overflow"
Description="numeric constant does not fit (default on)"
Switch="-w-number-overflow" />
<BoolProperty
Name="macroselfref"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="macro-selfref"
Description="cyclic macro references (default off)"
Switch="-w+macro-selfref" />
<BoolProperty
Name="floattoolong"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable float-toolong"
Description=" too many digits in floating-point number (default on)"
Switch="-w-float-toolong" />
<BoolProperty
Name="orphanlabels"
Category="Advanced"
HelpUrl="http://www.nasm.us/doc/"
DisplayName="Disable orphan-labels"
Description="labels alone on lines without trailing `:' (default on)"
Switch="-w-orphan-labels" />
<StringProperty
Name="CommandLineTemplate"
DisplayName="Command Line"
Visible="False"
IncludeInCommandLine="False" />
<DynamicEnumProperty
Name="NASMBeforeTargets"
Category="General"
EnumProvider="Targets"
IncludeInCommandLine="False">
<DynamicEnumProperty.DisplayName>
<sys:String>Execute Before</sys:String>
</DynamicEnumProperty.DisplayName>
<DynamicEnumProperty.Description>
<sys:String>Specifies the targets for the build customization to run before.</sys:String>
</DynamicEnumProperty.Description>
<DynamicEnumProperty.ProviderSettings>
<NameValuePair
Name="Exclude"
Value="^NASMBeforeTargets|^Compute" />
</DynamicEnumProperty.ProviderSettings>
<DynamicEnumProperty.DataSource>
<DataSource
Persistence="ProjectFile"
ItemType=""
HasConfigurationCondition="true" />
</DynamicEnumProperty.DataSource>
</DynamicEnumProperty>
<DynamicEnumProperty
Name="NASMAfterTargets"
Category="General"
EnumProvider="Targets"
IncludeInCommandLine="False">
<DynamicEnumProperty.DisplayName>
<sys:String>Execute After</sys:String>
</DynamicEnumProperty.DisplayName>
<DynamicEnumProperty.Description>
<sys:String>Specifies the targets for the build customization to run after.</sys:String>
</DynamicEnumProperty.Description>
<DynamicEnumProperty.ProviderSettings>
<NameValuePair
Name="Exclude"
Value="^NASMAfterTargets|^Compute" />
</DynamicEnumProperty.ProviderSettings>
<DynamicEnumProperty.DataSource>
<DataSource
Persistence="ProjectFile"
ItemType=""
HasConfigurationCondition="true" />
</DynamicEnumProperty.DataSource>
</DynamicEnumProperty>
<StringProperty
Name="ExecutionDescription"
DisplayName="Execution Description"
IncludeInCommandLine="False"
Visible="False" />
<StringListProperty
Name="AdditionalDependencies"
DisplayName="Additional Dependencies"
IncludeInCommandLine="False"
Visible="False" />
<StringProperty
Subtype="AdditionalOptions"
Name="AdditionalOptions"
Category="Command Line">
<StringProperty.DisplayName>
<sys:String>Additional Options</sys:String>
</StringProperty.DisplayName>
<StringProperty.Description>
<sys:String>Additional Options</sys:String>
</StringProperty.Description>
</StringProperty>
</Rule>
<ItemType
Name="NASM"
DisplayName="Netwide Assembler" />
<FileExtension
Name="*.asm"
ContentType="NASM" />
<ContentType
Name="NASM"
DisplayName="Netwide Assembler"
ItemType="NASM" />
</ProjectSchemaDefinitions>

3132
crypto/poly1305_x64_gas.s Normal file

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

3487
crypto/poly1305_x64_nasm.asm Normal file

File diff suppressed because it is too large Load diff

193
crypto/siphash.cpp Normal file
View file

@ -0,0 +1,193 @@
/* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
*
* This file is provided under a dual BSD/GPLv2 license.
*
* SipHash: a fast short-input PRF
* https://131002.net/siphash/
*
* This implementation is specifically for SipHash2-4 for a secure PRF
* and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for
* hashtables.
*/
#include "stdafx.h"
#include "crypto/siphash.h"
#include "tunsafe_endian.h"
#define SIPROUND \
do { \
v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
} while (0)
#define PREAMBLE(len) \
uint64 v0 = 0x736f6d6570736575ULL; \
uint64 v1 = 0x646f72616e646f6dULL; \
uint64 v2 = 0x6c7967656e657261ULL; \
uint64 v3 = 0x7465646279746573ULL; \
uint64 b = ((uint64)(len)) << 56; \
v3 ^= key->key[1]; \
v2 ^= key->key[0]; \
v1 ^= key->key[1]; \
v0 ^= key->key[0];
#define POSTAMBLE \
v3 ^= b; \
SIPROUND; \
SIPROUND; \
v0 ^= b; \
v2 ^= 0xff; \
SIPROUND; \
SIPROUND; \
SIPROUND; \
SIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);
uint64 siphash(const void *data, size_t len, const siphash_key_t *key) {
const uint8 *end = (uint8*)data + len - (len % sizeof(uint64));
const uint8 left = len & (sizeof(uint64) - 1);
uint64 m;
PREAMBLE(len)
for (; data != end; data = (uint8*)data + sizeof(uint64)) {
m = ReadLE64(data);
v3 ^= m;
SIPROUND;
SIPROUND;
v0 ^= m;
}
switch (left) {
case 7: b |= ((uint64)end[6]) << 48;
case 6: b |= ((uint64)end[5]) << 40;
case 5: b |= ((uint64)end[4]) << 32;
case 4: b |= ReadLE32(data); break;
case 3: b |= ((uint64)end[2]) << 16;
case 2: b |= ReadLE16(data); break;
case 1: b |= end[0];
}
POSTAMBLE
}
/**
* siphash_1u64 - compute 64-bit siphash PRF value of a uint64
* @first: first uint64
* @key: the siphash key
*/
uint64 siphash_1u64(const uint64 first, const siphash_key_t *key)
{
PREAMBLE(8)
v3 ^= first;
SIPROUND;
SIPROUND;
v0 ^= first;
POSTAMBLE
}
/**
* siphash_2u64 - compute 64-bit siphash PRF value of 2 uint64
* @first: first uint64
* @second: second uint64
* @key: the siphash key
*/
uint64 siphash_2u64(const uint64 first, const uint64 second, const siphash_key_t *key)
{
PREAMBLE(16)
v3 ^= first;
SIPROUND;
SIPROUND;
v0 ^= first;
v3 ^= second;
SIPROUND;
SIPROUND;
v0 ^= second;
POSTAMBLE
}
/**
* siphash_3u64 - compute 64-bit siphash PRF value of 3 uint64
* @first: first uint64
* @second: second uint64
* @third: third uint64
* @key: the siphash key
*/
uint64 siphash_3u64(const uint64 first, const uint64 second, const uint64 third,
const siphash_key_t *key)
{
PREAMBLE(24)
v3 ^= first;
SIPROUND;
SIPROUND;
v0 ^= first;
v3 ^= second;
SIPROUND;
SIPROUND;
v0 ^= second;
v3 ^= third;
SIPROUND;
SIPROUND;
v0 ^= third;
POSTAMBLE
}
/**
* siphash_4u64 - compute 64-bit siphash PRF value of 4 uint64
* @first: first uint64
* @second: second uint64
* @third: third uint64
* @forth: forth uint64
* @key: the siphash key
*/
uint64 siphash_4u64(const uint64 first, const uint64 second, const uint64 third,
const uint64 forth, const siphash_key_t *key)
{
PREAMBLE(32)
v3 ^= first;
SIPROUND;
SIPROUND;
v0 ^= first;
v3 ^= second;
SIPROUND;
SIPROUND;
v0 ^= second;
v3 ^= third;
SIPROUND;
SIPROUND;
v0 ^= third;
v3 ^= forth;
SIPROUND;
SIPROUND;
v0 ^= forth;
POSTAMBLE
}
uint64 siphash_1u32(const uint32 first, const siphash_key_t *key)
{
PREAMBLE(4)
b |= first;
POSTAMBLE
}
uint64 siphash_3u32(const uint32 first, const uint32 second, const uint32 third,
const siphash_key_t *key)
{
uint64 combined = (uint64)second << 32 | first;
PREAMBLE(12)
v3 ^= combined;
SIPROUND;
SIPROUND;
v0 ^= combined;
b |= third;
POSTAMBLE
}
uint64 siphash_u64_u32(const uint64 combined, const uint32 third, const siphash_key_t *key) {
PREAMBLE(12)
v3 ^= combined;
SIPROUND;
SIPROUND;
v0 ^= combined;
b |= third;
POSTAMBLE
}

53
crypto/siphash.h Normal file
View file

@ -0,0 +1,53 @@
/* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
*
* This file is provided under a dual BSD/GPLv2 license.
*
* SipHash: a fast short-input PRF
* https://131002.net/siphash/
*
* This implementation is specifically for SipHash2-4 for a secure PRF
* and HalfSipHash1-3/SipHash1-3 for an insecure PRF only suitable for
* hashtables.
*/
#ifndef TUNSAFE_CRYPTO_SIPHASH_H_
#define TUNSAFE_CRYPTO_SIPHASH_H_
#include "tunsafe_types.h"
typedef struct {
uint64 key[2];
} siphash_key_t;
uint64 siphash_1u64(const uint64 a, const siphash_key_t *key);
uint64 siphash_2u64(const uint64 a, const uint64 b, const siphash_key_t *key);
uint64 siphash_3u64(const uint64 a, const uint64 b, const uint64 c,
const siphash_key_t *key);
uint64 siphash_4u64(const uint64 a, const uint64 b, const uint64 c, const uint64 d,
const siphash_key_t *key);
uint64 siphash_1u32(const uint32 a, const siphash_key_t *key);
uint64 siphash_3u32(const uint32 a, const uint32 b, const uint32 c,
const siphash_key_t *key);
static inline uint64 siphash_2u32(const uint32 a, const uint32 b,
const siphash_key_t *key)
{
return siphash_1u64((uint64)b << 32 | a, key);
}
static inline uint64 siphash_4u32(const uint32 a, const uint32 b, const uint32 c,
const uint32 d, const siphash_key_t *key)
{
return siphash_2u64((uint64)b << 32 | a, (uint64)d << 32 | c, key);
}
uint64 siphash_u64_u32(const uint64 combined, const uint32 third, const siphash_key_t *key);
/**
* siphash - compute 64-bit siphash PRF value
* @data: buffer to hash
* @size: size of @data
* @key: the siphash key
*/
uint64 siphash(const void *data, size_t len, const siphash_key_t *key);
#endif // TUNSAFE_CRYPTO_SIPHASH_H_

1433
crypto/x86_64-xlate.pl Normal file

File diff suppressed because it is too large Load diff

41
crypto_ops.h Normal file
View file

@ -0,0 +1,41 @@
// SPDX-License-Identifier: AGPL-1.0-only
// Copyright (C) 2018 Ludvig Strigeus <info@tunsafe.com>. All Rights Reserved.
#ifndef TUNSAFE_CRYPTO_OPS_H_
#define TUNSAFE_CRYPTO_OPS_H_
#include "build_config.h"
#include "tunsafe_types.h"
#include <string.h>
#if defined(COMPILER_MSVC)
#include <intrin.h>
#endif // defined(COMPILER_MSVC)
#if defined(ARCH_CPU_X86_64) && defined(COMPILER_MSVC)
FORCEINLINE static void memzero_crypto(void *dst, size_t n) {
if (n & 7) {
__stosb((unsigned char*)dst, 0, n);
} else {
__stosq((uint64*)dst, 0, n >> 3);
}
}
#elif defined(ARCH_CPU_X86) && defined(COMPILER_MSVC)
FORCEINLINE static void memzero_crypto(void *dst, size_t n) {
if (n & 3) {
__stosb((unsigned char*)dst, 0, n);
} else {
__stosd((unsigned long*)dst, 0, n >> 2);
}
}
#else
FORCEINLINE static void memzero_crypto(void *dst, size_t n) {
memset(dst, 0, n);
__asm__ __volatile__("": :"r"(dst) :"memory");
}
#endif
int memcmp_crypto(const uint8 *a, const uint8 *b, size_t n);
#endif // TUNSAFE_CRYPTO_OPS_H_

BIN
icons/green-bg-icon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

BIN
icons/green-bg-icon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

BIN
icons/green-icon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

BIN
icons/green-icon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.5 KiB

BIN
icons/neutral-icon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

BIN
icons/neutral-icon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

BIN
icons/red-icon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 24 KiB

BIN
icons/red-icon.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.8 KiB

4
installer/.gitignore vendored Normal file
View file

@ -0,0 +1,4 @@
/tunsafe*.exe
/x64/
/x86/
*.pyc

48
installer/ChangeLog.txt Normal file
View file

@ -0,0 +1,48 @@
2018-06-20 - TunSafe v1.3-rc3
Changes:
1.Add option to block Internet traffic outside of TunSafe. Either
based on firewall rules, or by adding a null route, or both.
The firewall rule blocks all traffic except traffic from TunSafe,
loopback traffic, and DHCP traffic on the default NIC.
The route rule adds two /1 routes to 0.0.0.0.
2.Convert LF to CRLF when importing config files
3.Update some logging messages
4.Delete the old routing rule pointing at the VPN server IP when
disconnecting
5.Delete any conflicting old routing rule pointing at the VPN server
when connecting.
6.Tray popup menu did not disappear when clicking outside of it.
7.Show config file names also in tray popup menu.
8.Make the menu item bold if connection is selected in popup menu.
9.Don't show the .conf filename extension in the UI.
10.Show also config file name when hovering on tray icon.
11.Click on the connected server to toggle connection
12.Fix bug where internet blocking checkbox was not removed.
13.Change so bold is used for selected server, and checkbox
is used when connected.
14.Use WS_EX_COMPOSITED to reduce flicker
15.Now possible to enter a filename on command line to connect to.
16.Support /minimize and /minimize_on_connect command line opts.
17.Support PreUp,PostUp,PreDown,PostDown options on [Interface]
Note: For security reasons you need to first enable them,
so either Shift-Click on Options and select Allow Pre/Post Commands
or specify the /allow_pre_post command line option.
2018-04-29 - TunSafe v1.2
Changes:
1.Use /24 instead of failing when a /32 Address is used
2.Use /120 instead of failing when a /128 Address is used
3.Add routes for all entries in AllowedIPs
2018-04-29 - TunSafe v1.1
Changes:
1.Retry on failed DNS lookup. Helps when resuming from sleep.
2.Display a better message if the TAP adapter can't be found.
3.Retry connect when getting ERROR_FILE_NOT_FOUND.
2018-03-06 - TunSafe v1.0
First public release.

240
installer/LICENSE.TXT Normal file
View file

@ -0,0 +1,240 @@
TunSafe © 2018 Ludvig Strigeus
==============================
BY USING THE SOFTWARE, YOU ACCEPT THESE TERMS. IF YOU DO NOT ACCEPT
THEM, DO NOT USE THE SOFTWARE.
This software is provided "as is", without warranty of any kind,
express or implied, including but not limited to the warranties of
merchantability, fitness for a particular purpose and noninfringement.
In no event shall the authors or copyright holders be liable for any
claim, damages or other liability, whether in an action of contract,
tort or otherwise, arising from, out of or in connection with the
Software or the use or other dealings in the Software.
We may not provide support services for this software in the future.
You may install and use any number of copies of the software on your
devices.
Please be aware that, similar to other networking tools that capture
network packets, the information processed by TunSafe or your VPN
provider may include personally identifiable or other sensitive
information (such as usernames, passwords, addresses of web sites
accessed). By using this software, you acknowledge that you are aware of
this and take sole responsibility for any personally identifiable or
other sensitive information provided to TunSafe or your VPN provider
through your use of the software.
The software is licensed, not sold. This agreement only gives you some
rights to use the software. Unless applicable law gives you more rights
despite this limitation, you may use the software only as expressly
permitted in this agreement. In doing so, you must comply with any
technical limitations in the software that only allow you to use it in
certain ways. You may not
* work around any technical limitations in the software;
* reverse engineer, decompile or disassemble the software, except
and only to the extent that applicable law expressly permits,
despite this limitation;
* publish the software for others to copy;
* sell, rent, lease or lend the software;
* transfer the software or this agreement to any third party; or
* use the software for commercial software hosting services.
All exceptions require prior written consent from info@tunsafe.com.
You can recover from us and our suppliers only direct damages up to
U.S. $0.10. You cannot recover any other damages, including consequential,
lost profits, special, indirect or incidental damages.
This limitation applies to
* anything related to the software, services, content (including code)
on third party Internet sites, or third party programs; and
* claims for breach of contract, breach of warranty, guarantee or
condition, strict liability, negligence, or other tort to the extent
permitted by applicable law.
It also applies even if we knew or should have known about the possibility
of the damages.
This agreement describes certain legal rights. You may have other rights
under the laws of your country. You may also have rights with respect to the
party from whom you acquired the software. This agreement does not change
your rights under the laws of your country if the laws of your country do
not permit it to do so.
This agreement is the entire agreement and is governed by the laws of Sweden.
Several pieces of Open Source software were used in this product.
Here are their licenses.
BLAKE2 License
--------------
Copyright 2012, Samuel Neves <sneves@dei.uc.pt>. You may use this under the
terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
your option. The terms of these licenses can be found at:
- CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
- OpenSSL license : https://www.openssl.org/source/license.html
- Apache 2.0 : http://www.apache.org/licenses/LICENSE-2.0
More information about the BLAKE2 hash function can be found at
https://blake2.net.
Curve25519-Donna License
------------------------
Copyright 2008, Google Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
OpenSSL License
---------------
====================================================================
Copyright (c) 1998-2018 The OpenSSL Project. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
3. All advertising materials mentioning features or use of this
software must display the following acknowledgment:
"This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
endorse or promote products derived from this software without
prior written permission. For written permission, please contact
openssl-core@openssl.org.
5. Products derived from this software may not be called "OpenSSL"
nor may "OpenSSL" appear in their names without prior written
permission of the OpenSSL Project.
6. Redistributions of any form whatsoever must retain the following
acknowledgment:
"This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/)"
THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR
ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
====================================================================
This product includes cryptographic software written by Eric Young
(eay@cryptsoft.com). This product includes software written by Tim
Hudson (tjh@cryptsoft.com).
Original SSLeay License
-----------------------
Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
All rights reserved.
This package is an SSL implementation written
by Eric Young (eay@cryptsoft.com).
The implementation was written so as to conform with Netscapes SSL.
This library is free for commercial and non-commercial use as long as
the following conditions are aheared to. The following conditions
apply to all code found in this distribution, be it the RC4, RSA,
lhash, DES, etc., code; not just the SSL code. The SSL documentation
included with this distribution is covered by the same copyright terms
except that the holder is Tim Hudson (tjh@cryptsoft.com).
Copyright remains Eric Young's, and as such any Copyright notices in
the code are not to be removed.
If this package is used in a product, Eric Young should be given attribution
as the author of the parts of the library used.
This can be in the form of a textual message at program startup or
in documentation (online or textual) provided with the package.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. All advertising materials mentioning features or use of this software
must display the following acknowledgement:
"This product includes cryptographic software written by
Eric Young (eay@cryptsoft.com)"
The word 'cryptographic' can be left out if the rouines from the library
being used are not cryptographic related :-).
4. If you include any Windows specific code (or a derivative thereof) from
the apps directory (application code) you must include an acknowledgement:
"This product includes software written by Tim Hudson (tjh@cryptsoft.com)"
THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
The licence and distribution terms for any publically available version or
derivative of this code cannot be changed. i.e. this code cannot simply be
copied and put under another distribution licence
[including the GNU Public Licence.]

46
installer/TunSafe.conf Normal file
View file

@ -0,0 +1,46 @@
# This is a sample config file for TunSafe. It uses the same syntax as
# WireGuard's wg-quick tool
[Interface]
# The private key of this computer. This is a secret key, don't give it out.
# To convert it to a public key you can go to 'Generate Key Pair' in TunSafe.
PrivateKey = gIIBl0OHb3wZjYGqZtgzRml3wec0e5vqXtSvCTfa42w=
# Whether we want to bind a port to allow others to initiate connections to us.
# Please ensure this port is mapped in your router.
# ListenPort = 51820
# Switch DNS server while connected
# DNS = 8.8.8.8
# The addresses to bind to. Either IPv4 or IPv6. /31 and /32 are not supported.
Address = 192.168.2.2/24
# Whether to block all access to Internet that doesn't go through tunsafe.
# Note that Internet will keep being blocked even after TunSafe is restarted.
# Possible values (comma separated):
# route - Blocks all traffic using null route entries
# firewall - Blocks all traffic except TunSafe through the Windows firewall
# on - Uses the default block mechanism
# off - Turns off blocking
# BlockInternet = route, firewall
[Peer]
# The public key of the peer. Do not use the private key here. Use the 'Generate Key Pair'
# function in TunSafe to convert a private key to a public key.
PublicKey = hIA3ikjlSOAo0qqrI+rXaS3ZH04Yx7Q2YQ4m2Syz+XE=
# It's also possible to use a preshared key for extra security
# PresharedKey = SNz4BYc61amtDhzxNCxgYgdV9rPU+WiC8woX47Xf/2Y=
# The IP range that we may send packets to for this peer.
AllowedIPs = 192.168.2.0/24
# Address of the server
Endpoint = 192.168.1.4:8040
# Send periodic keepalives to ensure connection stays up behind NAT.
PersistentKeepalive = 25

BIN
installer/icon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

BIN
installer/signplugin.dll Normal file

Binary file not shown.

3
installer/signplugin/.gitignore vendored Normal file
View file

@ -0,0 +1,3 @@
/Debug/
/Release/
/.vs/

Binary file not shown.

View file

@ -0,0 +1,104 @@
import hashlib
b = 256
q = 2**255 - 19
l = 2**252 + 27742317777372353535851937790883648493
def H(m):
return hashlib.sha512(m).digest()
def expmod(b,e,m):
if e == 0: return 1
t = expmod(b,e/2,m)**2 % m
if e & 1: t = (t*b) % m
return t
def inv(x):
return expmod(x,q-2,q)
d = -121665 * inv(121666)
I = expmod(2,(q-1)/4,q)
def xrecover(y):
xx = (y*y-1) * inv(d*y*y+1)
x = expmod(xx,(q+3)/8,q)
if (x*x - xx) % q != 0: x = (x*I) % q
if x % 2 != 0: x = q-x
return x
By = 4 * inv(5)
Bx = xrecover(By)
B = [Bx % q,By % q]
def edwards(P,Q):
x1 = P[0]
y1 = P[1]
x2 = Q[0]
y2 = Q[1]
x3 = (x1*y2+x2*y1) * inv(1+d*x1*x2*y1*y2)
y3 = (y1*y2+x1*x2) * inv(1-d*x1*x2*y1*y2)
return [x3 % q,y3 % q]
def scalarmult(P,e):
if e == 0: return [0,1]
Q = scalarmult(P,e/2)
Q = edwards(Q,Q)
if e & 1: Q = edwards(Q,P)
return Q
def encodeint(y):
bits = [(y >> i) & 1 for i in range(b)]
return ''.join([chr(sum([bits[i * 8 + j] << j for j in range(8)])) for i in range(b/8)])
def encodepoint(P):
x = P[0]
y = P[1]
bits = [(y >> i) & 1 for i in range(b - 1)] + [x & 1]
return ''.join([chr(sum([bits[i * 8 + j] << j for j in range(8)])) for i in range(b/8)])
def bit(h,i):
return (ord(h[i/8]) >> (i%8)) & 1
def publickey(sk):
h = H(sk)
a = 2**(b-2) + sum(2**i * bit(h,i) for i in range(3,b-2))
A = scalarmult(B,a)
return encodepoint(A)
def Hint(m):
h = H(m)
return sum(2**i * bit(h,i) for i in range(2*b))
def signature(m,sk,pk):
h = H(sk)
a = 2**(b-2) + sum(2**i * bit(h,i) for i in range(3,b-2))
r = Hint(''.join([h[i] for i in range(b/8,b/4)]) + m)
R = scalarmult(B,r)
S = (r + Hint(encodepoint(R) + pk + m) * a) % l
return encodepoint(R) + encodeint(S)
def isoncurve(P):
x = P[0]
y = P[1]
return (-x*x + y*y - 1 - d*x*x*y*y) % q == 0
def decodeint(s):
return sum(2**i * bit(s,i) for i in range(0,b))
def decodepoint(s):
y = sum(2**i * bit(s,i) for i in range(0,b-1))
x = xrecover(y)
if x & 1 != bit(s,b-1): x = q-x
P = [x,y]
if not isoncurve(P): raise Exception("decoding point that is not on curve")
return P
def checkvalid(s,m,pk):
if len(s) != b/4: raise Exception("signature length is wrong")
if len(pk) != b/8: raise Exception("public-key length is wrong")
R = decodepoint(s[0:b/8])
A = decodepoint(pk)
S = decodeint(s[b/8:b/4])
h = Hint(encodepoint(R) + pk + m)
if scalarmult(B,S) != edwards(R,scalarmult(A,h)):
raise Exception("signature does not pass verification")

View file

@ -0,0 +1,22 @@
import hashlib
def H(m):
return hashlib.sha512(m).digest()
import ed25519
import os
sk = "".join(chr(c) for c in [4, 213, 116, 80, 117, 4, 70, 166, 244, 214, 234, 159, 197, 101, 182, 177, 106, 180, 68, 125, 51, 32, 159, 77, 27, 151, 233, 91, 109, 184, 147, 235])
pk = "".join(chr(c) for c in [79, 236, 107, 197, 85, 239, 235, 109, 123, 181, 230, 115, 206, 112, 218, 80, 174, 167, 119, 187, 113, 153, 17, 115, 77, 100, 154, 84, 181, 194, 254, 99])
hash = H(file('../tap/TunSafe-TAP-9.21.2.exe', 'rb').read())
print hash.encode('hex'), repr(hash)
#sk = os.urandom(32)
#pk = ed25519.publickey(sk)
#print 'sk', [ord(c) for c in sk]
#print 'pk', [ord(c) for c in pk]
#m = 'test'
s = ed25519.signature(hash,sk,pk)
file('../tap/TunSafe-TAP-9.21.2.exe.sig', 'wb').write(s.encode('hex'))

View file

@ -0,0 +1,121 @@
#include <Windows.h>
extern "C" {
#include "tiny/edsign.h"
#include "nsis/pluginapi.h"
#include "tiny/sha512.h"
}
// To work with Unicode version of NSIS, please use TCHAR-type
// functions for accessing the variables and the stack.
unsigned char buffer[4096];
// sk[4, 213, 116, 80, 117, 4, 70, 166, 244, 214, 234, 159, 197, 101, 182, 177, 106, 180, 68, 125, 51, 32, 159, 77, 27, 151, 233, 91, 109, 184, 147, 235]
// pk[79, 236, 107, 197, 85, 239, 235, 109, 123, 181, 230, 115, 206, 112, 218, 80, 174, 167, 119, 187, 113, 153, 17, 115, 77, 100, 154, 84, 181, 194, 254, 99]
static const unsigned char pk[32] = {79, 236, 107, 197, 85, 239, 235, 109, 123, 181, 230, 115, 206, 112, 218, 80, 174, 167, 119, 187, 113, 153, 17, 115, 77, 100, 154, 84, 181, 194, 254, 99};
int CheckFile(char *file) {
sha512_state ctx;
int ret;
HANDLE h;
unsigned char out[64];
unsigned char signature[64];
h = CreateFileA(file, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (h == INVALID_HANDLE_VALUE)
return 1;
DWORD n;
sha512_init(&ctx);
size_t total_size = 0;
size_t p = 0;
while (ReadFile(h, buffer, sizeof(buffer), &n, NULL) && n) {
total_size += n;
p = 0;
while (p + 128 <= n) {
sha512_block(&ctx, buffer + p);
p += 128;
}
if (p != n)
break;
}
sha512_final(&ctx, buffer + p, total_size);
sha512_get(&ctx, out, 0, 64);
CloseHandle(h);
/*
for (size_t i = 0; i < 64; i++) {
buffer[i * 2 + 0] = "0123456789abcdef"[out[i] >> 4];
buffer[i * 2 + 1] = "0123456789abcdef"[out[i] & 0xF];
}
buffer[128] = 0;
MessageBoxA(0, (char*)buffer, "sha", 0);
*/
char *x = file;
while (*x)x++;
memcpy(x, ".sig", 5);
h = CreateFileA(file, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (h == INVALID_HANDLE_VALUE)
return 2;
n = 0;
ReadFile(h, buffer, sizeof(buffer), &n, NULL);
CloseHandle(h);
if (n < 128)
return 3;
memset(signature, 0, sizeof(signature));
for (int i = 0; i < 128; i++) {
unsigned char c = buffer[i];
if (c >= '0' && c <= '9')
c -= '0';
else if ((c |= 32), c >= 'a' && c <= 'f')
c -= 'a' - 10;
else
return 4;
signature[i >> 1] = (signature[i >> 1] << 4) + c;
}
/* create a random seed, and a keypair out of that seed */
//ed25519_create_seed(seed);
//ed25519_create_keypair(public_key, private_key, seed);
/* create signature on the message with the keypair */
//ed25519_sign(signature, message, message_len, public_key, private_key);
/* verify the signature */
return edsign_verify(signature, pk, out, sizeof(out)) ? 0 : 5;
}
extern "C" void __declspec(dllexport) myFunction(HWND hwndParent, int string_size,
LPTSTR variables, stack_t **stacktop,
extra_parameters *extra, ...) {
EXDLL_INIT();
int rv = 10;
// note if you want parameters from the stack, pop them off in order.
// i.e. if you are called via exdll::myFunction file.dat read.txt
// calling popstring() the first time would give you file.dat,
// and the second time would give you read.txt.
// you should empty the stack of your parameters, and ONLY your
// parameters.
// do your stuff here
{
LPTSTR msgbuf = (LPTSTR)GlobalAlloc(GPTR, (string_size + 1 + 10) * sizeof(*msgbuf));
if (msgbuf) {
if (!popstring(msgbuf)) {
rv = CheckFile(msgbuf);
}
GlobalFree(msgbuf);
}
}
pushint(rv);
}
BOOL WINAPI DllMain(HINSTANCE hInst, ULONG ul_reason_for_call, LPVOID lpReserved) {
return TRUE;
}

View file

@ -0,0 +1,85 @@
/*
* apih
*
* This file is a part of NSIS.
*
* Copyright (C) 1999-2018 Nullsoft and Contributors
*
* Licensed under the zlib/libpng license (the "License");
* you may not use this file except in compliance with the License.
*
* Licence details can be found in the file COPYING.
*
* This software is provided 'as-is', without any express or implied
* warranty.
*/
#ifndef _NSIS_EXEHEAD_API_H_
#define _NSIS_EXEHEAD_API_H_
// Starting with NSIS 2.42, you can check the version of the plugin API in exec_flags->plugin_api_version
// The format is 0xXXXXYYYY where X is the major version and Y is the minor version (MAKELONG(y,x))
// When doing version checks, always remember to use >=, ex: if (pX->exec_flags->plugin_api_version >= NSISPIAPIVER_1_0) {}
#define NSISPIAPIVER_1_0 0x00010000
#define NSISPIAPIVER_CURR NSISPIAPIVER_1_0
// NSIS Plug-In Callback Messages
enum NSPIM
{
NSPIM_UNLOAD, // This is the last message a plugin gets, do final cleanup
NSPIM_GUIUNLOAD, // Called after .onGUIEnd
};
// Prototype for callbacks registered with extra_parameters->RegisterPluginCallback()
// Return NULL for unknown messages
// Should always be __cdecl for future expansion possibilities
typedef UINT_PTR (*NSISPLUGINCALLBACK)(enum NSPIM);
// extra_parameters data structure containing other interesting stuff
// besides the stack, variables and HWND passed on to plug-ins.
typedef struct
{
int autoclose; // SetAutoClose
int all_user_var; // SetShellVarContext: User context = 0, Machine context = 1
int exec_error; // IfErrors
int abort; // IfAbort
int exec_reboot; // IfRebootFlag (NSIS_SUPPORT_REBOOT)
int reboot_called; // NSIS_SUPPORT_REBOOT
int XXX_cur_insttype; // Deprecated
int plugin_api_version; // Plug-in ABI. See NSISPIAPIVER_CURR (Note: used to be XXX_insttype_changed)
int silent; // IfSilent (NSIS_CONFIG_SILENT_SUPPORT)
int instdir_error; // GetInstDirError
int rtl; // 1 if $LANGUAGE is a RTL language
int errlvl; // SetErrorLevel
int alter_reg_view; // SetRegView: Default View = 0, Alternative View = (sizeof(void*) > 4 ? KEY_WOW64_32KEY : KEY_WOW64_64KEY)
int status_update; // SetDetailsPrint
} exec_flags_t;
#ifndef NSISCALL
# define NSISCALL __stdcall
#endif
#if !defined(_WIN32) && !defined(LPTSTR)
# define LPTSTR TCHAR*
#endif
typedef struct {
exec_flags_t *exec_flags;
int (NSISCALL *ExecuteCodeSegment)(int, HWND);
void (NSISCALL *validate_filename)(LPTSTR);
int (NSISCALL *RegisterPluginCallback)(HMODULE, NSISPLUGINCALLBACK); // returns 0 on success, 1 if already registered and < 0 on errors
} extra_parameters;
// Definitions for page showing plug-ins
// See Ui.c to understand better how they're used
// sent to the outer window to tell it to go to the next inner window
#define WM_NOTIFY_OUTER_NEXT (WM_USER+0x8)
// custom pages should send this message to let NSIS know they're ready
#define WM_NOTIFY_CUSTOM_READY (WM_USER+0xd)
// sent as wParam with WM_NOTIFY_OUTER_NEXT when user cancels - heed its warning
#define NOTIFY_BYE_BYE 'x'
#endif /* _NSIS_EXEHEAD_API_H_ */

View file

@ -0,0 +1,229 @@
/*
* nsis_tchar.h
*
* This file is a part of NSIS.
*
* Copyright (C) 1999-2018 Nullsoft and Contributors
*
* This software is provided 'as-is', without any express or implied
* warranty.
*
* For Unicode support by Jim Park -- 08/30/2007
*/
// Jim Park: Only those we use are listed here.
#pragma once
#ifdef _UNICODE
#ifndef _T
#define __T(x) L ## x
#define _T(x) __T(x)
#define _TEXT(x) __T(x)
#endif
#ifndef _TCHAR_DEFINED
#define _TCHAR_DEFINED
#if !defined(_NATIVE_WCHAR_T_DEFINED) && !defined(_WCHAR_T_DEFINED)
typedef unsigned short TCHAR;
#else
typedef wchar_t TCHAR;
#endif
#endif
// program
#define _tenviron _wenviron
#define __targv __wargv
// printfs
#define _ftprintf fwprintf
#define _sntprintf _snwprintf
#if (defined(_MSC_VER) && (_MSC_VER<=1310||_MSC_FULL_VER<=140040310)) || defined(__MINGW32__)
# define _stprintf swprintf
#else
# define _stprintf _swprintf
#endif
#define _tprintf wprintf
#define _vftprintf vfwprintf
#define _vsntprintf _vsnwprintf
#if defined(_MSC_VER) && (_MSC_VER<=1310)
# define _vstprintf vswprintf
#else
# define _vstprintf _vswprintf
#endif
// scanfs
#define _tscanf wscanf
#define _stscanf swscanf
// string manipulations
#define _tcscat wcscat
#define _tcschr wcschr
#define _tcsclen wcslen
#define _tcscpy wcscpy
#define _tcsdup _wcsdup
#define _tcslen wcslen
#define _tcsnccpy wcsncpy
#define _tcsncpy wcsncpy
#define _tcsrchr wcsrchr
#define _tcsstr wcsstr
#define _tcstok wcstok
// string comparisons
#define _tcscmp wcscmp
#define _tcsicmp _wcsicmp
#define _tcsncicmp _wcsnicmp
#define _tcsncmp wcsncmp
#define _tcsnicmp _wcsnicmp
// upper / lower
#define _tcslwr _wcslwr
#define _tcsupr _wcsupr
#define _totlower towlower
#define _totupper towupper
// conversions to numbers
#define _tcstoi64 _wcstoi64
#define _tcstol wcstol
#define _tcstoul wcstoul
#define _tstof _wtof
#define _tstoi _wtoi
#define _tstoi64 _wtoi64
#define _ttoi _wtoi
#define _ttoi64 _wtoi64
#define _ttol _wtol
// conversion from numbers to strings
#define _itot _itow
#define _ltot _ltow
#define _i64tot _i64tow
#define _ui64tot _ui64tow
// file manipulations
#define _tfopen _wfopen
#define _topen _wopen
#define _tremove _wremove
#define _tunlink _wunlink
// reading and writing to i/o
#define _fgettc fgetwc
#define _fgetts fgetws
#define _fputts fputws
#define _gettchar getwchar
// directory
#define _tchdir _wchdir
// environment
#define _tgetenv _wgetenv
#define _tsystem _wsystem
// time
#define _tcsftime wcsftime
#else // ANSI
#ifndef _T
#define _T(x) x
#define _TEXT(x) x
#endif
#ifndef _TCHAR_DEFINED
#define _TCHAR_DEFINED
typedef char TCHAR;
#endif
// program
#define _tenviron environ
#define __targv __argv
// printfs
#define _ftprintf fprintf
#define _sntprintf _snprintf
#define _stprintf sprintf
#define _tprintf printf
#define _vftprintf vfprintf
#define _vsntprintf _vsnprintf
#define _vstprintf vsprintf
// scanfs
#define _tscanf scanf
#define _stscanf sscanf
// string manipulations
#define _tcscat strcat
#define _tcschr strchr
#define _tcsclen strlen
#define _tcscnlen strnlen
#define _tcscpy strcpy
#define _tcsdup _strdup
#define _tcslen strlen
#define _tcsnccpy strncpy
#define _tcsrchr strrchr
#define _tcsstr strstr
#define _tcstok strtok
// string comparisons
#define _tcscmp strcmp
#define _tcsicmp _stricmp
#define _tcsncmp strncmp
#define _tcsncicmp _strnicmp
#define _tcsnicmp _strnicmp
// upper / lower
#define _tcslwr _strlwr
#define _tcsupr _strupr
#define _totupper toupper
#define _totlower tolower
// conversions to numbers
#define _tcstol strtol
#define _tcstoul strtoul
#define _tstof atof
#define _tstoi atoi
#define _tstoi64 _atoi64
#define _tstoi64 _atoi64
#define _ttoi atoi
#define _ttoi64 _atoi64
#define _ttol atol
// conversion from numbers to strings
#define _i64tot _i64toa
#define _itot _itoa
#define _ltot _ltoa
#define _ui64tot _ui64toa
// file manipulations
#define _tfopen fopen
#define _topen _open
#define _tremove remove
#define _tunlink _unlink
// reading and writing to i/o
#define _fgettc fgetc
#define _fgetts fgets
#define _fputts fputs
#define _gettchar getchar
// directory
#define _tchdir _chdir
// environment
#define _tgetenv getenv
#define _tsystem system
// time
#define _tcsftime strftime
#endif
// is functions (the same in Unicode / ANSI)
#define _istgraph isgraph
#define _istascii __isascii
#define __TFILE__ _T(__FILE__)
#define __TDATE__ _T(__DATE__)
#define __TTIME__ _T(__TIME__)

Binary file not shown.

Binary file not shown.

View file

@ -0,0 +1,108 @@
#ifndef ___NSIS_PLUGIN__H___
#define ___NSIS_PLUGIN__H___
#ifdef __cplusplus
extern "C" {
#endif
#include "api.h"
#include "nsis_tchar.h" // BUGBUG: Why cannot our plugins use the compilers tchar.h?
#ifndef NSISCALL
# define NSISCALL WINAPI
#endif
#define EXDLL_INIT() { \
g_stringsize=string_size; \
g_stacktop=stacktop; \
g_variables=variables; }
typedef struct _stack_t {
struct _stack_t *next;
#ifdef UNICODE
WCHAR text[1]; // this should be the length of g_stringsize when allocating
#else
char text[1];
#endif
} stack_t;
enum
{
INST_0, // $0
INST_1, // $1
INST_2, // $2
INST_3, // $3
INST_4, // $4
INST_5, // $5
INST_6, // $6
INST_7, // $7
INST_8, // $8
INST_9, // $9
INST_R0, // $R0
INST_R1, // $R1
INST_R2, // $R2
INST_R3, // $R3
INST_R4, // $R4
INST_R5, // $R5
INST_R6, // $R6
INST_R7, // $R7
INST_R8, // $R8
INST_R9, // $R9
INST_CMDLINE, // $CMDLINE
INST_INSTDIR, // $INSTDIR
INST_OUTDIR, // $OUTDIR
INST_EXEDIR, // $EXEDIR
INST_LANG, // $LANGUAGE
__INST_LAST
};
extern unsigned int g_stringsize;
extern stack_t **g_stacktop;
extern LPTSTR g_variables;
void NSISCALL pushstring(LPCTSTR str);
void NSISCALL pushintptr(INT_PTR value);
#define pushint(v) pushintptr((INT_PTR)(v))
int NSISCALL popstring(LPTSTR str); // 0 on success, 1 on empty stack
int NSISCALL popstringn(LPTSTR str, int maxlen); // with length limit, pass 0 for g_stringsize
INT_PTR NSISCALL popintptr();
#define popint() ( (int) popintptr() )
int NSISCALL popint_or(); // with support for or'ing (2|4|8)
INT_PTR NSISCALL nsishelper_str_to_ptr(LPCTSTR s);
#define myatoi(s) ( (int) nsishelper_str_to_ptr(s) ) // converts a string to an integer
unsigned int NSISCALL myatou(LPCTSTR s); // converts a string to an unsigned integer, decimal only
int NSISCALL myatoi_or(LPCTSTR s); // with support for or'ing (2|4|8)
LPTSTR NSISCALL getuservariable(const int varnum);
void NSISCALL setuservariable(const int varnum, LPCTSTR var);
#ifdef UNICODE
#define PopStringW(x) popstring(x)
#define PushStringW(x) pushstring(x)
#define SetUserVariableW(x,y) setuservariable(x,y)
int NSISCALL PopStringA(LPSTR ansiStr);
void NSISCALL PushStringA(LPCSTR ansiStr);
void NSISCALL GetUserVariableW(const int varnum, LPWSTR wideStr);
void NSISCALL GetUserVariableA(const int varnum, LPSTR ansiStr);
void NSISCALL SetUserVariableA(const int varnum, LPCSTR ansiStr);
#else
// ANSI defs
#define PopStringA(x) popstring(x)
#define PushStringA(x) pushstring(x)
#define SetUserVariableA(x,y) setuservariable(x,y)
int NSISCALL PopStringW(LPWSTR wideStr);
void NSISCALL PushStringW(LPWSTR wideStr);
void NSISCALL GetUserVariableW(const int varnum, LPWSTR wideStr);
void NSISCALL GetUserVariableA(const int varnum, LPSTR ansiStr);
void NSISCALL SetUserVariableW(const int varnum, LPCWSTR wideStr);
#endif
#ifdef __cplusplus
}
#endif
#endif//!___NSIS_PLUGIN__H___

View file

@ -0,0 +1,28 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 15
VisualStudioVersion = 15.0.26403.7
MinimumVisualStudioVersion = 10.0.40219.1
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "signplugin", "signplugin.vcxproj", "{C6E4A1D7-ECBC-466E-9183-30727EF81533}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|x64 = Debug|x64
Debug|x86 = Debug|x86
Release|x64 = Release|x64
Release|x86 = Release|x86
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Debug|x64.ActiveCfg = Debug|x64
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Debug|x64.Build.0 = Debug|x64
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Debug|x86.ActiveCfg = Debug|Win32
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Debug|x86.Build.0 = Debug|Win32
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Release|x64.ActiveCfg = Release|x64
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Release|x64.Build.0 = Release|x64
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Release|x86.ActiveCfg = Release|Win32
{C6E4A1D7-ECBC-466E-9183-30727EF81533}.Release|x86.Build.0 = Release|Win32
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal

View file

@ -0,0 +1,166 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{C6E4A1D7-ECBC-466E-9183-30727EF81533}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<WindowsTargetPlatformVersion>10.0.15063.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>DynamicLibrary</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>DynamicLibrary</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<WholeProgramOptimization>false</WholeProgramOptimization>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="Shared">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>true</LinkIncremental>
<GenerateManifest>false</GenerateManifest>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PreprocessorDefinitions>WIN32;_DEBUG;_WINDOWS;_USRDLL;SIGNPLUGIN_EXPORTS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<RuntimeLibrary>MultiThreadedDebugDLL</RuntimeLibrary>
<WarningLevel>Level3</WarningLevel>
<DebugInformationFormat>ProgramDatabase</DebugInformationFormat>
<Optimization>Disabled</Optimization>
</ClCompile>
<Link>
<TargetMachine>MachineX86</TargetMachine>
<GenerateDebugInformation>true</GenerateDebugInformation>
<SubSystem>Windows</SubSystem>
<EntryPointSymbol>
</EntryPointSymbol>
<IgnoreAllDefaultLibraries>false</IgnoreAllDefaultLibraries>
<ImageHasSafeExceptionHandlers>false</ImageHasSafeExceptionHandlers>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<PreprocessorDefinitions>WIN32;NDEBUG;_WINDOWS;_USRDLL;SIGNPLUGIN_EXPORTS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
<WarningLevel>Level3</WarningLevel>
<DebugInformationFormat>ProgramDatabase</DebugInformationFormat>
<ExceptionHandling>false</ExceptionHandling>
<BufferSecurityCheck>false</BufferSecurityCheck>
<Optimization>MinSpace</Optimization>
<OmitFramePointers>true</OmitFramePointers>
<FunctionLevelLinking>true</FunctionLevelLinking>
</ClCompile>
<Link>
<TargetMachine>MachineX86</TargetMachine>
<GenerateDebugInformation>false</GenerateDebugInformation>
<SubSystem>Windows</SubSystem>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
<IgnoreAllDefaultLibraries>true</IgnoreAllDefaultLibraries>
<EntryPointSymbol>DllMain</EntryPointSymbol>
<ImageHasSafeExceptionHandlers>false</ImageHasSafeExceptionHandlers>
<LinkTimeCodeGeneration>UseLinkTimeCodeGeneration</LinkTimeCodeGeneration>
</Link>
</ItemDefinitionGroup>
<ItemGroup>
<ClCompile Include="main.cpp" />
<ClCompile Include="tiny\c25519.c" />
<ClCompile Include="tiny\ed25519.c" />
<ClCompile Include="tiny\edsign.c" />
<ClCompile Include="tiny\f25519.c" />
<ClCompile Include="tiny\fprime.c" />
<ClCompile Include="tiny\morph25519.c" />
<ClCompile Include="tiny\sha512.c" />
<ClCompile Include="win32_crt_math.cpp" />
<ClCompile Include="win32_crt_memory.cpp" />
</ItemGroup>
<ItemGroup>
<ClInclude Include="curve25519-donna-32bit.h" />
<ClInclude Include="curve25519-donna-64bit.h" />
<ClInclude Include="curve25519-donna-helpers.h" />
<ClInclude Include="curve25519-donna-sse2.h" />
<ClInclude Include="ed25519-donna-32bit-tables.h" />
<ClInclude Include="ed25519-donna-64bit-tables.h" />
<ClInclude Include="ed25519-donna-batchverify.h" />
<ClInclude Include="ed25519-donna-impl-base.h" />
<ClInclude Include="ed25519-donna-impl-sse2.h" />
<ClInclude Include="ed25519-donna-portable-identify.h" />
<ClInclude Include="ed25519-donna-portable.h" />
<ClInclude Include="ed25519-donna.h" />
<ClInclude Include="ed25519-hash-custom.h" />
<ClInclude Include="ed25519-hash.h" />
<ClInclude Include="ed25519-randombytes.h" />
<ClInclude Include="ed25519.h" />
<ClInclude Include="modm-donna-32bit.h" />
<ClInclude Include="modm-donna-64bit.h" />
<ClInclude Include="tiny\c25519.h" />
<ClInclude Include="tiny\ed25519.h" />
<ClInclude Include="tiny\edsign.h" />
<ClInclude Include="tiny\f25519.h" />
<ClInclude Include="tiny\fprime.h" />
<ClInclude Include="tiny\morph25519.h" />
<ClInclude Include="tiny\sha512.h" />
</ItemGroup>
<ItemGroup>
<Object Include="chkstk.obj">
<ExcludedFromBuild Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">true</ExcludedFromBuild>
</Object>
</ItemGroup>
<ItemGroup>
<Library Include="nsis\pluginapi-x86-unicode.lib" />
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>

View file

@ -0,0 +1,132 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup>
<Filter Include="Source Files">
<UniqueIdentifier>{4FC737F1-C7A5-4376-A066-2A32D752A2FF}</UniqueIdentifier>
<Extensions>cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx</Extensions>
</Filter>
<Filter Include="Header Files">
<UniqueIdentifier>{93995380-89BD-4b04-88EB-625FBE52EBFB}</UniqueIdentifier>
<Extensions>h;hh;hpp;hxx;hm;inl;inc;xsd</Extensions>
</Filter>
<Filter Include="Resource Files">
<UniqueIdentifier>{67DA6AB6-F800-4c08-8B7A-83BB121AAD01}</UniqueIdentifier>
<Extensions>rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav</Extensions>
</Filter>
</ItemGroup>
<ItemGroup>
<ClCompile Include="main.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="win32_crt_math.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="win32_crt_memory.cpp">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\c25519.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\ed25519.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\edsign.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\f25519.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\fprime.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\morph25519.c">
<Filter>Source Files</Filter>
</ClCompile>
<ClCompile Include="tiny\sha512.c">
<Filter>Source Files</Filter>
</ClCompile>
</ItemGroup>
<ItemGroup>
<ClInclude Include="curve25519-donna-32bit.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="curve25519-donna-64bit.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="curve25519-donna-helpers.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="curve25519-donna-sse2.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-32bit-tables.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-64bit-tables.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-batchverify.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-impl-base.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-impl-sse2.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-portable-identify.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna-portable.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-donna.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-hash-custom.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-hash.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519-randombytes.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="ed25519.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="modm-donna-32bit.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="modm-donna-64bit.h">
<Filter>Header Files</Filter>
</ClInclude>
<ClInclude Include="tiny\c25519.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\ed25519.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\edsign.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\f25519.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\fprime.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\morph25519.h">
<Filter>Source Files</Filter>
</ClInclude>
<ClInclude Include="tiny\sha512.h">
<Filter>Source Files</Filter>
</ClInclude>
</ItemGroup>
<ItemGroup>
<Object Include="chkstk.obj" />
</ItemGroup>
<ItemGroup>
<Library Include="nsis\pluginapi-x86-unicode.lib" />
</ItemGroup>
</Project>

View file

@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="15.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup />
</Project>

View file

@ -0,0 +1,124 @@
/* Curve25519 (Montgomery form)
* Daniel Beer <dlbeer@gmail.com>, 18 Apr 2014
*
* This file is in the public domain.
*/
#include "c25519.h"
const uint8_t c25519_base_x[F25519_SIZE] = {9};
/* Double an X-coordinate */
static void xc_double(uint8_t *x3, uint8_t *z3,
const uint8_t *x1, const uint8_t *z1)
{
/* Explicit formulas database: dbl-1987-m
*
* source 1987 Montgomery "Speeding the Pollard and elliptic
* curve methods of factorization", page 261, fourth display
* compute X3 = (X1^2-Z1^2)^2
* compute Z3 = 4 X1 Z1 (X1^2 + a X1 Z1 + Z1^2)
*/
uint8_t x1sq[F25519_SIZE];
uint8_t z1sq[F25519_SIZE];
uint8_t x1z1[F25519_SIZE];
uint8_t a[F25519_SIZE];
f25519_mul__distinct(x1sq, x1, x1);
f25519_mul__distinct(z1sq, z1, z1);
f25519_mul__distinct(x1z1, x1, z1);
f25519_sub(a, x1sq, z1sq);
f25519_mul__distinct(x3, a, a);
f25519_mul_c(a, x1z1, 486662);
f25519_add(a, x1sq, a);
f25519_add(a, z1sq, a);
f25519_mul__distinct(x1sq, x1z1, a);
f25519_mul_c(z3, x1sq, 4);
}
/* Differential addition */
static void xc_diffadd(uint8_t *x5, uint8_t *z5,
const uint8_t *x1, const uint8_t *z1,
const uint8_t *x2, const uint8_t *z2,
const uint8_t *x3, const uint8_t *z3)
{
/* Explicit formulas database: dbl-1987-m3
*
* source 1987 Montgomery "Speeding the Pollard and elliptic curve
* methods of factorization", page 261, fifth display, plus
* common-subexpression elimination
* compute A = X2+Z2
* compute B = X2-Z2
* compute C = X3+Z3
* compute D = X3-Z3
* compute DA = D A
* compute CB = C B
* compute X5 = Z1(DA+CB)^2
* compute Z5 = X1(DA-CB)^2
*/
uint8_t da[F25519_SIZE];
uint8_t cb[F25519_SIZE];
uint8_t a[F25519_SIZE];
uint8_t b[F25519_SIZE];
f25519_add(a, x2, z2);
f25519_sub(b, x3, z3); /* D */
f25519_mul__distinct(da, a, b);
f25519_sub(b, x2, z2);
f25519_add(a, x3, z3); /* C */
f25519_mul__distinct(cb, a, b);
f25519_add(a, da, cb);
f25519_mul__distinct(b, a, a);
f25519_mul__distinct(x5, z1, b);
f25519_sub(a, da, cb);
f25519_mul__distinct(b, a, a);
f25519_mul__distinct(z5, x1, b);
}
void c25519_smult(uint8_t *result, const uint8_t *q, const uint8_t *e)
{
/* Current point: P_m */
uint8_t xm[F25519_SIZE];
uint8_t zm[F25519_SIZE] = {1};
/* Predecessor: P_(m-1) */
uint8_t xm1[F25519_SIZE] = {1};
uint8_t zm1[F25519_SIZE] = {0};
int i;
/* Note: bit 254 is assumed to be 1 */
f25519_copy(xm, q);
for (i = 253; i >= 0; i--) {
const int bit = (e[i >> 3] >> (i & 7)) & 1;
uint8_t xms[F25519_SIZE];
uint8_t zms[F25519_SIZE];
/* From P_m and P_(m-1), compute P_(2m) and P_(2m-1) */
xc_diffadd(xm1, zm1, q, f25519_one, xm, zm, xm1, zm1);
xc_double(xm, zm, xm, zm);
/* Compute P_(2m+1) */
xc_diffadd(xms, zms, xm1, zm1, xm, zm, q, f25519_one);
/* Select:
* bit = 1 --> (P_(2m+1), P_(2m))
* bit = 0 --> (P_(2m), P_(2m-1))
*/
f25519_select(xm1, xm1, xm, bit);
f25519_select(zm1, zm1, zm, bit);
f25519_select(xm, xm, xms, bit);
f25519_select(zm, zm, zms, bit);
}
/* Freeze out of projective coordinates */
f25519_inv__distinct(zm1, zm);
f25519_mul__distinct(result, zm1, xm);
f25519_normalize(result);
}

View file

@ -0,0 +1,48 @@
/* Curve25519 (Montgomery form)
* Daniel Beer <dlbeer@gmail.com>, 18 Apr 2014
*
* This file is in the public domain.
*/
#ifndef C25519_H_
#define C25519_H_
#include <stdint.h>
#include "f25519.h"
/* Curve25519 has the equation over F(p = 2^255-19):
*
* y^2 = x^3 + 486662x^2 + x
*
* 486662 = 4A+2, where A = 121665. This is a Montgomery curve.
*
* For more information, see:
*
* Bernstein, D.J. (2006) "Curve25519: New Diffie-Hellman speed
* records". Document ID: 4230efdfa673480fc079449d90f322c0.
*/
/* This is the site of a Curve25519 exponent (private key) */
#define C25519_EXPONENT_SIZE 32
/* Having generated 32 random bytes, you should call this function to
* finalize the generated key.
*/
static inline void c25519_prepare(uint8_t *key)
{
key[0] &= 0xf8;
key[31] &= 0x7f;
key[31] |= 0x40;
}
/* X-coordinate of the base point */
extern const uint8_t c25519_base_x[F25519_SIZE];
/* X-coordinate scalar multiply: given the X-coordinate of q, return the
* X-coordinate of e*q.
*
* result and q are field elements. e is an exponent.
*/
void c25519_smult(uint8_t *result, const uint8_t *q, const uint8_t *e);
#endif

View file

@ -0,0 +1,320 @@
/* Edwards curve operations
* Daniel Beer <dlbeer@gmail.com>, 9 Jan 2014
*
* This file is in the public domain.
*/
#include "ed25519.h"
/* Base point is (numbers wrapped):
*
* x = 151122213495354007725011514095885315114
* 54012693041857206046113283949847762202
* y = 463168356949264781694283940034751631413
* 07993866256225615783033603165251855960
*
* y is derived by transforming the original Montgomery base (u=9). x
* is the corresponding positive coordinate for the new curve equation.
* t is x*y.
*/
const struct ed25519_pt ed25519_base = {
.x = {
0x1a, 0xd5, 0x25, 0x8f, 0x60, 0x2d, 0x56, 0xc9,
0xb2, 0xa7, 0x25, 0x95, 0x60, 0xc7, 0x2c, 0x69,
0x5c, 0xdc, 0xd6, 0xfd, 0x31, 0xe2, 0xa4, 0xc0,
0xfe, 0x53, 0x6e, 0xcd, 0xd3, 0x36, 0x69, 0x21
},
.y = {
0x58, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66,
0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66,
0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66,
0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66
},
.t = {
0xa3, 0xdd, 0xb7, 0xa5, 0xb3, 0x8a, 0xde, 0x6d,
0xf5, 0x52, 0x51, 0x77, 0x80, 0x9f, 0xf0, 0x20,
0x7d, 0xe3, 0xab, 0x64, 0x8e, 0x4e, 0xea, 0x66,
0x65, 0x76, 0x8b, 0xd7, 0x0f, 0x5f, 0x87, 0x67
},
.z = {1, 0}
};
const struct ed25519_pt ed25519_neutral = {
.x = {0},
.y = {1, 0},
.t = {0},
.z = {1, 0}
};
/* Conversion to and from projective coordinates */
void ed25519_project(struct ed25519_pt *p,
const uint8_t *x, const uint8_t *y)
{
f25519_copy(p->x, x);
f25519_copy(p->y, y);
f25519_load(p->z, 1);
f25519_mul__distinct(p->t, x, y);
}
void ed25519_unproject(uint8_t *x, uint8_t *y,
const struct ed25519_pt *p)
{
uint8_t z1[F25519_SIZE];
f25519_inv__distinct(z1, p->z);
f25519_mul__distinct(x, p->x, z1);
f25519_mul__distinct(y, p->y, z1);
f25519_normalize(x);
f25519_normalize(y);
}
/* Compress/uncompress points. We compress points by storing the x
* coordinate and the parity of the y coordinate.
*
* Rearranging the curve equation, we obtain explicit formulae for the
* coordinates:
*
* x = sqrt((y^2-1) / (1+dy^2))
* y = sqrt((x^2+1) / (1-dx^2))
*
* Where d = (-121665/121666), or:
*
* d = 370957059346694393431380835087545651895
* 42113879843219016388785533085940283555
*/
static const uint8_t ed25519_d[F25519_SIZE] = {
0xa3, 0x78, 0x59, 0x13, 0xca, 0x4d, 0xeb, 0x75,
0xab, 0xd8, 0x41, 0x41, 0x4d, 0x0a, 0x70, 0x00,
0x98, 0xe8, 0x79, 0x77, 0x79, 0x40, 0xc7, 0x8c,
0x73, 0xfe, 0x6f, 0x2b, 0xee, 0x6c, 0x03, 0x52
};
void ed25519_pack(uint8_t *c, const uint8_t *x, const uint8_t *y)
{
uint8_t tmp[F25519_SIZE];
uint8_t parity;
f25519_copy(tmp, x);
f25519_normalize(tmp);
parity = (tmp[0] & 1) << 7;
f25519_copy(c, y);
f25519_normalize(c);
c[31] |= parity;
}
uint8_t ed25519_try_unpack(uint8_t *x, uint8_t *y, const uint8_t *comp)
{
const int parity = comp[31] >> 7;
uint8_t a[F25519_SIZE];
uint8_t b[F25519_SIZE];
uint8_t c[F25519_SIZE];
/* Unpack y */
f25519_copy(y, comp);
y[31] &= 127;
/* Compute c = y^2 */
f25519_mul__distinct(c, y, y);
/* Compute b = (1+dy^2)^-1 */
f25519_mul__distinct(b, c, ed25519_d);
f25519_add(a, b, f25519_one);
f25519_inv__distinct(b, a);
/* Compute a = y^2-1 */
f25519_sub(a, c, f25519_one);
/* Compute c = a*b = (y^2-1)/(1-dy^2) */
f25519_mul__distinct(c, a, b);
/* Compute a, b = +/-sqrt(c), if c is square */
f25519_sqrt(a, c);
f25519_neg(b, a);
/* Select one of them, based on the compressed parity bit */
f25519_select(x, a, b, (a[0] ^ parity) & 1);
/* Verify that x^2 = c */
f25519_mul__distinct(a, x, x);
f25519_normalize(a);
f25519_normalize(c);
return f25519_eq(a, c);
}
/* k = 2d */
static const uint8_t ed25519_k[F25519_SIZE] = {
0x59, 0xf1, 0xb2, 0x26, 0x94, 0x9b, 0xd6, 0xeb,
0x56, 0xb1, 0x83, 0x82, 0x9a, 0x14, 0xe0, 0x00,
0x30, 0xd1, 0xf3, 0xee, 0xf2, 0x80, 0x8e, 0x19,
0xe7, 0xfc, 0xdf, 0x56, 0xdc, 0xd9, 0x06, 0x24
};
void ed25519_add(struct ed25519_pt *r,
const struct ed25519_pt *p1, const struct ed25519_pt *p2)
{
/* Explicit formulas database: add-2008-hwcd-3
*
* source 2008 Hisil--Wong--Carter--Dawson,
* http://eprint.iacr.org/2008/522, Section 3.1
* appliesto extended-1
* parameter k
* assume k = 2 d
* compute A = (Y1-X1)(Y2-X2)
* compute B = (Y1+X1)(Y2+X2)
* compute C = T1 k T2
* compute D = Z1 2 Z2
* compute E = B - A
* compute F = D - C
* compute G = D + C
* compute H = B + A
* compute X3 = E F
* compute Y3 = G H
* compute T3 = E H
* compute Z3 = F G
*/
uint8_t a[F25519_SIZE];
uint8_t b[F25519_SIZE];
uint8_t c[F25519_SIZE];
uint8_t d[F25519_SIZE];
uint8_t e[F25519_SIZE];
uint8_t f[F25519_SIZE];
uint8_t g[F25519_SIZE];
uint8_t h[F25519_SIZE];
/* A = (Y1-X1)(Y2-X2) */
f25519_sub(c, p1->y, p1->x);
f25519_sub(d, p2->y, p2->x);
f25519_mul__distinct(a, c, d);
/* B = (Y1+X1)(Y2+X2) */
f25519_add(c, p1->y, p1->x);
f25519_add(d, p2->y, p2->x);
f25519_mul__distinct(b, c, d);
/* C = T1 k T2 */
f25519_mul__distinct(d, p1->t, p2->t);
f25519_mul__distinct(c, d, ed25519_k);
/* D = Z1 2 Z2 */
f25519_mul__distinct(d, p1->z, p2->z);
f25519_add(d, d, d);
/* E = B - A */
f25519_sub(e, b, a);
/* F = D - C */
f25519_sub(f, d, c);
/* G = D + C */
f25519_add(g, d, c);
/* H = B + A */
f25519_add(h, b, a);
/* X3 = E F */
f25519_mul__distinct(r->x, e, f);
/* Y3 = G H */
f25519_mul__distinct(r->y, g, h);
/* T3 = E H */
f25519_mul__distinct(r->t, e, h);
/* Z3 = F G */
f25519_mul__distinct(r->z, f, g);
}
void ed25519_double(struct ed25519_pt *r, const struct ed25519_pt *p)
{
/* Explicit formulas database: dbl-2008-hwcd
*
* source 2008 Hisil--Wong--Carter--Dawson,
* http://eprint.iacr.org/2008/522, Section 3.3
* compute A = X1^2
* compute B = Y1^2
* compute C = 2 Z1^2
* compute D = a A
* compute E = (X1+Y1)^2-A-B
* compute G = D + B
* compute F = G - C
* compute H = D - B
* compute X3 = E F
* compute Y3 = G H
* compute T3 = E H
* compute Z3 = F G
*/
uint8_t a[F25519_SIZE];
uint8_t b[F25519_SIZE];
uint8_t c[F25519_SIZE];
uint8_t e[F25519_SIZE];
uint8_t f[F25519_SIZE];
uint8_t g[F25519_SIZE];
uint8_t h[F25519_SIZE];
/* A = X1^2 */
f25519_mul__distinct(a, p->x, p->x);
/* B = Y1^2 */
f25519_mul__distinct(b, p->y, p->y);
/* C = 2 Z1^2 */
f25519_mul__distinct(c, p->z, p->z);
f25519_add(c, c, c);
/* D = a A (alter sign) */
/* E = (X1+Y1)^2-A-B */
f25519_add(f, p->x, p->y);
f25519_mul__distinct(e, f, f);
f25519_sub(e, e, a);
f25519_sub(e, e, b);
/* G = D + B */
f25519_sub(g, b, a);
/* F = G - C */
f25519_sub(f, g, c);
/* H = D - B */
f25519_neg(h, b);
f25519_sub(h, h, a);
/* X3 = E F */
f25519_mul__distinct(r->x, e, f);
/* Y3 = G H */
f25519_mul__distinct(r->y, g, h);
/* T3 = E H */
f25519_mul__distinct(r->t, e, h);
/* Z3 = F G */
f25519_mul__distinct(r->z, f, g);
}
void ed25519_smult(struct ed25519_pt *r_out, const struct ed25519_pt *p,
const uint8_t *e)
{
struct ed25519_pt r;
int i;
ed25519_copy(&r, &ed25519_neutral);
for (i = 255; i >= 0; i--) {
const uint8_t bit = (e[i >> 3] >> (i & 7)) & 1;
struct ed25519_pt s;
ed25519_double(&r, &r);
ed25519_add(&s, &r, p);
f25519_select(r.x, r.x, s.x, bit);
f25519_select(r.y, r.y, s.y, bit);
f25519_select(r.z, r.z, s.z, bit);
f25519_select(r.t, r.t, s.t, bit);
}
ed25519_copy(r_out, &r);
}

View file

@ -0,0 +1,82 @@
/* Edwards curve operations
* Daniel Beer <dlbeer@gmail.com>, 9 Jan 2014
*
* This file is in the public domain.
*/
#ifndef ED25519_H_
#define ED25519_H_
#include "f25519.h"
/* This is not the Ed25519 signature system. Rather, we're implementing
* basic operations on the twisted Edwards curve over (Z mod 2^255-19):
*
* -x^2 + y^2 = 1 - (121665/121666)x^2y^2
*
* With the positive-x base point y = 4/5.
*
* These functions will not leak secret data through timing.
*
* For more information, see:
*
* Bernstein, D.J. & Lange, T. (2007) "Faster addition and doubling on
* elliptic curves". Document ID: 95616567a6ba20f575c5f25e7cebaf83.
*
* Hisil, H. & Wong, K K. & Carter, G. & Dawson, E. (2008) "Twisted
* Edwards curves revisited". Advances in Cryptology, ASIACRYPT 2008,
* Vol. 5350, pp. 326-343.
*/
/* Projective coordinates */
struct ed25519_pt {
uint8_t x[F25519_SIZE];
uint8_t y[F25519_SIZE];
uint8_t t[F25519_SIZE];
uint8_t z[F25519_SIZE];
};
extern const struct ed25519_pt ed25519_base;
extern const struct ed25519_pt ed25519_neutral;
/* Convert between projective and affine coordinates (x/y in F25519) */
void ed25519_project(struct ed25519_pt *p,
const uint8_t *x, const uint8_t *y);
void ed25519_unproject(uint8_t *x, uint8_t *y,
const struct ed25519_pt *p);
/* Compress/uncompress points. try_unpack() will check that the
* compressed point is on the curve, returning 1 if the unpacked point
* is valid, and 0 otherwise.
*/
#define ED25519_PACK_SIZE F25519_SIZE
void ed25519_pack(uint8_t *c, const uint8_t *x, const uint8_t *y);
uint8_t ed25519_try_unpack(uint8_t *x, uint8_t *y, const uint8_t *c);
/* Add, double and scalar multiply */
#define ED25519_EXPONENT_SIZE 32
/* Prepare an exponent by clamping appropriate bits */
static inline void ed25519_prepare(uint8_t *e)
{
e[0] &= 0xf8;
e[31] &= 0x7f;
e[31] |= 0x40;
}
/* Order of the group generated by the base point */
static inline void ed25519_copy(struct ed25519_pt *dst,
const struct ed25519_pt *src)
{
memcpy(dst, src, sizeof(*dst));
}
void ed25519_add(struct ed25519_pt *r,
const struct ed25519_pt *a, const struct ed25519_pt *b);
void ed25519_double(struct ed25519_pt *r, const struct ed25519_pt *a);
void ed25519_smult(struct ed25519_pt *r, const struct ed25519_pt *a,
const uint8_t *e);
#endif

View file

@ -0,0 +1,168 @@
/* Edwards curve signature system
* Daniel Beer <dlbeer@gmail.com>, 22 Apr 2014
*
* This file is in the public domain.
*/
#include "ed25519.h"
#include "sha512.h"
#include "fprime.h"
#include "edsign.h"
#define EXPANDED_SIZE 64
static const uint8_t ed25519_order[FPRIME_SIZE] = {
0xed, 0xd3, 0xf5, 0x5c, 0x1a, 0x63, 0x12, 0x58,
0xd6, 0x9c, 0xf7, 0xa2, 0xde, 0xf9, 0xde, 0x14,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10
};
static void expand_key(uint8_t *expanded, const uint8_t *secret)
{
struct sha512_state s;
sha512_init(&s);
sha512_final(&s, secret, EDSIGN_SECRET_KEY_SIZE);
sha512_get(&s, expanded, 0, EXPANDED_SIZE);
ed25519_prepare(expanded);
}
static uint8_t upp(struct ed25519_pt *p, const uint8_t *packed)
{
uint8_t x[F25519_SIZE];
uint8_t y[F25519_SIZE];
uint8_t ok = ed25519_try_unpack(x, y, packed);
ed25519_project(p, x, y);
return ok;
}
static void pp(uint8_t *packed, const struct ed25519_pt *p)
{
uint8_t x[F25519_SIZE];
uint8_t y[F25519_SIZE];
ed25519_unproject(x, y, p);
ed25519_pack(packed, x, y);
}
static void sm_pack(uint8_t *r, const uint8_t *k)
{
struct ed25519_pt p;
ed25519_smult(&p, &ed25519_base, k);
pp(r, &p);
}
void edsign_sec_to_pub(uint8_t *pub, const uint8_t *secret)
{
uint8_t expanded[EXPANDED_SIZE];
expand_key(expanded, secret);
sm_pack(pub, expanded);
}
static void hash_with_prefix(uint8_t *out_fp,
uint8_t *init_block, unsigned int prefix_size,
const uint8_t *message, size_t len)
{
struct sha512_state s;
sha512_init(&s);
if (len < SHA512_BLOCK_SIZE && len + prefix_size < SHA512_BLOCK_SIZE) {
memcpy(init_block + prefix_size, message, len);
sha512_final(&s, init_block, len + prefix_size);
} else {
size_t i;
memcpy(init_block + prefix_size, message,
SHA512_BLOCK_SIZE - prefix_size);
sha512_block(&s, init_block);
for (i = SHA512_BLOCK_SIZE - prefix_size;
i + SHA512_BLOCK_SIZE <= len;
i += SHA512_BLOCK_SIZE)
sha512_block(&s, message + i);
sha512_final(&s, message + i, len + prefix_size);
}
sha512_get(&s, init_block, 0, SHA512_HASH_SIZE);
fprime_from_bytes(out_fp, init_block, SHA512_HASH_SIZE, ed25519_order);
}
static void generate_k(uint8_t *k, const uint8_t *kgen_key,
const uint8_t *message, size_t len)
{
uint8_t block[SHA512_BLOCK_SIZE];
memcpy(block, kgen_key, 32);
hash_with_prefix(k, block, 32, message, len);
}
static void hash_message(uint8_t *z, const uint8_t *r, const uint8_t *a,
const uint8_t *m, size_t len)
{
uint8_t block[SHA512_BLOCK_SIZE];
memcpy(block, r, 32);
memcpy(block + 32, a, 32);
hash_with_prefix(z, block, 64, m, len);
}
void edsign_sign(uint8_t *signature, const uint8_t *pub,
const uint8_t *secret,
const uint8_t *message, size_t len)
{
uint8_t expanded[EXPANDED_SIZE];
uint8_t e[FPRIME_SIZE];
uint8_t s[FPRIME_SIZE];
uint8_t k[FPRIME_SIZE];
uint8_t z[FPRIME_SIZE];
expand_key(expanded, secret);
/* Generate k and R = kB */
generate_k(k, expanded + 32, message, len);
sm_pack(signature, k);
/* Compute z = H(R, A, M) */
hash_message(z, signature, pub, message, len);
/* Obtain e */
fprime_from_bytes(e, expanded, 32, ed25519_order);
/* Compute s = ze + k */
fprime_mul(s, z, e, ed25519_order);
fprime_add(s, k, ed25519_order);
memcpy(signature + 32, s, 32);
}
uint8_t edsign_verify(const uint8_t *signature, const uint8_t *pub,
const uint8_t *message, size_t len)
{
struct ed25519_pt p;
struct ed25519_pt q;
uint8_t lhs[F25519_SIZE];
uint8_t rhs[F25519_SIZE];
uint8_t z[FPRIME_SIZE];
uint8_t ok = 1;
/* Compute z = H(R, A, M) */
hash_message(z, signature, pub, message, len);
/* sB = (ze + k)B = ... */
sm_pack(lhs, signature + 32);
/* ... = zA + R */
ok &= upp(&p, pub);
ed25519_smult(&p, &p, z);
ok &= upp(&q, signature);
ed25519_add(&p, &p, &q);
pp(rhs, &p);
/* Equal? */
return ok & f25519_eq(lhs, rhs);
}

View file

@ -0,0 +1,51 @@
/* Edwards curve signature system
* Daniel Beer <dlbeer@gmail.com>, 22 Apr 2014
*
* This file is in the public domain.
*/
#ifndef EDSIGN_H_
#define EDSIGN_H_
#include <stdint.h>
#include <stddef.h>
/* This is the Ed25519 signature system, as described in:
*
* Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, Bo-Yin
* Yang. High-speed high-security signatures. Journal of Cryptographic
* Engineering 2 (2012), 77-89. Document ID:
* a1a62a2f76d23f65d622484ddd09caf8. URL:
* http://cr.yp.to/papers.html#ed25519. Date: 2011.09.26.
*
* The format and calculation of signatures is compatible with the
* Ed25519 implementation in SUPERCOP. Note, however, that our secret
* keys are half the size: we don't store a copy of the public key in
* the secret key (we generate it on demand).
*/
/* Any string of 32 random bytes is a valid secret key. There is no
* clamping of bits, because we don't use the key directly as an
* exponent (the exponent is derived from part of a key expansion).
*/
#define EDSIGN_SECRET_KEY_SIZE 32
/* Given a secret key, produce the public key (a packed Edwards-curve
* point).
*/
#define EDSIGN_PUBLIC_KEY_SIZE 32
void edsign_sec_to_pub(uint8_t *pub, const uint8_t *secret);
/* Produce a signature for a message. */
#define EDSIGN_SIGNATURE_SIZE 64
void edsign_sign(uint8_t *signature, const uint8_t *pub,
const uint8_t *secret,
const uint8_t *message, size_t len);
/* Verify a message signature. Returns non-zero if ok. */
uint8_t edsign_verify(const uint8_t *signature, const uint8_t *pub,
const uint8_t *message, size_t len);
#endif

View file

@ -0,0 +1,324 @@
/* Arithmetic mod p = 2^255-19
* Daniel Beer <dlbeer@gmail.com>, 5 Jan 2014
*
* This file is in the public domain.
*/
#include "f25519.h"
const uint8_t f25519_zero[F25519_SIZE] = {0};
const uint8_t f25519_one[F25519_SIZE] = {1};
void f25519_load(uint8_t *x, uint32_t c)
{
unsigned int i;
for (i = 0; i < sizeof(c); i++) {
x[i] = c;
c >>= 8;
}
for (; i < F25519_SIZE; i++)
x[i] = 0;
}
void f25519_normalize(uint8_t *x)
{
uint8_t minusp[F25519_SIZE];
uint16_t c;
int i;
/* Reduce using 2^255 = 19 mod p */
c = (x[31] >> 7) * 19;
x[31] &= 127;
for (i = 0; i < F25519_SIZE; i++) {
c += x[i];
x[i] = c;
c >>= 8;
}
/* The number is now less than 2^255 + 18, and therefore less than
* 2p. Try subtracting p, and conditionally load the subtracted
* value if underflow did not occur.
*/
c = 19;
for (i = 0; i + 1 < F25519_SIZE; i++) {
c += x[i];
minusp[i] = c;
c >>= 8;
}
c += ((uint16_t)x[i]) - 128;
minusp[31] = c;
/* Load x-p if no underflow */
f25519_select(x, minusp, x, (c >> 15) & 1);
}
uint8_t f25519_eq(const uint8_t *x, const uint8_t *y)
{
uint8_t sum = 0;
int i;
for (i = 0; i < F25519_SIZE; i++)
sum |= x[i] ^ y[i];
sum |= (sum >> 4);
sum |= (sum >> 2);
sum |= (sum >> 1);
return (sum ^ 1) & 1;
}
void f25519_select(uint8_t *dst,
const uint8_t *zero, const uint8_t *one,
uint8_t condition)
{
const uint8_t mask = -condition;
int i;
for (i = 0; i < F25519_SIZE; i++)
dst[i] = zero[i] ^ (mask & (one[i] ^ zero[i]));
}
void f25519_add(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
uint16_t c = 0;
int i;
/* Add */
for (i = 0; i < F25519_SIZE; i++) {
c >>= 8;
c += ((uint16_t)a[i]) + ((uint16_t)b[i]);
r[i] = c;
}
/* Reduce with 2^255 = 19 mod p */
r[31] &= 127;
c = (c >> 7) * 19;
for (i = 0; i < F25519_SIZE; i++) {
c += r[i];
r[i] = c;
c >>= 8;
}
}
void f25519_sub(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
uint32_t c = 0;
int i;
/* Calculate a + 2p - b, to avoid underflow */
c = 218;
for (i = 0; i + 1 < F25519_SIZE; i++) {
c += 65280 + ((uint32_t)a[i]) - ((uint32_t)b[i]);
r[i] = c;
c >>= 8;
}
c += ((uint32_t)a[31]) - ((uint32_t)b[31]);
r[31] = c & 127;
c = (c >> 7) * 19;
for (i = 0; i < F25519_SIZE; i++) {
c += r[i];
r[i] = c;
c >>= 8;
}
}
void f25519_neg(uint8_t *r, const uint8_t *a)
{
uint32_t c = 0;
int i;
/* Calculate 2p - a, to avoid underflow */
c = 218;
for (i = 0; i + 1 < F25519_SIZE; i++) {
c += 65280 - ((uint32_t)a[i]);
r[i] = c;
c >>= 8;
}
c -= ((uint32_t)a[31]);
r[31] = c & 127;
c = (c >> 7) * 19;
for (i = 0; i < F25519_SIZE; i++) {
c += r[i];
r[i] = c;
c >>= 8;
}
}
void f25519_mul__distinct(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
uint32_t c = 0;
int i;
for (i = 0; i < F25519_SIZE; i++) {
int j;
c >>= 8;
for (j = 0; j <= i; j++)
c += ((uint32_t)a[j]) * ((uint32_t)b[i - j]);
for (; j < F25519_SIZE; j++)
c += ((uint32_t)a[j]) *
((uint32_t)b[i + F25519_SIZE - j]) * 38;
r[i] = c;
}
r[31] &= 127;
c = (c >> 7) * 19;
for (i = 0; i < F25519_SIZE; i++) {
c += r[i];
r[i] = c;
c >>= 8;
}
}
void f25519_mul(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
uint8_t tmp[F25519_SIZE];
f25519_mul__distinct(tmp, a, b);
f25519_copy(r, tmp);
}
void f25519_mul_c(uint8_t *r, const uint8_t *a, uint32_t b)
{
uint32_t c = 0;
int i;
for (i = 0; i < F25519_SIZE; i++) {
c >>= 8;
c += b * ((uint32_t)a[i]);
r[i] = c;
}
r[31] &= 127;
c >>= 7;
c *= 19;
for (i = 0; i < F25519_SIZE; i++) {
c += r[i];
r[i] = c;
c >>= 8;
}
}
void f25519_inv__distinct(uint8_t *r, const uint8_t *x)
{
uint8_t s[F25519_SIZE];
int i;
/* This is a prime field, so by Fermat's little theorem:
*
* x^(p-1) = 1 mod p
*
* Therefore, raise to (p-2) = 2^255-21 to get a multiplicative
* inverse.
*
* This is a 255-bit binary number with the digits:
*
* 11111111... 01011
*
* We compute the result by the usual binary chain, but
* alternate between keeping the accumulator in r and s, so as
* to avoid copying temporaries.
*/
/* 1 1 */
f25519_mul__distinct(s, x, x);
f25519_mul__distinct(r, s, x);
/* 1 x 248 */
for (i = 0; i < 248; i++) {
f25519_mul__distinct(s, r, r);
f25519_mul__distinct(r, s, x);
}
/* 0 */
f25519_mul__distinct(s, r, r);
/* 1 */
f25519_mul__distinct(r, s, s);
f25519_mul__distinct(s, r, x);
/* 0 */
f25519_mul__distinct(r, s, s);
/* 1 */
f25519_mul__distinct(s, r, r);
f25519_mul__distinct(r, s, x);
/* 1 */
f25519_mul__distinct(s, r, r);
f25519_mul__distinct(r, s, x);
}
void f25519_inv(uint8_t *r, const uint8_t *x)
{
uint8_t tmp[F25519_SIZE];
f25519_inv__distinct(tmp, x);
f25519_copy(r, tmp);
}
/* Raise x to the power of (p-5)/8 = 2^252-3, using s for temporary
* storage.
*/
static void exp2523(uint8_t *r, const uint8_t *x, uint8_t *s)
{
int i;
/* This number is a 252-bit number with the binary expansion:
*
* 111111... 01
*/
/* 1 1 */
f25519_mul__distinct(r, x, x);
f25519_mul__distinct(s, r, x);
/* 1 x 248 */
for (i = 0; i < 248; i++) {
f25519_mul__distinct(r, s, s);
f25519_mul__distinct(s, r, x);
}
/* 0 */
f25519_mul__distinct(r, s, s);
/* 1 */
f25519_mul__distinct(s, r, r);
f25519_mul__distinct(r, s, x);
}
void f25519_sqrt(uint8_t *r, const uint8_t *a)
{
uint8_t v[F25519_SIZE];
uint8_t i[F25519_SIZE];
uint8_t x[F25519_SIZE];
uint8_t y[F25519_SIZE];
/* v = (2a)^((p-5)/8) [x = 2a] */
f25519_mul_c(x, a, 2);
exp2523(v, x, y);
/* i = 2av^2 - 1 */
f25519_mul__distinct(y, v, v);
f25519_mul__distinct(i, x, y);
f25519_load(y, 1);
f25519_sub(i, i, y);
/* r = avi */
f25519_mul__distinct(x, v, a);
f25519_mul__distinct(r, x, i);
}

View file

@ -0,0 +1,92 @@
/* Arithmetic mod p = 2^255-19
* Daniel Beer <dlbeer@gmail.com>, 8 Jan 2014
*
* This file is in the public domain.
*/
#ifndef F25519_H_
#define F25519_H_
#include <stdint.h>
#include <string.h>
/* Field elements are represented as little-endian byte strings. All
* operations have timings which are independent of input data, so they
* can be safely used for cryptography.
*
* Computation is performed on un-normalized elements. These are byte
* strings which fall into the range 0 <= x < 2p. Use f25519_normalize()
* to convert to a value 0 <= x < p.
*
* Elements received from the outside may greater even than 2p.
* f25519_normalize() will correctly deal with these numbers too.
*/
#define F25519_SIZE 32
/* Identity constants */
extern const uint8_t f25519_zero[F25519_SIZE];
extern const uint8_t f25519_one[F25519_SIZE];
/* Load a small constant */
void f25519_load(uint8_t *x, uint32_t c);
/* Copy two points */
static inline void f25519_copy(uint8_t *x, const uint8_t *a)
{
memcpy(x, a, F25519_SIZE);
}
/* Normalize a field point x < 2*p by subtracting p if necessary */
void f25519_normalize(uint8_t *x);
/* Compare two field points in constant time. Return one if equal, zero
* otherwise. This should be performed only on normalized values.
*/
uint8_t f25519_eq(const uint8_t *x, const uint8_t *y);
/* Conditional copy. If condition == 0, then zero is copied to dst. If
* condition == 1, then one is copied to dst. Any other value results in
* undefined behaviour.
*/
void f25519_select(uint8_t *dst,
const uint8_t *zero, const uint8_t *one,
uint8_t condition);
/* Add/subtract two field points. The three pointers are not required to
* be distinct.
*/
void f25519_add(uint8_t *r, const uint8_t *a, const uint8_t *b);
void f25519_sub(uint8_t *r, const uint8_t *a, const uint8_t *b);
/* Unary negation */
void f25519_neg(uint8_t *r, const uint8_t *a);
/* Multiply two field points. The __distinct variant is used when r is
* known to be in a different location to a and b.
*/
void f25519_mul(uint8_t *r, const uint8_t *a, const uint8_t *b);
void f25519_mul__distinct(uint8_t *r, const uint8_t *a, const uint8_t *b);
/* Multiply a point by a small constant. The two pointers are not
* required to be distinct.
*
* The constant must be less than 2^24.
*/
void f25519_mul_c(uint8_t *r, const uint8_t *a, uint32_t b);
/* Take the reciprocal of a field point. The __distinct variant is used
* when r is known to be in a different location to x.
*/
void f25519_inv(uint8_t *r, const uint8_t *x);
void f25519_inv__distinct(uint8_t *r, const uint8_t *x);
/* Compute one of the square roots of the field element, if the element
* is square. The other square is -r.
*
* If the input is not square, the returned value is a valid field
* element, but not the correct answer. If you don't already know that
* your element is square, you should square the return value and test.
*/
void f25519_sqrt(uint8_t *r, const uint8_t *x);
#endif

View file

@ -0,0 +1,215 @@
/* Arithmetic in prime fields
* Daniel Beer <dlbeer@gmail.com>, 10 Jan 2014
*
* This file is in the public domain.
*/
#include "fprime.h"
const uint8_t fprime_zero[FPRIME_SIZE] = {0};
const uint8_t fprime_one[FPRIME_SIZE] = {1};
static void raw_add(uint8_t *x, const uint8_t *p)
{
uint16_t c = 0;
int i;
for (i = 0; i < FPRIME_SIZE; i++) {
c += ((uint16_t)x[i]) + ((uint16_t)p[i]);
x[i] = c;
c >>= 8;
}
}
static void raw_try_sub(uint8_t *x, const uint8_t *p)
{
uint8_t minusp[FPRIME_SIZE];
uint16_t c = 0;
int i;
for (i = 0; i < FPRIME_SIZE; i++) {
c = ((uint16_t)x[i]) - ((uint16_t)p[i]) - c;
minusp[i] = c;
c = (c >> 8) & 1;
}
fprime_select(x, minusp, x, c);
}
/* Warning: this function is variable-time */
static int prime_msb(const uint8_t *p)
{
int i;
uint8_t x;
for (i = FPRIME_SIZE - 1; i >= 0; i--)
if (p[i])
break;
x = p[i];
i <<= 3;
while (x) {
x >>= 1;
i++;
}
return i - 1;
}
/* Warning: this function may be variable-time in the argument n */
static void shift_n_bits(uint8_t *x, int n)
{
uint16_t c = 0;
int i;
for (i = 0; i < FPRIME_SIZE; i++) {
c |= ((uint16_t)x[i]) << n;
x[i] = c;
c >>= 8;
}
}
void fprime_load(uint8_t *x, uint32_t c)
{
unsigned int i;
for (i = 0; i < sizeof(c); i++) {
x[i] = c;
c >>= 8;
}
for (; i < FPRIME_SIZE; i++)
x[i] = 0;
}
static inline int min_int(int a, int b)
{
return a < b ? a : b;
}
void fprime_from_bytes(uint8_t *n,
const uint8_t *x, size_t len,
const uint8_t *modulus)
{
const int preload_total = min_int(prime_msb(modulus) - 1, len << 3);
const int preload_bytes = preload_total >> 3;
const int preload_bits = preload_total & 7;
const int rbits = (len << 3) - preload_total;
int i;
memset(n, 0, FPRIME_SIZE);
for (i = 0; i < preload_bytes; i++)
n[i] = x[len - preload_bytes + i];
if (preload_bits) {
shift_n_bits(n, preload_bits);
n[0] |= x[len - preload_bytes - 1] >> (8 - preload_bits);
}
for (i = rbits - 1; i >= 0; i--) {
const uint8_t bit = (x[i >> 3] >> (i & 7)) & 1;
shift_n_bits(n, 1);
n[0] |= bit;
raw_try_sub(n, modulus);
}
}
void fprime_normalize(uint8_t *x, const uint8_t *modulus)
{
uint8_t n[FPRIME_SIZE];
fprime_from_bytes(n, x, FPRIME_SIZE, modulus);
fprime_copy(x, n);
}
uint8_t fprime_eq(const uint8_t *x, const uint8_t *y)
{
uint8_t sum = 0;
int i;
for (i = 0; i < FPRIME_SIZE; i++)
sum |= x[i] ^ y[i];
sum |= (sum >> 4);
sum |= (sum >> 2);
sum |= (sum >> 1);
return (sum ^ 1) & 1;
}
void fprime_select(uint8_t *dst,
const uint8_t *zero, const uint8_t *one,
uint8_t condition)
{
const uint8_t mask = -condition;
int i;
for (i = 0; i < FPRIME_SIZE; i++)
dst[i] = zero[i] ^ (mask & (one[i] ^ zero[i]));
}
void fprime_add(uint8_t *r, const uint8_t *a, const uint8_t *modulus)
{
raw_add(r, a);
raw_try_sub(r, modulus);
}
void fprime_sub(uint8_t *r, const uint8_t *a, const uint8_t *modulus)
{
raw_add(r, modulus);
raw_try_sub(r, a);
raw_try_sub(r, modulus);
}
void fprime_mul(uint8_t *r, const uint8_t *a, const uint8_t *b,
const uint8_t *modulus)
{
int i;
memset(r, 0, FPRIME_SIZE);
for (i = prime_msb(modulus); i >= 0; i--) {
const uint8_t bit = (b[i >> 3] >> (i & 7)) & 1;
uint8_t plusa[FPRIME_SIZE];
shift_n_bits(r, 1);
raw_try_sub(r, modulus);
fprime_copy(plusa, r);
fprime_add(plusa, a, modulus);
fprime_select(r, r, plusa, bit);
}
}
void fprime_inv(uint8_t *r, const uint8_t *a, const uint8_t *modulus)
{
uint8_t pm2[FPRIME_SIZE];
uint16_t c = 2;
int i;
/* Compute (p-2) */
fprime_copy(pm2, modulus);
for (i = 0; i < FPRIME_SIZE; i++) {
c = modulus[i] - c;
pm2[i] = c;
c >>= 8;
}
/* Binary exponentiation */
fprime_load(r, 1);
for (i = prime_msb(modulus); i >= 0; i--) {
uint8_t r2[FPRIME_SIZE];
fprime_mul(r2, r, r, modulus);
if ((pm2[i >> 3] >> (i & 7)) & 1)
fprime_mul(r, r2, a, modulus);
else
fprime_copy(r, r2);
}
}

Some files were not shown because too many files have changed in this diff Show more