What is the fastest possible transatlantic flight today?

I’m working on a novel that, as part of the backstory, requires samples of a biological agent to be flown from Incirlik Air Base to labs across Europe, and then finally from the UK to the US.

In the context of the story, time is critical. Using any currently available aircraft (large or small, civil or military, US or allied etc.), what is the shortest flight time that could realistically be achieved?

Fastest possible way of finding a sublist which meets rules [on hold]

Accepted languages: any

Given two lists of numbers in a fixed order, of lengths n and m, find the length of the longest possible pair of sublists that meets the rules below:

  • Both sublists must have the same length
  • One sublist must come from each list
  • Each sublist must be a contiguous fragment of its list
  • The sum of the values in one sublist must have the same parity as the sum of the values in the other

Input:

n – number of values in the first list.

m – number of values in the second list.

{the n values, then the m values, follow on standard input}

Output:
the length of the longest possible pair of sublists that meets the rules

Example:

1
6 4
0 1 2 3 4 5
3 1 3 6

Answer:

3

Why? The longest possible sublists have 3 values: 2,3,4 from the first list and 3,1,3 from the second list. Each is a contiguous fragment of its list, and their sums have the same parity: (2+3+4) mod 2 == 1 ; (3+1+3) mod 2 == 1.
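One possible way to approach these rules, offered as a sketch rather than a reference solution: only the parity of each value matters, and sliding a window of length L by one position changes its sum parity exactly when the value leaving and the value entering differ in parity. All windows of length L therefore share one parity only when L is a period of the parity sequence, which a Z-function can detect in linear time. The names `z_function`, `ParityIndex`, and `longest_same_parity` are hypothetical, and the code assumes every value fits in a `long long`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// z[i] = length of the longest common prefix of s and s[i..]; z[0] stays 0.
std::vector<std::size_t> z_function(const std::vector<int>& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> z(n, 0);
    std::size_t l = 0, r = 0;  // [l, r) is the rightmost segment matching a prefix
    for (std::size_t i = 1; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Hypothetical helper: which sum parities can a window of a given length have?
struct ParityIndex {
    std::vector<int> prefix;     // prefix[i] = parity of the sum of the first i values
    std::vector<std::size_t> z;  // Z-function of the per-value parities

    explicit ParityIndex(const std::vector<long long>& values) {
        std::vector<int> bits;
        prefix.push_back(0);
        for (long long v : values) {
            bits.push_back(static_cast<int>(v & 1));
            prefix.push_back(prefix.back() ^ bits.back());
        }
        z = z_function(bits);
    }

    // Bit 0 set: an even-sum window exists. Bit 1 set: an odd-sum window exists.
    int mask(std::size_t len) const {
        const std::size_t n = z.size();
        assert(len >= 1 && len <= n);
        // All windows agree exactly when len is a period of the parity string.
        const bool all_same = (len == n) || (len + z[len] == n);
        return all_same ? (1 << prefix[len]) : 3;
    }
};

std::size_t longest_same_parity(const std::vector<long long>& a,
                                const std::vector<long long>& b) {
    const ParityIndex ia(a), ib(b);
    for (std::size_t len = std::min(a.size(), b.size()); len >= 1; --len) {
        if (ia.mask(len) & ib.mask(len)) return len;
    }
    return 0;  // no length gives a common parity
}
```

Under these assumptions the work is linear in n + m: each length is checked in constant time after the two Z-functions are built. Calling `longest_same_parity` on the two example lists gives 3.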

Your answer is considered valid only if your program outputs 49999 for this test in less than 0.5 sec on a 3GHz core.
The most efficient answer wins, provided it returns the correct output for every test run against it. Here are some tests to determine whether your answer is correct.
Every 0.001 sec under the 0.5 sec limit is worth 1 point.
Points are awarded according to the validation test.

If you have any questions or need something clarified, let me know and I’ll edit this post.

Is my code the fastest way to retrieve data from one table when there is more than one criterion?

I want to create a custom table, like a pivot table, where the user can immediately see the item totals and, on clicking a value, is shown the db sheet correctly filtered.
My code runs correctly, but in the spirit of continuous improvement I am looking for better-performing code.
Thanks for any contribution.

Sub AddTab1(ByVal c As Integer, str As String)
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim dbSh As Worksheet, tabSh As Worksheet
Dim ini As Date, fin As Date, tmp As Date, s As Range
Set dbSh = Sheets("db_Out")
Set tabSh = Sheets("Tab")
Dim arrTab(), rng As Range, i As Integer, cl As Range
Dim colIndex As Long, lrw As Integer, lcl As Integer
Dim firstCell As Range
Dim lastCell As Range
ini = Now()
If dbSh.Cells(2, c) = vbNullString Then MsgBox "Non ci sono dati valorizzati da estrapolare", vbInformation, "Cf_utility.info": Exit Sub
tabSh.Select

With tabSh
Set s = Range(str)
    s.Select
    If s.Offset(1) = vbNullString Then GoTo continue
    s.Select
    lrw = Columns(s.Column).Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).row 'Selection.End(xlDown).row
    lcl = Selection.End(xlToRight).Column
    s.Offset(1).Select
    .Range(Selection, Cells(lrw, lcl)).ClearContents
    s.Offset(2).Select
    .Range(Selection, Cells(lrw, lcl)).Select
    Selection.Delete Shift:=xlUp
    s.Offset(1).Select
End With

continue:
With dbSh
    .AutoFilterMode = False
    .Cells.EntireColumn.Hidden = False
    Set firstCell = .Cells(2, c)
    Set lastCell = .Cells(.Rows.Count, c).End(xlUp)
    Set rng = .Range(firstCell, lastCell)
    rng.Copy
End With
    tabSh.Select
    s.Offset(1).Select
    ActiveSheet.Paste
    Application.CutCopyMode = False
    tabSh.Sort.SortFields.Clear
    tabSh.Sort.SortFields.Add key:=s, _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:= _
        xlSortTextAsNumbers
With tabSh.Sort
    .SetRange Range(s.Offset(1), Cells(Columns(s.Column).Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).row, s.Column))
    .Header = xlYes
    .MatchCase = False
    .Orientation = xlTopToBottom
    .SortMethod = xlPinYin
    .Apply
End With
s.Select
s.Offset(1).Select
Set rng = Range(Selection, Cells(Columns(s.Column).Find(What:="*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).row, s.Column))
rng.RemoveDuplicates Columns:=1, Header:=xlNo

'KPI2-1 (Prelievo)
s.Select
lrw = Selection.End(xlDown).row
lcl = Selection.End(xlToRight).Column
ReDim arrTab(4 To lrw, 1 To lcl - 1)
s.Offset(1).Select
Set rng = Range(Selection, Selection.End(xlDown))
'c = D_KPI2_1        'Kpi KPI2_1
For Each cl In rng.Cells
    arrTab(cl.row, 2) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "STD", dbSh.Columns(V_KPI2_1), 0.9) + WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "STD", dbSh.Columns(V_KPI2_1), 1)
    If Not arrTab(cl.row, 2) > 0 Then arrTab(cl.row, 2) = Empty
    arrTab(cl.row, 3) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "STD", dbSh.Columns(V_KPI2_1), "Out of KPI")
    If Not arrTab(cl.row, 3) > 0 Then arrTab(cl.row, 3) = Empty
    arrTab(cl.row, 4) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "STD", dbSh.Columns(V_KPI2_1), "Backlog")
    If Not arrTab(cl.row, 4) > 0 Then arrTab(cl.row, 4) = Empty
    arrTab(cl.row, 5) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "PRIORITY", dbSh.Columns(V_KPI2_1), 0.95) + WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "PRIORITY", dbSh.Columns(V_KPI2_1), 1)
    If Not arrTab(cl.row, 5) > 0 Then arrTab(cl.row, 5) = Empty
    arrTab(cl.row, 6) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "PRIORITY", dbSh.Columns(V_KPI2_1), "Out of KPI")
    If Not arrTab(cl.row, 6) > 0 Then arrTab(cl.row, 6) = Empty
    arrTab(cl.row, 7) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "PRIORITY", dbSh.Columns(V_KPI2_1), "Backlog")
    If Not arrTab(cl.row, 7) > 0 Then arrTab(cl.row, 7) = Empty
    arrTab(cl.row, 8) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "AOG", dbSh.Columns(V_KPI2_1), 1)
    If Not arrTab(cl.row, 8) > 0 Then arrTab(cl.row, 8) = Empty
    arrTab(cl.row, 9) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "AOG", dbSh.Columns(V_KPI2_1), "Out of KPI")
    If Not arrTab(cl.row, 9) > 0 Then arrTab(cl.row, 9) = Empty
    arrTab(cl.row, 10) = WorksheetFunction.CountIfs(dbSh.Columns(c), cl.Value, dbSh.Columns(TypeTra), "AOG", dbSh.Columns(V_KPI2_1), "Backlog")
    If Not arrTab(cl.row, 10) > 0 Then arrTab(cl.row, 10) = Empty
    For i = 2 To 10
        arrTab(cl.row, 1) = arrTab(cl.row, 1) + arrTab(cl.row, i)
    Next
    If arrTab(cl.row, 1) < 1 Then arrTab(cl.row, 1) = Empty
Next
Range(s.Offset(1, 1), Cells(lrw, s.Offset(, 10).Column)) = arrTab()

s.Select
StartCl
lcl = Selection.End(xlToRight).Column
lrw = Selection.End(xlDown).row
Range(Selection.Offset(1), Selection.Offset(1, 11)).Select
Selection.Copy
Range(Selection, Selection.End(xlDown)).Select
Selection.PasteSpecial Paste:=xlPasteFormats, Operation:=xlNone, _
    SkipBlanks:=False, Transpose:=False
Application.CutCopyMode = False
s.Select
CleanTab
s.Select
InsLink

fin = Now()
tmp = fin - ini
Debug.Print tmp
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.StatusBar = False


End Sub

My English may not be very clear; if needed, see the picture.

fabrizio

enter image description here

What is the fastest algorithm for calculating the nth term of the Fibonacci sequence?

If we exclude methods that precalculate all Fibonacci numbers up to a sufficiently large n, what would be the fastest algorithm for calculating the nth term of the Fibonacci sequence?

I guess that iterative and matrix algorithms should be faster than analytic and recursive algorithms (see this paper).

On this page I have found five different algorithms. According to the author, the fastest is a matrix algorithm that uses O(log n) arithmetic operations.
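For illustration, here is a minimal sketch of a matrix approach with O(log n) arithmetic operations. It is not necessarily the algorithm from the linked page. It raises the matrix [[1, 1], [1, 0]] to the nth power by repeated squaring. The names `Mat2`, `mul`, and `fib_matrix` are hypothetical, and the sketch assumes 64-bit unsigned arithmetic, so the result is exact only for n ≤ 93. Larger n would need a big-integer type, whose multiplications are no longer constant-time.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 2x2 matrix stored row-major as {a, b, c, d}.
using Mat2 = std::array<std::uint64_t, 4>;

// Plain 2x2 matrix product; unsigned arithmetic wraps modulo 2^64.
Mat2 mul(const Mat2& x, const Mat2& y) {
    return {x[0] * y[0] + x[1] * y[2], x[0] * y[1] + x[1] * y[3],
            x[2] * y[0] + x[3] * y[2], x[2] * y[1] + x[3] * y[3]};
}

// Returns F(n) with F(0) = 0, F(1) = 1, using O(log n) matrix products.
std::uint64_t fib_matrix(unsigned n) {
    assert(n <= 93);             // F(94) no longer fits in 64 bits
    Mat2 result = {1, 0, 0, 1};  // identity matrix
    Mat2 base = {1, 1, 1, 0};
    while (n > 0) {
        if (n & 1u) result = mul(result, base);  // use this bit of the exponent
        base = mul(base, base);                  // square for the next bit
        n >>= 1;
    }
    // base^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
    return result[1];
}
```

For example, `fib_matrix(10)` returns 55. The loop runs once per bit of n, which is where the O(log n) operation count comes from.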

Fastest possible way of finding a subset which meets rules [on hold]

Accepted languages: any

Given q pairs of sets of numbers in a fixed order, of lengths n and m, find for each pair the length of the longest possible pair of subsets that meets the rules below:

  • Both subsets must have the same length
  • One subset must come from each set
  • Each subset must be a contiguous fragment of its set
  • The sum of the values in one subset must have the same parity as the sum of the values in the other

Input:

q – the number of pairs of sets to process. q is an integer greater than or equal to 1 and less than or equal to 20000

n – the number of values in the first set. Values are greater than or equal to 0 and less than or equal to pow(10,9)

m – the number of values in the second set. Values are greater than or equal to 0 and less than or equal to pow(10,9)

{the n values, then the m values, follow on standard input}

Output:
for each pair, the length of the longest possible pair of subsets that meets the rules

Example:

1
6 4
0 1 2 3 4 5
3 1 3 6

Answer:

3

Why? The longest possible subsets have 3 values: 2,3,4 from the first set and 3,1,3 from the second set. Each is a contiguous fragment of its set, and their sums have the same parity: (2+3+4) mod 2 == 1 ; (3+1+3) mod 2 == 1.
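To make the input format concrete, here is a deliberately slow reference checker, sketched under the assumption that the input is q, then for each pair a line with n and m followed by the n values and the m values, as in the example. It tries every length from the largest down and compares the parities reachable by windows of that length. The names `read_prefix_parities`, `window_parities`, `solve_all`, and `run` are hypothetical. It is meant for validating faster answers on small inputs, not for meeting the 0.5 sec limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reads `count` values; prefix[i] is the parity of the sum of the first i values.
std::vector<int> read_prefix_parities(std::istream& in, std::size_t count) {
    std::vector<int> prefix(count + 1, 0);
    for (std::size_t i = 0; i < count; ++i) {
        long long v = 0;
        in >> v;
        assert(in && "input ended before all values were read");
        prefix[i + 1] = prefix[i] ^ static_cast<int>(v & 1);
    }
    return prefix;
}

// Bit 0 set: some window of length len has an even sum. Bit 1 set: an odd sum.
int window_parities(const std::vector<int>& prefix, std::size_t len) {
    int mask = 0;
    for (std::size_t i = 0; i + len < prefix.size(); ++i)
        mask |= 1 << (prefix[i + len] ^ prefix[i]);
    return mask;
}

// Reads q test cases from `in` and writes one answer per line to `out`.
void solve_all(std::istream& in, std::ostream& out) {
    std::size_t q = 0;
    in >> q;
    while (q-- > 0) {
        std::size_t n = 0, m = 0;
        in >> n >> m;
        const std::vector<int> pa = read_prefix_parities(in, n);
        const std::vector<int> pb = read_prefix_parities(in, m);
        std::size_t best = 0;
        for (std::size_t len = std::min(n, m); len >= 1; --len) {
            if (window_parities(pa, len) & window_parities(pb, len)) {
                best = len;
                break;
            }
        }
        out << best << '\n';
    }
}

// Convenience wrapper for trying the checker on an in-memory input string.
std::string run(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    solve_all(in, out);
    return out.str();
}
```

Passing the example input to `run` yields "3" followed by a newline. In a real submission `solve_all(std::cin, std::cout)` would be called from `main`.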

Your answer is considered valid only if your program outputs 49999 for this test in less than 0.5 sec.
The most efficient answer wins, provided it returns the correct output for every test run against it. Here are some tests to determine whether your answer is correct.
Every 0.001 sec under the 0.5 sec limit is worth 1 point.

If you have any questions or need something clarified, let me know and I’ll edit this post.

Fastest possible way of finding a substring which meets rules [on hold]

Accepted language: C++

The restriction is there because C++ is the only language I use in my project.

Given q pairs of sets of numbers in a fixed order, of lengths n and m, find for each pair the length of the longest possible pair of subsets that meets the rules below:

  • Both subsets must have the same length
  • One subset must come from each set
  • Each subset must be a contiguous fragment of its set
  • The sum of the values in one subset must have the same parity as the sum of the values in the other

Input:

q – the number of pairs of sets to process. q is an integer greater than or equal to 1 and less than or equal to 20000

n – the number of values in the first set. Values are greater than or equal to 0 and less than or equal to pow(10,9)

m – the number of values in the second set. Values are greater than or equal to 0 and less than or equal to pow(10,9)

{the n values, then the m values, follow on standard input}

Output:
for each pair, the length of the longest possible pair of subsets that meets the rules

Example:

1
6 4
0 1 2 3 4 5
3 1 3 6

Answer:

3

Why? The longest possible subsets have 3 values: 2,3,4 from the first set and 3,1,3 from the second set. Each is a contiguous fragment of its set, and their sums have the same parity: (2+3+4) mod 2 == 1 ; (3+1+3) mod 2 == 1.

Your answer is considered valid only if your program outputs 49999 for this test in less than 0.5 sec.
You win if you are the first person to post a program that gives the correct output for every possible test and your “answer is considered valid”.

Thanks for your help. If you have any questions or need something clarified, let me know and I’ll edit this post.

Fastest way to move files from a guest VM to the host?

I’m looking for the fastest way to copy files from a VM to physical servers.

I’d rather not set up a network between them; I believe it is more secure without one.

VMware suggests using the Copy-VMGuestFile cmdlet from their PowerCLI interface; however, I find it slow (approximately 1.5MB/s).

I thought of the following:

  • Creating a new virtual hard drive, moving the files onto it, downloading the .vmdk file from the server, and then extracting it locally. This is possible, but it will not work with running VMs, and I don’t want to shut down the VM every time I want to move files.
  • Using the virtual floppy device and downloading the .flp file. This works even while the VM is running, but it is limited to 2.8MB.

Is there any other way?

I’m using ESXi 4.1.

Thanks.

What is the most compact data structure for canonical k-mers with the fastest lookup time?

edit: Results are current as of Nov 7, 2018 12:00 PST.

Background

K-mers have many uses in bioinformatics, and for this reason it would be useful to know the most RAM-efficient and fastest way to work with them programmatically. There have been questions covering what canonical k-mers are, how much RAM k-mer storage theoretically takes, but we have not yet looked at the best data structure to store and access k-mers and associated values with.

Question

What data structure in C++ simultaneously allows the most compact storage of k-mers, each with an associated property, and the fastest lookup time? For this question I chose C++ for speed, ease of implementation, and access to lower-level language features if desired. Answers in other languages are acceptable, too.

Setup

  • For benchmarking:
    • I propose to use a standard fasta file for everyone to use. This program, generate-fasta.cpp, generates two million sequences ranging in length between 29 and 300, with a peak of sequences around length 60.
    • Let’s use k=29 for the analysis, but ignore implementations that require knowledge of the k-mer size before implementation. Doing so will make the resulting data structure more amenable to downstream users who may need other sizes k.
    • Let’s just store the most recent read that the k-mer appeared in as the property to retrieve during k-mer lookup. In most applications it is important to attach some value to each k-mer such as a taxon, its count in a dataset, et cetera.
    • If possible, use the string parser in the code below for consistency between answers.
    • The algorithm should use canonical k-mers. That is, a k-mer and its reverse complement are considered to be the same k-mer.
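As an illustration of the canonical k-mer requirement, here is a minimal sketch that maps a k-mer and its reverse complement to a single key. It assumes the common convention of keeping the lexicographically smaller of the two strings, which the setup above does not prescribe. The function names `reverse_complement`, `canonical`, and `canonical_at` are hypothetical, and any base other than A, C, G, or T is mapped to N.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Reverses the sequence and complements each base (A<->T, C<->G).
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    for (char& base : rc) {
        switch (base) {
            case 'A': base = 'T'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            case 'T': base = 'A'; break;
            default:  base = 'N'; break;  // assumption: unknown bases become N
        }
    }
    return rc;
}

// Assumed convention: the canonical form is the lexicographically smaller
// of the k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    const std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}

// Canonical form of the k-mer starting at position pos of seq.
// k is a runtime argument, so no k-mer size is fixed at compile time.
std::string canonical_at(const std::string& seq, std::size_t pos, std::size_t k) {
    assert(pos + k <= seq.size());
    return canonical(seq.substr(pos, k));  // substr takes (position, length)
}
```

With this convention, `canonical("GATTACA")` and `canonical("TGTAATC")` both yield "GATTACA", so a lookup table needs only one entry per k-mer and reverse-complement pair.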

Here is generate-fasta.cpp. I used the command g++ generate_fasta.cpp -o generate_fasta to compile and the command ./generate_fasta > my.fasta to run it:

//generate a fasta file to count k-mers
#include <iostream>
#include <random>

char gen_base(int q){
  if (q <= 30){
    return 'A';
  } else if ((q > 30) && (q <=60) ){
    return 'T';
  } else if ((q > 60) && (q <=80) ){
    return 'C';
  } else if (q > 80){
    return 'G';
  }
  return 'N';
}

int main() {
  unsigned seed = 1;
  std::default_random_engine generator (seed);
  std::poisson_distribution<int> poisson (59);
  std::geometric_distribution<int> geo (0.05);
  std::uniform_int_distribution<int> uniform (1,100);
  int printval;
  int i=0;
  while(i<2000000){
    if (i % 2 == 0){
      printval = poisson(generator);
    } else {
      printval = geo(generator) + 29;
    }
    if (printval >= 29){
      std::cout << '>' << i << '\n';
      //std::cout << printval << '\n';
      for (int j = 0; j < printval; j++){
        std::cout << gen_base(uniform(generator));
      }
      std::cout << '\n';
      i++;
    }
  }
  return 0;
}

Example

One naive implementation is to add both the observed k-mer and its reverse complement as separate k-mers. This is obviously not space efficient but should have fast lookup. This file is called make_struct_lookup.cpp. I used the following command to compile on my Apple laptop (OS X): clang++ -std=c++11 -stdlib=libc++ -Wno-c++98-compat make_struct_lookup.cpp -o msl.

#include <chrono>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
//build the structure. measure how much RAM it consumes.
//then measure how long it takes to lookup in the data structure

#define k 29

std::string rc(std::string seq){
  std::string rc;
  for (int i = seq.length()-1; i>=0; i--){
    if (seq[i] == 'A'){
      rc.push_back('T');
    } else if (seq[i] == 'C'){
      rc.push_back('G');
    } else if (seq[i] == 'G'){
      rc.push_back('C');
    } else if (seq[i] == 'T'){
      rc.push_back('A');
    }
  }
  return rc;
}

int main(int argc, char* argv[]){
  using namespace std::chrono;
  //initialize the data structure
  std::string thisline;
  std::map<std::string, int> kmer_map;
  std::string header;
  std::string seq;
  //open the fasta file
  std::ifstream inFile;
  inFile.open(argv[1]);

  //construct the kmer-lookup structure
  int i = 0;
  high_resolution_clock::time_point t1 = high_resolution_clock::now();
  while (getline(inFile,thisline)){
    if (thisline[0] == '>'){
      header = thisline.substr(1,thisline.size());
      //std::cout << header << '\n';
    } else {
      seq = thisline;
      //now add the kmers
      //substr takes (position, length), so the length argument is k
      for (std::size_t j=0; j + k <= thisline.size(); j++){
        kmer_map[seq.substr(j,k)] = stoi(header);
        kmer_map[rc(seq.substr(j,k))] = stoi(header);
      }
      i++;
    }
  }
  std::cout << "  -finished " << i << " seqs.\n";
  inFile.close();
  high_resolution_clock::time_point t2 = high_resolution_clock::now();
  duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
  std::cout << time_span.count() << " seconds to load the array." << '\n';

  //now lookup the kmers
  inFile.open(argv[1]);
  t1 = high_resolution_clock::now();
  int lookup;
  while (getline(inFile,thisline)){
    if (thisline[0] != '>'){
      seq = thisline;
      //now lookup the kmers
      for (std::size_t j=0; j + k <= thisline.size(); j++){
        lookup = kmer_map[seq.substr(j,k)];
      }
    }
  }
  std::cout << "  - looked at " << i << " seqs.\n";
  inFile.close();
  t2 = high_resolution_clock::now();
  time_span = duration_cast<duration<double>>(t2 - t1);
  std::cout << time_span.count() << " seconds to lookup the kmers." << '\n';

}

Example output

I ran the above program with the following command to log peak RAM usage. The time taken to look up all k-mers in the two million sequences is reported by the program. /usr/bin/time -l ./msl my.fasta.

The output was:

 -finished 2000000 seqs.
562.864 seconds to load the array.
  - looked at 2000000 seqs.
368.734 seconds to lookup the k-mers.
     1046.94 real       942.38 user        78.96 sys
11680514048  maximum resident set size

So, the program used 11680514048 bytes = 11.68GB of RAM and it took 368.734 seconds to look up the k-mers in the two million sequences.

Results

Below is a plot of the results from each user's answers.

enter image description here